I've read about Dask, and it is a tool I want to have in my belt, especially
for using its SGE connection to run GridSearchCV at the supercomputing
center I have access to. Should it work as promised, it will be one of my
favourites.

As for my toy example, I keep more limited goals with this graph: I am
not currently interested in parallelizing each step, as I guess that
parallelizing each graph fit through GridSearchCV will be closer to
what I need.

I keep working on a proof of concept. You can have a look at:

https://github.com/mcasl/PAELLA/blob/master/pipeGraph.py

along with a few unit tests:
https://github.com/mcasl/PAELLA/blob/master/tests/test_pipeGraph.py

As of today, I have an iterable graph of steps that can be fitted/run
depending on their role (some can be disabled during run while active during
fit, or vice versa). I still have to play a bit with injecting different
parameters to make it compatible with GridSearchCV, and learn a bit about
the memory options in order to cache results.
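
For reference, here is a minimal sketch of the two scikit-learn mechanisms
I mean, shown on a plain Pipeline rather than on pipeGraph. The synthetic
dataset, the step names ("scale", "clf") and the grid values are made up
for illustration only; pipeGraph itself may end up handling this
differently.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, only so that the example can run end to end.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# memory=<directory> asks Pipeline to cache fitted transformers on disk,
# so identical (transformer, data) fits are not recomputed.
cache_dir = mkdtemp()
pipe = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ],
    memory=cache_dir,
)

# GridSearchCV injects parameters using the "<step name>__<parameter>" key.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# n_jobs=1 keeps the sketch simple; raising it runs the fits in parallel.
search = GridSearchCV(pipe, param_grid, cv=3, n_jobs=1)
search.fit(X, y)

best_C = search.best_params_["clf__C"]
print("best C:", best_C)

# Remove the temporary cache directory once it is no longer needed.
rmtree(cache_dir)
```

For pipeGraph to plug into GridSearchCV the same way, I assume it needs
get_params/set_params that understand these double-underscore names.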

Any comments are highly appreciated, truly!
Manolo




2017-12-30 15:34 GMT+01:00 Frédéric Bastien <frederic.bast...@gmail.com>:

> This starts to look like the Dask project. Do you know it?
>
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn