I've read about Dask, and it is a tool I want to have in my belt, especially its SGE connection, so that I can run GridSearchCV at the supercomputing center I have access to. If it works as promised, it will be one of my favorites.
As for my toy example, I have more limited goals for this graph: I am not currently interested in parallelizing each step, since I guess that parallelizing each fit of the graph through GridSearchCV will be closer to what I need. I keep working on a proof of concept. You can have a look at:
https://github.com/mcasl/PAELLA/blob/master/pipeGraph.py
along with a few unit tests:
https://github.com/mcasl/PAELLA/blob/master/tests/test_pipeGraph.py

As of today, I have an iterable graph of steps that can be fitted/run depending on their role (some can be disabled during run while active during fit, or vice versa). I still have to play a bit with injecting different parameters to make it compatible with GridSearchCV, and learn a bit about the memory options in order to cache results.

Any comments highly appreciated, truly!
Manolo

2017-12-30 15:34 GMT+01:00 Frédéric Bastien <frederic.bast...@gmail.com>:
> This start to look as the dask project. Do you know it?
>
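
For reference, the parameter injection and result caching mentioned above can be sketched with scikit-learn's stock Pipeline. This is only an illustration under stated assumptions: it does not use pipeGraph (whose API is not shown in this message), and the step names, synthetic data, and grid values are made up for the example.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative (not from the PAELLA project).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# `memory` makes the pipeline cache fitted transformers on disk, so the
# scaler is not refitted when only the final estimator's parameters change.
cache_dir = mkdtemp()
pipe = Pipeline(
    steps=[("scaler", StandardScaler()), ("model", Ridge())],
    memory=cache_dir,
)

# GridSearchCV injects parameters using the "<step name>__<parameter>" convention.
param_grid = {"model__alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=3, n_jobs=1)
search.fit(X, y)

best_alpha = search.best_params_["model__alpha"]
print("best alpha:", best_alpha)

# Remove the temporary cache directory once the search is done.
rmtree(cache_dir, ignore_errors=True)
```

A custom graph estimator would presumably need to expose get_params/set_params with a similar naming scheme for GridSearchCV to drive it in the same way.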
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn