Dear all,

After some playing with the concept, we have developed a module that implements the functionality of Pipeline in more general contexts, as first introduced in an earlier thread (https://mail.python.org/pipermail/scikit-learn/2018-January/002158.html).
To extend Pipeline to workflows that are not strictly sequential, we have used a graph-like structure while keeping as much as possible of the familiar syntax we all love and honor:

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.linear_model import LinearRegression
    # PipeGraph itself is provided by the module announced in this message.

    X = pd.DataFrame(dict(X=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
    y = 2 * X

    sc = MinMaxScaler()
    lm = LinearRegression()
    steps = [('scaler', sc),
             ('linear_model', lm)]

    # Each step names where its inputs come from: an external input
    # ('X', 'y') or the output of another step, given as (step, output).
    connections = {'scaler': dict(X='X'),
                   'linear_model': dict(X=('scaler', 'predict'),
                                        y='y')}

    pgraph = PipeGraph(steps=steps,
                       connections=connections,
                       use_for_fit='all',
                       use_for_predict='all')

As you can see, the biggest difference for the end user is the dictionary describing the connections.

Another major contribution, for developers wanting to extend scikit-learn, is a collection of adapters for scikit-learn models that gives them a common API, irrespective of whether they originally implemented predict, transform, or fit_predict as an atomic operation without predict. These adapters accept as many positional or keyword parameters as needed in their fit and predict methods through *pargs and **kwargs.

General as PipeGraph is, it cannot work under the restrictions that GridSearchCV imposes on the input parameters, namely X and y, since PipeGraph can accept as many input signals as needed. Thus, an ad hoc version of GridSearchCV is also needed, and we will provide a basic initial version in a later release.

We still need to write the documentation, and we will propose the module as a contrib project in a few days.

Best wishes,
Manuel Castejón-Limas
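
P.S. For anyone wondering how a connections dictionary like the one above can determine the order in which the steps run, here is a minimal sketch. It is not PipeGraph's actual implementation: the helper names step_dependencies and execution_order are hypothetical, and it assumes that a plain string denotes an external input while a (step, output) tuple denotes the output of another step. It uses only the standard library (graphlib, Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Same shape as the connections dictionary in the example above.
connections = {
    'scaler': dict(X='X'),
    'linear_model': dict(X=('scaler', 'predict'), y='y'),
}


def step_dependencies(connections):
    """Map each step to the set of steps whose outputs it consumes.

    Hypothetical helper: a tuple source is read as (step, output);
    a string source is read as an external input and ignored here.
    """
    deps = {}
    for step, inputs in connections.items():
        deps[step] = {src[0] for src in inputs.values()
                      if isinstance(src, tuple)}
    return deps


def execution_order(connections):
    """Return one valid order in which the steps could be run."""
    sorter = TopologicalSorter(step_dependencies(connections))
    return list(sorter.static_order())


order = execution_order(connections)
print(order)  # ['scaler', 'linear_model']
```

With more steps and branches the same idea applies: any step is run only after every step it reads from, which is what lets the workflow be a graph rather than a strict sequence.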
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn