Re: [scikit-learn] combining arrays of features to train an MLP

Sebastian Raschka Mon, 19 Dec 2016 09:19:11 -0800

Thanks, Thomas, that makes sense! Will submit a PR then to update the docstring.


Best,
Sebastian


> On Dec 19, 2016, at 11:06 AM, Thomas Evangelidis <[email protected]> wrote:
> 
> 
> Greetings,
> 
> My dataset consists of objects characterised by their structural features, 
> which are encoded in a so-called "fingerprint" form. There are several 
> different types of fingerprints, each encapsulating a different type of 
> information. I want to combine two specific types of fingerprints to train 
> an MLP regressor. The first fingerprint is a 2048-bit array of the form:
> 
> FP1 = array([ 1.,  1.,  0., ...,  0.,  0.,  1.], dtype=float32)
> 
> The second is an array of 60 floats of the form:
> 
> FP2 = array([ 2.77494618,  0.98973243,  0.34638652,  2.88303715,  1.31473857,
>        -0.56627112,  4.78847547,  2.29587913, -0.6786228 ,  4.63391109,
>        ...
>         0.        ,  0.        ,  5.89652792,  0.        ,  0.        ])
> 
> At first I tried to fuse them into a single 1D array of 2048+60 columns, but 
> the predictions of the MLP were worse than those of the two separate MLP 
> models, each trained on one of the two fingerprint types. My question: is 
> there a more effective way to combine the two fingerprints that indicates 
> they represent different types of information?
>  
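A minimal sketch of the concatenation described above, on synthetic stand-in data (the sample count, random values, and MLP settings are assumptions, not taken from the original post). It also standardises the continuous block before stacking, which is one common precaution when mixing 0/1 bits with unscaled floats; whether it helps on the real fingerprints is untested here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n_samples = 40  # hypothetical dataset size

# Synthetic stand-ins for the two fingerprint blocks (not real data).
fp1 = rng.randint(0, 2, size=(n_samples, 2048)).astype(np.float32)  # bit vector
fp2 = rng.normal(size=(n_samples, 60))  # continuous descriptors
y = rng.normal(size=n_samples)

# Scale only the continuous block so it is comparable to the 0/1 bits.
fp2_scaled = StandardScaler().fit_transform(fp2)

# One row per object, 2048 + 60 = 2108 columns.
X = np.hstack([fp1, fp2_scaled])

# Small, short-running network purely for illustration.
mlp = MLPRegressor(hidden_layer_sizes=(50,), max_iter=50, random_state=0)
mlp.fit(X, y)
pred = mlp.predict(X)
```

In real use the scaler would be fitted on the training split only and then applied to the test split.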
> To this end, I tried to create a 2-row array (1st row 2048 elements and 2nd 
> row 60 elements) but sklearn complained:
> 
>     mlp.fit(x_train,y_train)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 618, in fit
>     return self._fit(X, y, incremental=False)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 330, in _fit
>     X, y = self._validate_input(X, y, incremental)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 1264, in _validate_input
>     multi_output=True, y_numeric=True)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
>     ensure_min_features, warn_on_dtype, estimator)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 402, in check_array
>     array = array.astype(np.float64)
> ValueError: setting an array element with a sequence.
> 
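The error above is what NumPy reports when rows of unequal length are held in an object array and then cast to float64, which is the cast shown in the last frame of the traceback. A small sketch that reproduces the failure outside sklearn, using zero-filled placeholder arrays (an assumption for illustration), and shows the flat alternative:

```python
import numpy as np

fp1 = np.zeros(2048, dtype=np.float32)  # placeholder for the bit fingerprint
fp2 = np.zeros(60)                      # placeholder for the float fingerprint

# A "2-row array" with rows of different length can only be an object array.
ragged = np.empty(2, dtype=object)
ragged[0] = fp1
ragged[1] = fp2

try:
    ragged.astype(np.float64)
    cast_failed = False
except (ValueError, TypeError):
    # The exact exception type and message may vary between NumPy versions.
    cast_failed = True

# A flat vector of 2048 + 60 values casts without trouble.
combined = np.concatenate([fp1, fp2]).astype(np.float64)
```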
> 
> Then I tried to create, for each object of the dataset, a 2D array of size 
> 2x2048 by padding the second row with 1988 zeros so that both rows are of 
> equal size. However, sklearn complained again:
> 
> 
>     mlp.fit(x_train,y_train)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 618, in fit
>     return self._fit(X, y, incremental=False)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 330, in _fit
>     X, y = self._validate_input(X, y, incremental)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 1264, in _validate_input
>     multi_output=True, y_numeric=True)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
>     ensure_min_features, warn_on_dtype, estimator)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 405, in check_array
>     % (array.ndim, estimator_name))
> ValueError: Found array with dim 3. Estimator expected <= 2.
> 
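The "dim 3" message follows from the shape: one 2x2048 block per object gives an array of shape (n_samples, 2, 2048), while the estimator accepts at most 2 dimensions. A sketch with synthetic data (sample count and values are assumptions) showing the padded layout and how it can be flattened to one row per object:

```python
import numpy as np

n_samples = 5  # hypothetical number of objects
rng = np.random.RandomState(0)

fp1 = rng.randint(0, 2, size=(n_samples, 2048)).astype(np.float64)
fp2 = rng.normal(size=(n_samples, 60))

# Pad FP2 with 2048 - 60 = 1988 zeros, then stack: shape (n_samples, 2, 2048).
fp2_padded = np.pad(fp2, ((0, 0), (0, 2048 - 60)), mode="constant")
X_3d = np.stack([fp1, fp2_padded], axis=1)

# Flatten each object's 2x2048 block into a single row of 4096 features.
X_2d = X_3d.reshape(n_samples, -1)
```

The flattened array carries the same information as the plain 2048+60 concatenation plus 1988 constant zero columns, so it is not expected to tell the network anything new about the two blocks.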
> 
> In another case of fingerprints, let's call them FP3 and FP4, I observed that 
> the MLP regressor built on FP3 yields better results when trained and 
> evaluated on logarithmically transformed experimental values (the values in 
> the y_train and y_test 1D arrays), while the MLP regressor built on FP4 
> yields better results with the original experimental values. So my second 
> question is: when combining FP3 and FP4 into a single array, is there any 
> way to tell the MLP that the features corresponding to FP3 must reproduce 
> the logarithmic transform of the experimental values, while the features of 
> FP4 must reproduce the original untransformed values?
> 
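A single MLPRegressor fits all of its input features to the same target, so there is no per-feature-block target setting to use here. One possible workaround, sketched below as an assumption rather than anything proposed in this thread, is to keep two models: one on FP3 with a log-transformed target (via `TransformedTargetRegressor`, which maps predictions back to the original scale) and one on FP4 with the raw target, then average the two predictions. The data, feature counts, and network sizes are synthetic placeholders.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
n_samples = 40                         # hypothetical dataset size
fp3 = rng.normal(size=(n_samples, 30))  # placeholder for FP3 features
fp4 = rng.normal(size=(n_samples, 20))  # placeholder for FP4 features
y = rng.uniform(1.0, 100.0, size=n_samples)  # strictly positive, so log is defined

# FP3 model: trained on log(y); predict() returns values on the original scale.
model_fp3 = TransformedTargetRegressor(
    regressor=MLPRegressor(hidden_layer_sizes=(20,), max_iter=200, random_state=0),
    func=np.log,
    inverse_func=np.exp,
)
model_fp3.fit(fp3, y)

# FP4 model: trained on the untransformed target.
model_fp4 = MLPRegressor(hidden_layer_sizes=(20,), max_iter=200, random_state=0)
model_fp4.fit(fp4, y)

# Both predictions are on the original scale, so they can be averaged.
pred_fp3 = model_fp3.predict(fp3)
pred_fp4 = model_fp4.predict(fp4)
pred_avg = (pred_fp3 + pred_fp4) / 2.0
```

A plain average is only the simplest way to combine the two; weighting the models by their validation error would be another option.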
> 
> I would greatly appreciate any advice on either of my two queries.
> Thomas
> 
> 
> -- 
> ======================================================================
> Thomas Evangelidis
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081, 
> 62500 Brno, Czech Republic 
> 
> email: [email protected]
>               [email protected]
> 
> website: https://sites.google.com/site/thomasevangelidishomepage/
> 
> 
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
