Hi Thomas,

besides Sebastian's information, your dataset seems to be quite imbalanced 
(48 positive vs. 1230 negative observations).
You could try rebalancing your data using
https://github.com/scikit-learn-contrib/imbalanced-learn

This package offers several methods for resampling your data (under-sampling 
the majority class, over-sampling the minority class, etc.).
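As a minimal sketch of what random over-sampling does, here is the idea in plain NumPy (no imblearn dependency; the synthetic arrays are made up just to match the class counts above, and this mirrors what imbalanced-learn's RandomOverSampler performs):

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy data with the class counts from above: 48 positives, 1230 negatives.
X = rng.randn(1278, 5)
y = np.array([1] * 48 + [0] * 1230)

# Random over-sampling: draw minority-class indices with replacement
# until both classes are the same size.
minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])
X_res, y_res = X[idx], y[idx]

print(X_res.shape[0], int((y_res == 0).sum()), int((y_res == 1).sum()))  # 2460 1230 1230
```

Over-sampling duplicates minority rows, so keep the resampling inside the training fold only; resampling before a train/test split would leak duplicates into the test set.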


Greets,
Piotr

On 08.12.2016 01:19, Sebastian Raschka wrote:
Hi, Thomas,
we had a related thread on the mailing list some time ago; let me post it for 
reference further below. Regarding your question, I think you may want to make 
sure that you standardize the features (which generally makes the learning 
less sensitive to the learning rate and the random weight initialization). 
However, even then, I would try at least 1-3 different random seeds and look 
at the cost over time: what can happen is that you land in different minima 
depending on the weight initialization, as demonstrated in the example below 
(in MLPs you have the problem of a complex, non-convex cost surface).
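The seed check described above can be sketched like this (the dataset here is synthetic, just to keep the snippet self-contained; substitute your own features and labels):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real problem.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Standardize the features first; MLP training is generally less sensitive
# to the learning rate and weight initialization on standardized inputs.
X_std = StandardScaler().fit_transform(X)

# Try a few random seeds and compare the final cost; large differences
# suggest the optimizer lands in different minima.
for seed in (0, 1, 2):
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                        random_state=seed)
    clf.fit(X_std, y)
    print(seed, round(clf.loss_, 4))
```

If the final losses differ substantially across seeds, that is the symptom Sebastian describes; in a real application you would pick the model by validation performance, not by eyeballing a single run.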

Best,
Sebastian

The default is 100 units in the hidden layer, but theoretically it should 
work with 2 hidden logistic units (I think that's the typical textbook/class 
example). I think what happens is that it gets stuck in a local minimum 
depending on the random weight initialization. E.g., the following works just 
fine:

from sklearn.neural_network import MLPClassifier

# XOR: the classic minimal non-linearly-separable example.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

clf = MLPClassifier(solver='lbfgs',          # full-batch quasi-Newton solver
                    activation='logistic',
                    alpha=0.0,               # no L2 regularization
                    hidden_layer_sizes=(2,), # one hidden layer with 2 units
                    learning_rate_init=0.1,
                    max_iter=1000,
                    random_state=20)         # seed for weight initialization
clf.fit(X, y)
res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
print(res)
print(clf.loss_)


but changing the random seed to 1 leads to:

[0 1 1 1]
0.34660921283

For comparison, I used a more vanilla MLP (1 hidden layer with 2 units and 
logistic activation as well; 
https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch12/ch12.ipynb), 
essentially resulting in the same problem:
[inline images omitted]

On Dec 7, 2016, at 6:45 PM, Thomas Evangelidis 
<[email protected]> wrote:

I tried sklearn.neural_network.MLPClassifier with the default parameters, 
using the input data I quoted in my previous post about the Nu-Support Vector 
Classifier. The predictions are great, but the problem is that sometimes when 
I rerun the MLPClassifier it predicts no positive observations (class 1). I 
have noticed that this can be controlled by the random_state parameter; e.g., 
MLPClassifier(random_state=0) always yields no positive predictions. My 
question is: how can I choose the right random_state value in a real blind 
test case?

Thanks in advance,
Thomas


--
======================================================================
Thomas Evangelidis
Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: [email protected]<mailto:[email protected]>
          [email protected]<mailto:[email protected]>

website: https://sites.google.com/site/thomasevangelidishomepage/


_______________________________________________
scikit-learn mailing list
[email protected]<mailto:[email protected]>
https://mail.python.org/mailman/listinfo/scikit-learn



