That looks fine.
In line 125, can you try
assert(np.all(np.argmax(y_score, axis=-1) == y_pred))
That should go through.
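For anyone following along without the attached files, here is a minimal sketch of that check. It assumes synthetic data from make_classification as a stand-in for the real dataset, a recent scikit-learn where train_test_split lives in sklearn.model_selection, and variable names that simply mirror the snippet quoted further down. It is an illustration, not the code from the attachment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the thread's files are not used here).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit exactly once, then reuse the same fitted model for both calls.
clf = RandomForestClassifier(n_estimators=40, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)

# Columns of y_score follow the order of clf.classes_, so map the argmax
# column index back to a class label before comparing with y_pred.
labels_from_proba = clf.classes_[np.argmax(y_score, axis=-1)]
assert np.all(labels_from_proba == y_pred)
```

With labels 0 and 1 the column index and the label coincide, so the shorter assert above is equivalent. Going through clf.classes_ only matters when the labels are something else.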
On 02/25/2015 05:38 PM, shalu jhanwar wrote:
Hi Andy,
Please find this version of the code, in which I fixed the refit issue.
thanks!
Shalu
On Wed, Feb 25, 2015 at 11:35 PM, shalu jhanwar <shalu.jhanwa...@gmail.com> wrote:
Hi Andy,
Please see the code. I am attaching the following files:
i) Code: RandomForest_IndependentDataset_prabability_values.py
ii) Test dataset: test.txt
iii) Training dataset: training_data.txt
Please use this command to run the code:
python RandomForest_IndependentDataset_prabability_values.py -d
training_data.txt -D <output_dir name> -C "3,4,5,6,7,8,15,17" -c
"1" -g test.txt
When you run the code, 2 output files will be generated in the
output directory, named as: output.txt
In that file, you can look for these entries to see the discrepant results:
chr3_125709142_125709481
chr19_32769611_32770111
chr18_3593848_3594348
chr19_49466802_49467527
chr12_860254_860664
chr19_49465555_49466264
chr2_64836549_64836646
thanks!
Shalu
On Wed, Feb 25, 2015 at 11:13 PM, Andy <t3k...@gmail.com> wrote:
Please show the code.
On 02/25/2015 04:51 PM, shalu jhanwar wrote:
Hi guys!
I removed the refitting of the data, but didn't set random_state
explicitly. The same problem persists. Look at these few examples:
Y_true  Y_predict  Class0_prob.  Class1_prob.
1       0          0.28          0.72
0       0          0.32          0.68
0       0          0.41          0.59
1       0          0.41          0.59
1       0          0.48          0.52
1       1          0.57          0.42
Please let me know if I am still missing something.
thanks!
Shalu
On Wed, Feb 25, 2015 at 9:53 PM, shalu jhanwar <shalu.jhanwa...@gmail.com> wrote:
Hi guys!
Ahh, OK, I will check it and confirm.
thanks!
Shalu
On Wed, Feb 25, 2015 at 9:32 PM, Andy <t3k...@gmail.com> wrote:
You fit the data again before calling predict_proba.
You did not fix the random seed, so the outcome of
the fit will be different and you can't expect it to
be consistent.
Just remove the second call to fit.
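To make the point concrete, here is a small sketch comparing two fits with and without a fixed seed. The synthetic data, the train/test slicing, and the helper fit_and_score are all hypothetical and only for illustration. The sketch does not claim how many predictions change between unseeded fits, since that varies from run to run.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption, not the dataset from this thread).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, y_train, X_test = X[:200], y[:200], X[200:]


def fit_and_score(seed):
    """Hypothetical helper: fit a fresh forest and return class probabilities."""
    clf = RandomForestClassifier(n_estimators=40, random_state=seed)
    return clf.fit(X_train, y_train).predict_proba(X_test)


# Two fits without a fixed seed: each call builds a different forest, so
# probabilities (and sometimes the winning class) can differ between them.
unseeded_a = fit_and_score(None)
unseeded_b = fit_and_score(None)
n_changed = int(np.sum(np.argmax(unseeded_a, axis=1) != np.argmax(unseeded_b, axis=1)))
print("samples whose most probable class changed between unseeded fits:", n_changed)

# Two fits with the same seed: the forests are built identically.
seeded_a = fit_and_score(0)
seeded_b = fit_and_score(0)
seeded_identical = np.allclose(seeded_a, seeded_b)
print("seeded fits give identical probabilities:", seeded_identical)
```

Fixing random_state makes a refit reproducible, but the simpler fix is still the one above: fit once and call both predict and predict_proba on that same fitted model.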
On 02/25/2015 06:35 AM, shalu jhanwar wrote:
Hey Guys,
I am using a random forest classifier to perform
binary classification on my dataset. I wanted to
have a confidence value for both classes
for each sample. For that purpose, I
used the "predict_proba" method to predict class
probabilities for the X samples.
I saw 2-3 strange observations in my samples as below:
S.No.  Y_true  Y_predicted_forest  Class_0_prob  Class_1_prob
1.     1       0                   0.28          0.72
2.     0       1                   0.56          0.44
Here, based on the class probabilities, the
algorithm should have given correct predictions. But it gave
wrong predictions in spite of the high probability
of the true class.
Can anyone please explain this strange observation:
when the predicted probability of class 0 is higher
than that of class 1, the output is still class 1, and
vice versa?
For further details, I am providing a chunk of my
code used:
#For Random Forest
clf = RandomForestClassifier(n_estimators=40)
scores = clf.fit(X_train, y_train).score(X_test, y_test)
y_pred = clf.predict(X_test)
#Get proba for each class:
y_score = clf.fit(X_train, y_train).predict_proba(X_test)
#Get value of each class as:
y_score[:,0] #For 0 class
y_score[:,1] #For 1 class
thanks!
Shalu
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general