That looks fine. In line 125, can you try

    assert(np.all(np.argmax(y_score, axis=-1) == y_pred))

That should go through.

On 02/25/2015 05:38 PM, shalu jhanwar wrote:
Hi Andy,

Please find this version of the code, in which I changed the refit issue.

thanks!
Shalu

On Wed, Feb 25, 2015 at 11:35 PM, shalu jhanwar <[email protected]> wrote:

Hi Andy,

Please see the code. I am attaching the following files:

i) Code: RandomForest_IndependentDataset_prabability_values.py
ii) Test dataset: test.txt
iii) Training dataset: training_data.txt

Please use this command to run the code:

    python RandomForest_IndependentDataset_prabability_values.py -d training_data.txt -D <output_dir name> -C "3,4,5,6,7,8,15,17" -c "1" -g test.txt

When you run the code, 2 output files will be generated in the output directory, named as: output.txt

In that file, you can look for these entries, which show the discrepancy:

chr3_125709142_125709481
chr19_32769611_32770111
chr18_3593848_3594348
chr19_49466802_49467527
chr12_860254_860664
chr19_49465555_49466264
chr2_64836549_64836646

thanks!
Shalu

On Wed, Feb 25, 2015 at 11:13 PM, Andy <[email protected]> wrote:

Please show the code.

On 02/25/2015 04:51 PM, shalu jhanwar wrote:

Hi guys!

I removed refitting the data, but didn't set random_state explicitly. The same problem persists. Look at these few examples:

Y_true  Y_predict  Class0_prob.  Class1_prob.
1       0          0.28          0.72
0       0          0.32          0.68
0       0          0.41          0.59
1       0          0.41          0.59
1       0          0.48          0.52
1       1          0.57          0.42

Please let me know if I am still missing something.

thanks!
Shalu

On Wed, Feb 25, 2015 at 9:53 PM, shalu jhanwar <[email protected]> wrote:

Hi guys!

Ahh, ok, I will check it and confirm.

thanks!
Shalu

On Wed, Feb 25, 2015 at 9:32 PM, Andy <[email protected]> wrote:

You fit the data again before calling predict_proba. You did not fix the random seed, so the outcome of the fit will be different and you can't expect it to be consistent. Just remove the second call to fit.
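
The advice above (fit once, then reuse the same fitted forest) can be sketched as follows. This is only a minimal illustration, assuming a recent scikit-learn; the synthetic data from make_classification, the train_test_split call, and random_state=0 are stand-ins chosen here and are not part of the original code or the attached files.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): the real training_data.txt and
# test.txt from the thread are not reproduced here.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit exactly once. random_state is fixed here only so that repeated
# runs of this sketch build the same forest.
clf = RandomForestClassifier(n_estimators=40, random_state=0)
clf.fit(X_train, y_train)

# All three results below come from the same fitted forest.
accuracy = clf.score(X_test, y_test)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)

# The check suggested in the thread: the most probable class in each row
# should match the predicted label. Columns of y_score follow clf.classes_.
labels_from_proba = clf.classes_[np.argmax(y_score, axis=-1)]
consistent = bool(np.all(labels_from_proba == y_pred))
print("predict agrees with argmax of predict_proba:", consistent)
```

If fit were called a second time between predict and predict_proba without a fixed random_state, the labels and the probabilities would come from two different forests, and this check would no longer be expected to hold.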
On 02/25/2015 06:35 AM, shalu jhanwar wrote:

Hey Guys,

I am using a Random Forest classifier to perform binary classification on my dataset. I wanted to have a confidence value for both classes corresponding to each sample. For that purpose, I used the "predict_proba" method to predict class probabilities for X samples.

I saw 2-3 strange observations in my samples, as below:

S.No.  Y_true  *Y_predicted_forest*  Class_0_prob  Class_1_prob
1.     1       0                     0.28          0.72
2.     0       1                     0.56          0.44

Here, based on the probabilities of the classes, the algorithm should provide true positives. But it gave wrong predictions in spite of the high probability value of each class.

Can anyone please explain this strange observation, where the predicted probability of class 0 is higher than that of class 1, yet the output is class 1, and vice versa?

For further details, here is the chunk of my code used:

    # For Random Forest
    clf = RandomForestClassifier(n_estimators=40)
    scores = clf.fit(X_train, y_train).score(X_test, y_test)
    y_pred = clf.predict(X_test)

    # Get proba for each class:
    y_score = clf.fit(X_train, y_train).predict_proba(X_test)

    # Get value of each class as:
    y_score[:,0]  # For 0 class
    y_score[:,1]  # For 1 class

thanks!
Shalu

------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more.
Take a look and join the conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
