That looks fine.

In line 125, can you try:
assert(np.all(np.argmax(y_score, axis=-1) == y_pred))

That should go through.
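
For anyone following along, here is a minimal, self-contained sketch of that check. It is not the attached script: the synthetic data from make_classification, the fixed random_state values, and the train_test_split import path (sklearn.model_selection, as in recent scikit-learn releases) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for training_data.txt / test.txt (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=40, random_state=0)
clf.fit(X_train, y_train)  # fit exactly once

# Both calls use the same fitted forest.
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)

# The labels are 0 and 1, so the argmax column index is the predicted label.
assert np.all(np.argmax(y_score, axis=-1) == y_pred)
```

With labels other than 0 and 1, the column index would have to be mapped through clf.classes_ before comparing.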


On 02/25/2015 05:38 PM, shalu jhanwar wrote:
Hi Andy,

Please find this version of the code, in which I fixed the refit issue.

thanks!
Shalu

On Wed, Feb 25, 2015 at 11:35 PM, shalu jhanwar <shalu.jhanwa...@gmail.com> wrote:

    Hi Andy,

    Please see the code. I am attaching the following files:
    i) Code: RandomForest_IndependentDataset_prabability_values.py
    ii) Test dataset: test.txt
    iii) Training dataset: training_data.txt

    Please use this command to run the code:
    python RandomForest_IndependentDataset_prabability_values.py -d training_data.txt -D <output_dir name> -C "3,4,5,6,7,8,15,17" -c "1" -g test.txt

    When you run the code, 2 output files will be generated in the
    output directory; see the one named output.txt

    In that file, look for these entries, which show the discrepancy:
    chr3_125709142_125709481
    chr19_32769611_32770111
    chr18_3593848_3594348
    chr19_49466802_49467527
    chr12_860254_860664
    chr19_49465555_49466264
    chr2_64836549_64836646


    thanks!
    Shalu


    On Wed, Feb 25, 2015 at 11:13 PM, Andy <t3k...@gmail.com> wrote:

        Please show the code.



        On 02/25/2015 04:51 PM, shalu jhanwar wrote:
        Hi guys!

        I removed the refitting of the data, but didn't set random_state
        explicitly. The same problem persists. Look at these few examples:

        Y_true  Y_predict  Class0_prob.  Class1_prob.
        1       0          0.28          0.72
        0       0          0.32          0.68
        0       0          0.41          0.59
        1       0          0.41          0.59
        1       0          0.48          0.52
        1       1          0.57          0.42

        Please let me know whether I am still missing something.
        thanks!
        Shalu



        On Wed, Feb 25, 2015 at 9:53 PM, shalu jhanwar
        <shalu.jhanwa...@gmail.com> wrote:

            Hi guys!

            Ahh, OK, I will check it and confirm.

            thanks!
            Shalu

            On Wed, Feb 25, 2015 at 9:32 PM, Andy <t3k...@gmail.com> wrote:

                You fit the data again before calling predict_proba.
                You did not fix the random seed, so the outcome of
                the fit will be different and you can't expect it to
                be consistent.
                Just remove the second call to fit.
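
The effect described above can be reproduced with a small sketch. The synthetic data and the slicing into train and test parts are assumptions for illustration; the classifier settings mirror the snippet quoted further down. The number of disagreeing rows is not fixed and changes from run to run, which is the point.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, somewhat noisy binary data (assumption, not the poster's dataset).
X, y = make_classification(n_samples=400, n_features=8, flip_y=0.2, random_state=0)
X_train, y_train, X_test = X[:300], y[:300], X[300:]

clf = RandomForestClassifier(n_estimators=40)  # no random_state, as in the question

# Two separate fits: each call to fit builds a new, different random forest.
y_pred = clf.fit(X_train, y_train).predict(X_test)
y_score = clf.fit(X_train, y_train).predict_proba(X_test)

# Predictions come from the first forest, probabilities from the second,
# so some rows may disagree. The count varies between runs.
mismatches = int(np.sum(np.argmax(y_score, axis=1) != y_pred))
print("rows where argmax(proba) != prediction:", mismatches)

# Without refitting, predict and predict_proba use the same forest and agree.
y_pred_same = clf.predict(X_test)
assert np.array_equal(np.argmax(y_score, axis=1), y_pred_same)
```

Setting random_state would make the two fits identical, but calling fit only once is simpler and faster.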



                On 02/25/2015 06:35 AM, shalu jhanwar wrote:
                Hey Guys,

                I am using a random forest classifier to perform
                binary classification on my dataset. I wanted a
                confidence value for both classes for each sample.
                For that purpose, I used the "predict_proba" method
                to predict class probabilities for the X samples.
                I saw 2-3 strange observations in my samples, as below:

                S.No.  Y_true  Y_predicted_forest  Class_0_prob  Class_1_prob
                1.     1       0                   0.28          0.72
                2.     0       1                   0.56          0.44

                Here, based on the class probabilities, the
                algorithm should have predicted the true class. But
                it gave wrong predictions in spite of the high
                probability of the true class.

                Can anyone please explain this strange observation:
                the predicted probability of class 0 is higher than
                that of class 1, yet the output is class 1, and
                vice versa?

                For further details, here is the chunk of my
                code that I used:
                # For Random Forest
                clf = RandomForestClassifier(n_estimators=40)
                scores = clf.fit(X_train, y_train).score(X_test, y_test)
                y_pred = clf.predict(X_test)
                # Get proba for each class:
                y_score = clf.fit(X_train, y_train).predict_proba(X_test)
                # Get value of each class as:
                y_score[:, 0]  # For class 0
                y_score[:, 1]  # For class 1

                thanks!
                Shalu


                
                ------------------------------------------------------------------------------
                Dive into the World of Parallel Programming The Go Parallel Website, sponsored
                by Intel and developed in partnership with Slashdot Media, is your hub for all
                things parallel software development, from weekly thought leadership blogs to
                news, videos, case studies, tutorials and more. Take a look and join the
                conversation now. http://goparallel.sourceforge.net/

                _______________________________________________
                Scikit-learn-general mailing list
                Scikit-learn-general@lists.sourceforge.net
                https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


                





        





