GitHub user harsha2010 opened a pull request:

    https://github.com/apache/spark/pull/6654

    [SPARK-7546][ml] An example of a complex ml pipeline

    @jkbradley can you please review? I used the 20newsgroups dataset as an
    example for an ML pipeline. Since ML has no multiclass evaluator, I
    dropped cross-validation from the pipeline.
    I will look at word2vec features and see whether they improve
    performance, but for now it would be good to know if this is the right
    direction for an example.
    
    =========== results =================
    label                       fpr
    rec.sport.hockey            0.04782490771049564
    sci.space                   0.034460812371756064
    comp.graphics               0.0169735604941711
    sci.crypt                   0.06313996392002062
    alt.atheism                 0.02113489999158299
    sci.med                     0.02936611168027507
    comp.windows.x              0.035558534610348656
    soc.religion.christian      0.042838883951783466
    talk.politics.mideast       0.07857684849506787
    misc.forsale                0.02010208135697664
    comp.sys.ibm.pc.hardware    0.01706653978091874
    talk.religion.misc          0.004791836221086109
    comp.sys.mac.hardware       0.010652479543638552
    rec.sport.baseball          0.020364470446360184
    rec.autos                   0.015341565194014896
    rec.motorcycles             0.014272856379509664
    talk.politics.guns          0.05322128851540616
    talk.politics.misc          0.04343890839113078
    comp.os.ms-windows.misc     0.2614677840593239
    sci.electronics             0.012487227356108422
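
For readers unfamiliar with the metric, here is a minimal sketch of how a per-label figure like the ones above could be computed. It assumes that "fpr" means the one-vs-rest false positive rate (examples wrongly predicted as a label, divided by all examples whose true label is something else). The PR text does not say how its numbers were produced, and the function name `per_label_fpr` and the toy labels are hypothetical, not taken from the patch.

```python
from collections import Counter


def per_label_fpr(true_labels, predicted_labels):
    """Return {label: false positive rate}, treating each label one-vs-rest.

    Hypothetical helper for illustration; not part of the pull request.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists must have equal length")

    total = len(true_labels)
    # How many examples truly belong to each label.
    true_counts = Counter(true_labels)
    # A false positive for label L is a prediction of L on an example
    # whose true label is not L.
    false_positives = Counter(
        pred for true, pred in zip(true_labels, predicted_labels) if pred != true
    )

    rates = {}
    for label in set(true_labels) | set(predicted_labels):
        # Negatives for L are all examples whose true label is not L.
        negatives = total - true_counts[label]
        rates[label] = false_positives[label] / negatives if negatives else 0.0
    return rates


if __name__ == "__main__":
    # Toy data only; these are not the 20newsgroups predictions.
    truth = ["a", "a", "b", "b", "c", "c"]
    preds = ["a", "b", "b", "b", "a", "c"]
    for label, rate in sorted(per_label_fpr(truth, preds).items()):
        print(label, rate)
```

Under this definition a label with few false alarms scores low even if the classifier rarely predicts it, so the table is best read alongside recall or precision rather than on its own.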

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harsha2010/spark SPARK-7546

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6654
    
----
commit 41bc97861166b2e07ed055fd900e48489fbfbf0b
Author: Ram Sriharsha <[email protected]>
Date:   2015-06-04T01:00:37Z

    An example of a complex ml pipeline

----


