GitHub user harsha2010 opened a pull request:
https://github.com/apache/spark/pull/6654
[SPARK-7546][ml] An example of a complex ml pipeline
@jkbradley, can you please review? I used the 20 Newsgroups dataset as an
example for an ML pipeline. Since a multiclass evaluator does not exist in ML
yet, I dropped cross-validation from the pipeline.
I will try word2vec features next to see whether they improve
performance,
but for now it would be good to know whether this is the right direction for
an example.
=========== results =================
label                     fpr
rec.sport.hockey          0.04782490771049564
sci.space                 0.034460812371756064
comp.graphics             0.0169735604941711
sci.crypt                 0.06313996392002062
alt.atheism               0.02113489999158299
sci.med                   0.02936611168027507
comp.windows.x            0.035558534610348656
soc.religion.christian    0.042838883951783466
talk.politics.mideast     0.07857684849506787
misc.forsale              0.02010208135697664
comp.sys.ibm.pc.hardware  0.01706653978091874
talk.religion.misc        0.004791836221086109
comp.sys.mac.hardware     0.010652479543638552
rec.sport.baseball        0.020364470446360184
rec.autos                 0.015341565194014896
rec.motorcycles           0.014272856379509664
talk.politics.guns        0.05322128851540616
talk.politics.misc        0.04343890839113078
comp.os.ms-windows.misc   0.2614677840593239
sci.electronics           0.012487227356108422
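
For readers unfamiliar with the metric: the fpr column presumably reports a one-vs-rest false positive rate per label, although the PR text does not say how it was computed. Under that assumption, here is a minimal sketch in plain Python of how such a per-label rate can be derived from true and predicted labels. The function name and the toy labels are hypothetical and are not taken from the patch.

```python
from collections import Counter

def false_positive_rates(y_true, y_pred):
    """One-vs-rest false positive rate per label: FP / (FP + TN).

    Hypothetical helper for illustration; not part of the pull request.
    """
    labels = sorted(set(y_true) | set(y_pred))
    total = len(y_true)
    # Number of examples whose true label is each class.
    actual = Counter(y_true)
    # A wrong prediction counts as a false positive for the predicted label.
    false_pos = Counter(p for t, p in zip(y_true, y_pred) if t != p)
    rates = {}
    for label in labels:
        # Every example whose true label differs is a negative (FP + TN).
        negatives = total - actual[label]
        rates[label] = false_pos[label] / negatives if negatives else 0.0
    return rates

# Toy data (made up): two of the three non-sci.med examples are
# wrongly predicted as sci.med.
y_true = ["sci.space", "sci.space", "sci.med", "rec.autos"]
y_pred = ["sci.space", "sci.med", "sci.med", "sci.med"]
rates = false_positive_rates(y_true, y_pred)
```

On this toy input, sci.med gets a rate of 2/3 and the other two labels get 0. A low rate for a label means the classifier rarely assigns that label to documents from other newsgroups.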
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/harsha2010/spark SPARK-7546
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6654.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6654
----
commit 41bc97861166b2e07ed055fd900e48489fbfbf0b
Author: Ram Sriharsha <[email protected]>
Date: 2015-06-04T01:00:37Z
An example of a complex ml pipeline
----