Re: New features (Discretization) for v1.x in xiangrui.pdf

2014-09-03 Thread filipus
How to install? Just clone the code with git clone https://github.com/apache/spark/pull/216 and then sbt package? Is it the same as https://github.com/LIDIAgroup/SparkFeatureSelection, or something different? filip -- View this message in context: http://apache-spark-user-list.1001560.n3.nabb

Re: New features (Discretization) for v1.x in xiangrui.pdf

2014-09-02 Thread filipus
I guess I found it: https://github.com/LIDIAgroup/SparkFeatureSelection -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/New-features-Discretization-for-v1-x-in-xiangrui-pdf-tp13256p13261.html Sent from the Apache Spark User List mailing list archive at Nabbl

New features (Discretization) for v1.x in xiangrui.pdf

2014-09-02 Thread filipus
Is there any news about discretization in Spark? Is there anything on git? I didn't find it yet.

Re: Time series forecasting

2014-09-01 Thread filipus
I guess it is not a question of Spark but a question about your dataset setup. Think about what you want to model and how you can shape the data in such a way that Spark can use it. Akima is a technique I know: a_{t+1} = C1 * a_{t} + C2 * a_{t-1} + ... + C6 * a_{t-5}. Spark can find the coefficien
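The "shape the data" step above can be sketched in plain Python (this is an illustration, not Spark code, and the series values are made up): each training row pairs the previous six values with the next value, matching the a_{t+1} = C1*a_{t} + ... + C6*a_{t-5} layout.

```python
def make_lagged_rows(series, lags=6):
    """Turn a 1-D series into (features, label) pairs:
    features = the previous `lags` values, label = the next value."""
    rows = []
    for t in range(lags, len(series)):
        rows.append((series[t - lags:t], series[t]))
    return rows

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy data
rows = make_lagged_rows(series, lags=6)
# first row: features are the first 6 values, label is the 7th
print(rows[0])  # → ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 7.0)
```

Once the data is in this (features, label) shape, a linear regression (in Spark or anywhere else) can fit the coefficients C1..C6.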

numpy digitize

2014-08-31 Thread filipus
Hi folks, is there a function in Spark like "numpy digitize" to discretize a numerical variable? Or, even better, is there a way to use the functionality of the decision tree builder of Spark MLlib, which splits data into bins in such a way that the split variable mostly predicts the target valu
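For reference, numpy.digitize just maps each value to the index of the bin it falls into; a plain-Python equivalent (the bin edges here are made up for illustration) can be written with the stdlib bisect module:

```python
import bisect

def digitize(values, bins):
    """Return, for each value, the index of the bin it falls into
    (like numpy.digitize with increasing bins: bins[i-1] <= x < bins[i])."""
    return [bisect.bisect_right(bins, x) for x in values]

bins = [0.0, 1.0, 2.5, 4.0]                   # hypothetical bin edges
result = digitize([0.2, 6.4, 3.0, 1.6], bins)
print(result)  # → [1, 4, 3, 2]
```

Values below the first edge get index 0, values beyond the last edge get len(bins); this matches numpy.digitize's default (right=False) behaviour for increasing bins.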

Re: org.apache.spark.examples.xxx

2014-08-30 Thread filipus
OK, I see :-) .. using . instead of ~ works fine. So do you know the reason sbt "run [options]" works after sbt package, but spark-submit --class "ClassName" --master local[2] target/scala/JarPackage.jar [options] doesn't? It cannot resolve everything somehow.

Re: org.apache.spark.examples.xxx

2014-08-30 Thread filipus
Compilation works, but execution does not, at least with spark-submit, as I described above. When I make a local copy of the training set I can execute sbt "run file", which works: sbt "run sample_linear_regression_data.txt". When I do sbt "run ~/git/spark/data/mllib/sample_linear_regression_data.txt" the
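A likely cause (an assumption, not stated in the thread): "~" is expanded by the shell, but inside a quoted argument like sbt "run ~/...", the program receives the literal string "~/...", which the JVM does not expand. A small plain-Python sketch of the same pitfall:

```python
import os

# A literal "~" in a path is not a real directory: the shell normally
# expands it, but a program receiving it inside a quoted argument
# just sees the raw string and fails to open the file.
raw = "~/git/spark/data/mllib/sample_linear_regression_data.txt"
print(os.path.exists(raw))       # → False (no directory literally named "~")

# Expanding it explicitly turns "~" into the user's home directory.
expanded = os.path.expanduser(raw)
print(expanded.startswith("~"))  # → False
```

Passing an absolute path (or a path relative to the working directory, as with ".") avoids the problem.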

Re: org.apache.spark.examples.xxx

2014-08-30 Thread filipus
I am trying to get used to "sbt" in order to build stand-alone applications by myself. The example "SimpleApp" I managed to run. Then I tried to copy an example Scala program like "LinearRegression" into a local directory: . ./build.sbt ./src ./src/main ./src/main/scala ./src/main/scala/LinearRegression.sc

org.apache.spark.examples.xxx

2014-08-28 Thread filipus
Hey guys, I am still trying to get used to compiling and running the example code. Why does the run_example script submit the class with an org.apache.spark.examples in front of the class itself? Probably a stupid question, but I would be glad if someone of you explains it. By the way.. how was the "spark...example..

Re: sbt package assembly run spark examples

2014-08-28 Thread filipus
Got it, when I read the class reference https://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.SparkConf conf.setMaster("local[2]") sets the master to local with 2 threads. But I still get some warnings, and the result (see below) is also not right, I think. PS: by the way ... first

sbt package assembly run spark examples

2014-08-28 Thread filipus
Hi guys, can someone explain, or give a stupid user like me a link where I can find, the right usage of sbt and Spark in order to run the examples as a stand-alone app? I got to the point of running the app with sbt "run path-to-the-data" but still get some errors, because I probably didn't tell the app th

feature space search

2014-08-09 Thread filipus
I am wondering if I can use Spark in order to search for interesting features/attributes for modelling. In fact I just came from some introductory sites about Vowpal Wabbit. I somehow like the idea of out-of-core modelling. Well, I have transactional data where customers purchased products

Re: pmml with augustus

2014-06-12 Thread filipus
@villu: Thank you for your help. I promise I am going to try it! That's cool :-) Do you also know the other way around, from PMML to a model object in Spark?

Re: MLLib : Decision Tree not getting built for 5 or more levels(maxDepth=5) and the one built for 3 levels is performing poorly

2014-06-11 Thread filipus
Well, I guess your problem is quite unbalanced, and due to the information value as a splitting criterion I guess the algo stops after very few splits. A workaround is oversampling: build many training datasets, e.g. take randomly 50% of the positives and from the negatives the same amount, or let's say
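The sampling workaround above can be sketched in plain Python (an illustration with made-up toy data, not Spark code): take a random fraction of the positives and an equally sized random sample of the negatives, and repeat to get many balanced training sets.

```python
import random

def balanced_sample(positives, negatives, frac=0.5, seed=None):
    """Take a random fraction of the positives and an equally sized
    random sample of the negatives, as one balanced training set."""
    rng = random.Random(seed)
    pos = rng.sample(positives, int(len(positives) * frac))
    neg = rng.sample(negatives, len(pos))
    return pos + neg

positives = list(range(100))         # toy data: 100 positive examples
negatives = list(range(100, 1100))   # 1000 negative examples
train = balanced_sample(positives, negatives, frac=0.5, seed=42)
print(len(train))  # → 100 (50 positives + 50 negatives)
```

Training one tree per such sample and averaging the predictions is essentially a bagging/ensemble approach.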

Re: pmml with augustus

2014-06-10 Thread filipus
@Paco: I understand that the most promising things for me to put effort into understanding for deploying models in the Spark environment would be Augustus and Zementis, right? Actually, as you mention, I would have both directions of deploying. I already have models which I could transform into PMML and I also t

Re: pmml with augustus

2014-06-10 Thread filipus
Thank you very much. The Cascading project I didn't recognize at all till now; this project is very interesting. Also, I got the idea of the usage of Scala as a language for Spark, because I can integrate JVM-based libraries very easily/naturally, if I got it right. Mh... but I could also use Spa

pmml with augustus

2014-06-10 Thread filipus
Hello guys, has anybody experience with the library Augustus as a serializer for scoring models? It looks very promising, and I even found a hint on the connection between Augustus and Spark. All the best

serialization a model

2014-06-07 Thread filipus
Am I right when I just use cPickle for serializing a model (see code below), or didn't I get it with PickleSerializer (from pyspark.serializers import PickleSerializer)? ... model = LogisticRegressionWithSGD.train(parsedData) mm = open("mm.txt","wb") import cPickle cPickle.dump(model,mm) mm.clo
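A common, more robust variant of the idea above is to pickle only the model's parameters rather than the whole object (MLlib's logistic regression model exposes weights and an intercept, which is what matters for scoring). A plain-Python sketch using a stand-in model class, since this is not runnable pyspark code; note that cPickle is Python 2, in Python 3 the module is just pickle:

```python
import pickle

# Stand-in for a trained model: hypothetical class mimicking a model
# that exposes `weights` and `intercept`, with made-up values.
class FakeModel:
    def __init__(self, weights, intercept):
        self.weights = weights
        self.intercept = intercept

model = FakeModel([0.5, -1.2, 3.3], 0.1)

# Save only the parameters, not the whole object, ...
with open("mm.pkl", "wb") as f:
    pickle.dump((model.weights, model.intercept), f)

# ... then load them back and rebuild the model.
with open("mm.pkl", "rb") as f:
    weights, intercept = pickle.load(f)

restored = FakeModel(weights, intercept)
print(restored.weights, restored.intercept)  # → [0.5, -1.2, 3.3] 0.1
```

Saving parameters instead of the object avoids pickling failures on objects that hold references to non-serializable state (such as a SparkContext).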