Re: New features (Discretization) for v1.x in xiangrui.pdf
How do I install it? Do I just clone the code with git clone from https://github.com/apache/spark/pull/216 and then run sbt package? Is it the same as https://github.com/LIDIAgroup/SparkFeatureSelection, or something different?

filip

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/New-features-Discretization-for-v1-x-in-xiangrui-pdf-tp13256p13338.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
New features (Discretization) for v1.x in xiangrui.pdf
Is there any news about discretization in Spark? Is there anything on Git? I haven't found it yet.
Re: New features (Discretization) for v1.x in xiangrui.pdf
I guess I found it: https://github.com/LIDIAgroup/SparkFeatureSelection
Re: Time series forecasting
I guess it is not a question about Spark but a question about the dataset you need to set up. Think about what you want to model and how you can shape the data in such a way that Spark can use it. An autoregressive model is a technique I know:

    a_{t+1} = C1 * a_{t} + C2 * a_{t-1} + ... + C6 * a_{t-5}

Spark can find the coefficients C1-C6 by regression, I guess.
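To make the data-shaping idea concrete, here is a small sketch that fits the autoregressive coefficients with plain numpy least squares instead of Spark's regression; the simulated series, lag order, and coefficient values are all made up for illustration.

```python
import numpy as np

def fit_ar(series, order=6):
    """Fit a_t = C1*a_{t-1} + ... + C_order*a_{t-order} by least squares."""
    series = np.asarray(series, dtype=float)
    # Each training row is the reversed lag window [a_{t-1}, ..., a_{t-order}].
    X = np.vstack([series[t - order:t][::-1] for t in range(order, len(series))])
    y = series[order:]  # the value each lag window should predict
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# Simulate a series from known coefficients plus noise, then recover them.
rng = np.random.default_rng(0)
true_c = np.array([0.5, -0.2, 0.1, 0.0, 0.05, -0.1])  # illustrative values
a = list(rng.normal(size=6))
for _ in range(400):
    a.append(float(np.dot(true_c, a[-1:-7:-1])) + rng.normal(scale=0.1))
estimated = fit_ar(a)
```

On a Spark RDD the same shaping step (turning a series into lag-window rows) would feed LabeledPoint rows into MLlib's linear regression.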
numpy digitize
Hi folks, is there a function in Spark like numpy.digitize that discretizes a numerical variable? Or, even better, is there a way to use the functionality of the decision tree builder in Spark MLlib, which splits data into bins in such a way that the split variable best predicts the target value (label)? This could be useful for logistic regression, because the linearization makes models more stable in a way; some people would refer to it as weight-of-evidence modeling.
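For reference, this is what numpy.digitize does, as a small sketch with illustrative bin edges; MLlib 1.x had no built-in equivalent, so on Spark you would broadcast the edges and map this logic over the RDD yourself.

```python
import numpy as np

values = np.array([0.2, 6.4, 3.0, 1.6, 12.0])
edges = np.array([0.0, 1.0, 2.5, 4.0, 10.0])  # illustrative bin edges
indices = np.digitize(values, edges)
# indices[i] counts how many edges are <= values[i], i.e. the bin number
```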
Re: org.apache.spark.examples.xxx
I am trying to get used to sbt in order to build standalone applications myself. I managed to run the SimpleApp example. Then I tried to copy an example Scala program like LinearRegression into a local directory:

    ./build.sbt
    ./src
    ./src/main
    ./src/main/scala
    ./src/main/scala/LinearRegression.scala

My build.sbt, even though I don't really know what I am doing, looks like this:

    name := "Linear Regression"
    version := "1.0"
    scalaVersion := "2.10.4"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2"
    libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.0.2"
    libraryDependencies += "com.github.scopt" %% "scopt" % "3.2.0"
    resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

By the way, first I tried scalaVersion := "2.11.2", which is my installed version, but that failed. sbt package builds a jar file in target, but the command

    spark-submit --class LinearRegression --master local[2] target/scala-2.10/linear-regression_2.10-1.0.jar ~/git/spark/data/mllib/sample_linear_regression_data.txt

didn't work. It tells me:

    Spark assembly has been built with Hive, including Datanucleus jars on classpath
    Exception in thread "main" java.lang.NoClassDefFoundError: scopt/OptionParser
        at java.lang.Class.getDeclaredMethods0(Native Method)

Note: I commented out /*package org.apache.spark.examples.mllib*/ in LinearRegression.scala, because otherwise it doesn't find the main class:

    Exception in thread "main" java.lang.ClassNotFoundException: LinearRegression

When I do the same with the prebuilt jar package of the examples, everything works fine:

    spark-submit --class org.apache.spark.examples.mllib.LinearRegression --master local[2] lib/spark-examples-1.0.2-hadoop2.2.0.jar ~/git/spark/data/mllib/sample_linear_regression_data.txt

works!
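The NoClassDefFoundError: scopt/OptionParser above is likely because sbt package jars only your own classes; dependencies like scopt never reach spark-submit's classpath. One common fix at the time is the sbt-assembly plugin, which builds a single fat jar. A sketch, where the plugin version is an assumption:

```scala
// project/assembly.sbt -- the sbt-assembly plugin (version is an assumption)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
```

Then sbt assembly produces a *-assembly-*.jar under target/scala-2.10/, which you hand to spark-submit instead of the plain package jar. Marking the spark-core and spark-mllib dependencies as "provided" keeps Spark itself out of the fat jar.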
Re: org.apache.spark.examples.xxx
Compilation works, but execution does not, at least with spark-submit, as I described above. When I make a local copy of the training set, I can execute sbt run <file>, which works:

    sbt run sample_linear_regression_data.txt

But when I do

    sbt run ~/git/spark/data/mllib/sample_linear_regression_data.txt

the program fails because it doesn't find any training set:

    [error] (run-main-0) org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/filip/spark-ex-regression/~/git/spark/data/mllib/sample_linear_regression_data.txt
    org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/filip/spark-ex-regression/~/git/spark/data/mllib/sample_linear_regression_data.txt

PS: does anybody know where in LinearRegression.scala the path is specified, or does it have to do with sbt?
Re: org.apache.spark.examples.xxx
OK, I see :-) ... using the full path instead of ~ works fine. So do you know the reason why

    sbt run [options]

works after sbt package, but

    spark-submit --class ClassName --master local[2] target/scala/JarPackage.jar [options]

doesn't? Somehow it cannot resolve everything.
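A likely explanation: sbt run puts the whole managed classpath (including scopt) on the JVM classpath, while spark-submit only ships the one jar you name. Besides building a fat jar, you can hand the missing dependency to spark-submit explicitly; this is a sketch, and the ivy-cache jar path is an assumption:

```shell
# Ship the missing scopt dependency alongside your application jar
# (the cache path below is an assumption about where sbt stored it).
spark-submit --class LinearRegression --master local[2] \
  --jars ~/.ivy2/cache/com.github.scopt/scopt_2.10/jars/scopt_2.10-3.2.0.jar \
  target/scala-2.10/linear-regression_2.10-1.0.jar \
  /home/filip/git/spark/data/mllib/sample_linear_regression_data.txt
```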
sbt package assembly run spark examples
Hi guys, can someone explain, or give a stupid user like me a link to, the right usage of sbt and Spark in order to run the examples as a standalone app? I got to the point of running the app with sbt run <path-to-the-data>, but I still get an error, probably because I didn't tell the app that the master is local (--master local) in the SparkContext. In the BinaryClassification.scala program it is set by:

    val conf = new SparkConf().setAppName(s"BinaryClassification with $params")

So... how do I adapt the code? In the docs it is written:

    val sc = new SparkContext("local", "Simple App", "YOUR_SPARK_HOME", List("target/scala-2.10/simple-project_2.10-1.0.jar"))

I got the following error:

    sbt run ~/git/spark/data/mllib/sample_binary_classification_data.txt
    [info] Set current project to Simple Project (in build file:/home/filip/spark-sample/)
    [info] Running BinaryClassification ~/git/spark/data/mllib/sample_binary_classification_data.txt
    [error] (run-main-0) org.apache.spark.SparkException: A master URL must be set in your configuration
    org.apache.spark.SparkException: A master URL must be set in your configuration
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:166)
        at BinaryClassification$.run(BinaryClassification.scala:107)
        at BinaryClassification$$anonfun$main$1.apply(BinaryClassification.scala:99)
        at BinaryClassification$$anonfun$main$1.apply(BinaryClassification.scala:98)
        at scala.Option.map(Option.scala:145)
        at BinaryClassification$.main(BinaryClassification.scala:98)
        at BinaryClassification.main(BinaryClassification.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
    [trace] Stack trace suppressed: run last compile:run for the full output.
    java.lang.RuntimeException: Nonzero exit code: 1
        at scala.sys.package$.error(package.scala:27)
    [trace] Stack trace suppressed: run last compile:run for the full output.
    [error] (compile:run) Nonzero exit code: 1
    [error] Total time: 7 s, completed Aug 28, 2014 11:04:44 AM
Re: sbt package assembly run spark examples
Got it, when I read the class reference at https://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.SparkConf: conf.setMaster("local[2]") sets the master to local with 2 threads. But I still get some warnings, and the result (see below) also doesn't look right, I think.

PS: by the way, first I ran it like this:

    sbt run ~/git/spark/data/mllib/sample_binary_classification_data.txt

but the app didn't find the file, because it started at the local directory and pointed to /home/filip/spark-sample/~/git/spark/data/mllib/sample_binary_classification_data.txt. Any explanation?

PS2: I guess many of us on the user side have problems with Scala, sbt, and the class library. Does anyone have a suggestion for how I can overcome this? It is pretty time-consuming trial and error :-(

    14/08/28 11:47:37 INFO ui.SparkUI: Started SparkUI at http://filip-VirtualBox.localdomain:4040
    14/08/28 11:47:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/08/28 11:47:41 WARN snappy.LoadSnappy: Snappy native library not loaded
    Training: 84, test: 16.
    14/08/28 11:47:47 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
    14/08/28 11:47:47 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
    Test areaUnderPR = 1.0.
    Test areaUnderROC = 1.0.
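Put together, the change to the example's setup might look like this (a sketch; hard-coding the master is fine for local experiments, but normally you would pass it via spark-submit --master instead):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName(s"BinaryClassification with $params")
  .setMaster("local[2]")  // local mode with 2 worker threads
val sc = new SparkContext(conf)
```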
org.apache.spark.examples.xxx
Hey guys, I am still trying to get used to compiling and running the example code. Why does the run-example script submit the class with org.apache.spark.examples in front of the class name itself? Probably a stupid question, but I would be glad if someone explained it. By the way, how was the spark examples jar file built?
feature space search
I am wondering if I can use Spark to search for interesting features/attributes for modelling. In fact, I just came from some introductory pages about Vowpal Wabbit, and I somehow like the idea of out-of-core modelling. Well, I have transactional data where customers purchased products with unique article numbers, plus a huge table of customer treatments like coupons, prospects, and so on. Each article has some properties or attributes stored in another table. In fact, I would like to join the tables on the fly as input for a machine learning platform like Spark or Vowpal Wabbit, just to get a ranking of good attributes for modelling. Does somebody know how to do it? All the best :-)
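A minimal sketch of the on-the-fly join described above, with made-up article attributes; in Spark this would be a join plus an aggregation over RDDs (or a SQL join), producing one feature row per customer that a learner can rank attributes from.

```python
# Illustrative data: the customers, articles, and attributes are assumptions.
purchases = [
    {"customer": 1, "article": "A1"},
    {"customer": 1, "article": "A2"},
    {"customer": 2, "article": "A1"},
]
article_attrs = {"A1": {"color": "red"}, "A2": {"color": "blue"}}

features = {}
for p in purchases:
    attrs = article_attrs[p["article"]]            # the join step
    row = features.setdefault(p["customer"], {})
    key = "color_" + attrs["color"]
    row[key] = row.get(key, 0) + 1                 # count attribute values per customer
# features now maps each customer to attribute-value counts, ready as model input
```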
Re: pmml with augustus
@villu: thank you for your help. I promise I am going to try it! That's cool :-) Do you also know the other way around, from PMML to a model object in Spark?
pmml with augustus
Hello guys, does anybody have experience with the library Augustus as a serializer for scoring models? It looks very promising, and I even found a hint about the connection between Augustus and Spark. All the best.