Re: Tree classifiers in MLlib
Yes - Manish Amde and Hirakendu Das have been working on a distributed tree classifier. We are taking the current version through large-scale testing and expect to merge it into the master branch soon. I expect that ensemble tree learners (random forests, GBDTs) will follow shortly.

On Dec 29, 2013, at 10:35 AM, Charles Earl charles.ce...@gmail.com wrote:

In the latest API docs off of the web page http://spark.incubator.apache.org/docs/latest/api/mllib/index.html#org.apache.spark.mllib.package I had not seen tree classifiers included. Are there plans to include decision trees etc. at some point? Is there interest?

--
- Charles
Re: Tree classifiers in MLlib
Hi Evan,

Could you please point to the git repo for the decision tree classifier, or the enhancement JIRA? Thanks.

Deb

On Dec 29, 2013 8:55 AM, Evan Sparks evan.spa...@gmail.com wrote:

Yes - Manish Amde and Hirakendu Das have been working on a distributed tree classifier. We are taking the current version through large-scale testing and expect to merge it into the master branch soon. I expect that ensemble tree learners (random forests, GBDTs) will follow shortly.

On Dec 29, 2013, at 10:35 AM, Charles Earl charles.ce...@gmail.com wrote:

In the latest API docs off of the web page http://spark.incubator.apache.org/docs/latest/api/mllib/index.html#org.apache.spark.mllib.package I had not seen tree classifiers included. Are there plans to include decision trees etc. at some point? Is there interest?

--
- Charles
RE: Errors with spark-0.8.1 hadoop-yarn 2.2.0
Hi Izhar

Is that the exact command you are running? Say, with 0.8.0 instead of 0.8.1 in the cmd?

Raymond Liu

From: Izhar ul Hassan [mailto:ezh...@gmail.com]
Sent: Friday, December 27, 2013 9:40 PM
To: user@spark.incubator.apache.org
Subject: Errors with spark-0.8.1 hadoop-yarn 2.2.0

Hi,

I have a 3-node installation of hadoop 2.2.0 with yarn. I have installed spark-0.8.1 with support for yarn enabled. I get the following errors when trying to run the examples:

SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop2.0.5-alpha.jar \
./spark-class org.apache.spark.deploy.yarn.Client \
  --jar examples/target/scala-2.9.3/spark-examples-assembly-0.8.0-incubating.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 3 \
  --master-memory 4g \
  --worker-memory 2g \
  --worker-cores 1

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/Client
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.deploy.yarn.Client. Program will exit.

spark-0.8.0 with hadoop 2.0.5-alpha works fine.

--
/Izhar
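[Editor's note: Raymond's point is that the command above names 0.8.0 jars while the installed Spark is 0.8.1. A minimal sketch of deriving SPARK_JAR from the version actually being run, so a stale name cannot slip in unnoticed; the exact jar filename depends on how the assembly was built, so treat the path here as an assumption and check assembly/target/ for the real one.]

```shell
# Derive the SPARK_JAR path from the Spark/Hadoop versions in use, so a
# leftover 0.8.0 name cannot silently end up in an 0.8.1 run.
# (The filename pattern is an assumption; verify against assembly/target/.)
SPARK_VERSION="0.8.1-incubating"
HADOOP_VERSION="2.2.0"
SPARK_JAR="./assembly/target/scala-2.9.3/spark-assembly-${SPARK_VERSION}-hadoop${HADOOP_VERSION}.jar"
echo "$SPARK_JAR"
```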
Re: Errors with spark-0.8.1 hadoop-yarn 2.2.0
Maybe you didn't enable YARN when building Spark.

On Mon, Dec 30, 2013 at 8:36 AM, Liu, Raymond raymond@intel.com wrote:

Hi Izhar

Is that the exact command you are running? Say, with 0.8.0 instead of 0.8.1 in the cmd?

Raymond Liu

From: Izhar ul Hassan [mailto:ezh...@gmail.com]
Sent: Friday, December 27, 2013 9:40 PM
To: user@spark.incubator.apache.org
Subject: Errors with spark-0.8.1 hadoop-yarn 2.2.0

Hi,

I have a 3-node installation of hadoop 2.2.0 with yarn. I have installed spark-0.8.1 with support for yarn enabled. I get the following errors when trying to run the examples:

SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop2.0.5-alpha.jar \
./spark-class org.apache.spark.deploy.yarn.Client \
  --jar examples/target/scala-2.9.3/spark-examples-assembly-0.8.0-incubating.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 3 \
  --master-memory 4g \
  --worker-memory 2g \
  --worker-cores 1

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/Client
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.deploy.yarn.Client. Program will exit.

spark-0.8.0 with hadoop 2.0.5-alpha works fine.

--
/Izhar
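[Editor's note: for Spark 0.8.x, the YARN support classes are only compiled into the assembly when the build is told to include them via environment variables; the sketch below shows the variables the 0.8.1 build expects, with the actual (long-running) sbt step commented out.]

```shell
# Environment expected by the Spark 0.8.1 build for a YARN 2.2.0 assembly.
export SPARK_HADOOP_VERSION=2.2.0   # Hadoop/YARN version to link against
export SPARK_YARN=true              # compile org.apache.spark.deploy.yarn.* into the assembly
# The actual build step (run from $SPARK_HOME; commented out here):
# sbt/sbt assembly
echo "SPARK_HADOOP_VERSION=$SPARK_HADOOP_VERSION SPARK_YARN=$SPARK_YARN"
```

Without SPARK_YARN=true, the resulting assembly simply does not contain org.apache.spark.deploy.yarn.Client, which matches the ClassNotFoundException above.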
Re: RE: Errors with spark-0.8.1 hadoop-yarn 2.2.0
What is your classpath? Did you rebuild Spark after changing to the new version, and with YARN enabled? Did you find your jar under $SPARK_HOME/assembly/target/scala-2.9.3, or is there more than one?

leosand...@gmail.com

From: Liu, Raymond
Date: 2013-12-30 08:36
To: user@spark.incubator.apache.org
Subject: RE: Errors with spark-0.8.1 hadoop-yarn 2.2.0

Hi Izhar

Is that the exact command you are running? Say, with 0.8.0 instead of 0.8.1 in the cmd?

Raymond Liu

From: Izhar ul Hassan [mailto:ezh...@gmail.com]
Sent: Friday, December 27, 2013 9:40 PM
To: user@spark.incubator.apache.org
Subject: Errors with spark-0.8.1 hadoop-yarn 2.2.0

Hi,

I have a 3-node installation of hadoop 2.2.0 with yarn. I have installed spark-0.8.1 with support for yarn enabled. I get the following errors when trying to run the examples:

SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop2.0.5-alpha.jar \
./spark-class org.apache.spark.deploy.yarn.Client \
  --jar examples/target/scala-2.9.3/spark-examples-assembly-0.8.0-incubating.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 3 \
  --master-memory 4g \
  --worker-memory 2g \
  --worker-cores 1

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/Client
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.deploy.yarn.Client. Program will exit.

spark-0.8.0 with hadoop 2.0.5-alpha works fine.

--
/Izhar
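[Editor's note: Leo's "is there more than one?" question matters because an upgrade that is rebuilt without cleaning can leave the old assembly jar sitting next to the new one. A small simulation under a throwaway /tmp directory (paths are illustrative only) showing how to spot that state:]

```shell
# Simulate a rebuilt tree where the old 0.8.0 assembly was never cleaned.
dir=/tmp/spark-demo/assembly/target/scala-2.9.3
mkdir -p "$dir"
touch "$dir/spark-assembly-0.8.0-incubating-hadoop2.0.5-alpha.jar"
touch "$dir/spark-assembly-0.8.1-incubating-hadoop2.2.0.jar"
# More than one spark-assembly jar means SPARK_JAR may point at the wrong one;
# clean the old jar out and rebuild so exactly one remains.
count=$(ls "$dir" | grep -c '^spark-assembly')
echo "$count"
```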
Re: closure and ExceptionInInitializerError
Redocpot,

I tried your 2 snippets with spark-shell and both work fine. I only see a problem if the closure is not serializable.

scala> val rdd1 = sc.parallelize(List(1, 2, 3, 4))
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at parallelize at <console>:12

scala> val a = 1
a: Int = 1

scala> val rdd2 = rdd1.map(_ + a)
rdd2: org.apache.spark.rdd.RDD[Int] = MappedRDD[5] at map at <console>:16

scala> rdd2.count
13/12/30 03:50:59 INFO SparkContext: Starting job: count at <console>:19
13/12/30 03:50:59 INFO DAGScheduler: Got job 4 (count at <console>:19) with 2 output partitions (allowLocal=false)
13/12/30 03:50:59 INFO DAGScheduler: Final stage: Stage 4 (count at <console>:19)
13/12/30 03:50:59 INFO DAGScheduler: Parents of final stage: List()
13/12/30 03:50:59 INFO DAGScheduler: Missing parents: List()
13/12/30 03:50:59 INFO DAGScheduler: Submitting Stage 4 (MappedRDD[5] at map at <console>:16), which has no missing parents
13/12/30 03:50:59 INFO DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[5] at map at <console>:16)
13/12/30 03:50:59 INFO ClusterScheduler: Adding task set 4.0 with 2 tasks
13/12/30 03:50:59 INFO ClusterTaskSetManager: Starting task 4.0:0 as TID 8 on executor 0: worker1 (PROCESS_LOCAL)
13/12/30 03:50:59 INFO ClusterTaskSetManager: Serialized task 4.0:0 as 1839 bytes in 1 ms
13/12/30 03:50:59 INFO ClusterTaskSetManager: Starting task 4.0:1 as TID 9 on executor 1: worker2 (PROCESS_LOCAL)
13/12/30 03:50:59 INFO ClusterTaskSetManager: Serialized task 4.0:1 as 1839 bytes in 1 ms
13/12/30 03:51:00 INFO ClusterTaskSetManager: Finished TID 8 in 152 ms on worker1 (progress: 1/2)
13/12/30 03:51:00 INFO DAGScheduler: Completed ResultTask(4, 0)
13/12/30 03:51:00 INFO ClusterTaskSetManager: Finished TID 9 in 171 ms on worker2 (progress: 2/2)
13/12/30 03:51:00 INFO ClusterScheduler: Remove TaskSet 4.0 from pool
13/12/30 03:51:00 INFO DAGScheduler: Completed ResultTask(4, 1)
13/12/30 03:51:00 INFO DAGScheduler: Stage 4 (count at <console>:19) finished in 0.131 s
13/12/30 03:51:00 INFO SparkContext: Job finished: count at <console>:19, took 0.212351498 s
res5: Long = 4

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/closure-and-ExceptionInInitializerError-tp77p98.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.