Re: Tree classifiers in MLlib

2013-12-29 Thread Evan Sparks
Yes - Manish Amde and Hirakendu Das have been working on a distributed tree 
classifier. We are taking the current version through large-scale testing and 
expect to merge it into the master branch soon. I expect that ensemble tree 
learners (random forests, GBDTs) will follow shortly. 

 On Dec 29, 2013, at 10:35 AM, Charles Earl charles.ce...@gmail.com wrote:
 
 In the latest API docs on the web page
 http://spark.incubator.apache.org/docs/latest/api/mllib/index.html#org.apache.spark.mllib.package
 I had not seen tree classifiers included.
 Are there plans to include decision trees etc. at some point? Is there 
 interest?
 
 
 -- 
 - Charles
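[Editor's note: the work described above was later merged as org.apache.spark.mllib.tree.DecisionTree. A minimal sketch of how the resulting API is used from spark-shell; the signature below is from later MLlib releases, not the 0.8.x API discussed in this thread, and the toy data is purely illustrative:]

```scala
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

// Toy data: label is 1.0 exactly when the single feature is positive.
val data = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(2.0)),
  LabeledPoint(0.0, Vectors.dense(-1.0)),
  LabeledPoint(1.0, Vectors.dense(3.0)),
  LabeledPoint(0.0, Vectors.dense(-2.0))))

// Train a binary classification tree with gini impurity.
val model = DecisionTree.trainClassifier(
  data,
  numClasses = 2,
  categoricalFeaturesInfo = Map[Int, Int](), // all features continuous
  impurity = "gini",
  maxDepth = 4,
  maxBins = 32)

model.predict(Vectors.dense(5.0)) // expected to classify as 1.0
```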


Re: Tree classifiers in MLlib

2013-12-29 Thread Debasish Das
Hi Evan,

Could you please point to the git repo for the decision tree classifier or
the enhancement JIRA ?

Thanks.
Deb
 On Dec 29, 2013 8:55 AM, Evan Sparks evan.spa...@gmail.com wrote:

 Yes - Manish Amde and Hirakendu Das have been working on a distributed
 tree classifier. We are taking the current version through large-scale
 testing and expect to merge it into the master branch soon. I expect that
 ensemble tree learners (random forests, GBDTs) will follow shortly.

 On Dec 29, 2013, at 10:35 AM, Charles Earl charles.ce...@gmail.com
 wrote:

 In the latest API docs on the web page

 http://spark.incubator.apache.org/docs/latest/api/mllib/index.html#org.apache.spark.mllib.package
 I had not seen tree classifiers included.
 Are there plans to include decision trees etc. at some point? Is there
 interest?


 --
 - Charles




RE: Errors with spark-0.8.1 hadoop-yarn 2.2.0

2013-12-29 Thread Liu, Raymond
Hi Izhar

Is that the exact command you are running? That is, with 0.8.0 instead 
of 0.8.1 in the command?

Raymond Liu

From: Izhar ul Hassan [mailto:ezh...@gmail.com] 
Sent: Friday, December 27, 2013 9:40 PM
To: user@spark.incubator.apache.org
Subject: Errors with spark-0.8.1 hadoop-yarn 2.2.0

Hi,
I have a 3-node installation of hadoop 2.2.0 with yarn. I have installed 
spark-0.8.1 with YARN support enabled. I get the following errors when 
trying to run the examples:
SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop2.0.5-alpha.jar
 \
./spark-class org.apache.spark.deploy.yarn.Client \
  --jar 
examples/target/scala-2.9.3/spark-examples-assembly-0.8.0-incubating.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 3 \
  --master-memory 4g \
  --worker-memory 2g \
  --worker-cores 1

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/deploy/yarn/Client
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.deploy.yarn.Client. Program 
will exit.
spark-0.8.0 with hadoop 2.0.5-alpha works fine.
-- 
/Izhar 


Re: Errors with spark-0.8.1 hadoop-yarn 2.2.0

2013-12-29 Thread Azuryy Yu
Maybe you didn't enable YARN when building Spark.
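[Editor's note: the rebuild being suggested here can be sketched as follows. This matches the assembly command documented for Spark 0.8.x; paths and versions assume a stock checkout and may differ on your machine.]

```shell
# Build the Spark 0.8.1 assembly against Hadoop/YARN 2.2.0.
# Without SPARK_YARN=true, org.apache.spark.deploy.yarn.Client is left out
# of the assembly jar, which yields exactly the NoClassDefFoundError above.
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly

# The jar name encodes the versions you built against, e.g.:
#   assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.2.0.jar
# SPARK_JAR must point at that jar, not at a 0.8.0 / hadoop2.0.5-alpha jar
# left over from an earlier build.
```

Note that the command in the original mail points SPARK_JAR and --jar at 0.8.0 artifacts built against hadoop 2.0.5-alpha, which is likely the mismatch Raymond is hinting at.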


On Mon, Dec 30, 2013 at 8:36 AM, Liu, Raymond raymond@intel.com wrote:

 Hi Izhar

 Is that the exact command you are running? That is, with 0.8.0 instead
 of 0.8.1 in the command?

 Raymond Liu

 From: Izhar ul Hassan [mailto:ezh...@gmail.com]
 Sent: Friday, December 27, 2013 9:40 PM
 To: user@spark.incubator.apache.org
 Subject: Errors with spark-0.8.1 hadoop-yarn 2.2.0

 Hi,
 I have a 3-node installation of hadoop 2.2.0 with yarn. I have installed
 spark-0.8.1 with YARN support enabled. I get the following errors when
 trying to run the examples:
 SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop2.0.5-alpha.jar
 \
 ./spark-class org.apache.spark.deploy.yarn.Client \
   --jar
 examples/target/scala-2.9.3/spark-examples-assembly-0.8.0-incubating.jar \
   --class org.apache.spark.examples.SparkPi \
   --args yarn-standalone \
   --num-workers 3 \
   --master-memory 4g \
   --worker-memory 2g \
   --worker-cores 1

 Exception in thread "main" java.lang.NoClassDefFoundError:
 org/apache/spark/deploy/yarn/Client
 Caused by: java.lang.ClassNotFoundException:
 org.apache.spark.deploy.yarn.Client
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class: org.apache.spark.deploy.yarn.Client.
 Program will exit.
 spark-0.8.0 with hadoop 2.0.5-alpha works fine.
 --
 /Izhar



Re: RE: Errors with spark-0.8.1 hadoop-yarn 2.2.0

2013-12-29 Thread leosand...@gmail.com
What is your classpath?
Did you rebuild Spark with YARN support after switching to the new version?
Did you find your assembly jar under $SPARK_HOME/assembly/target/scala-2.9.3, 
or is there more than one jar there?




leosand...@gmail.com

From: Liu, Raymond
Date: 2013-12-30 08:36
To: user@spark.incubator.apache.org
Subject: RE: Errors with spark-0.8.1 hadoop-yarn 2.2.0
Hi Izhar

Is that the exact command you are running? That is, with 0.8.0 instead of 0.8.1 in 
the command?

Raymond Liu

From: Izhar ul Hassan [mailto:ezh...@gmail.com] 
Sent: Friday, December 27, 2013 9:40 PM
To: user@spark.incubator.apache.org
Subject: Errors with spark-0.8.1 hadoop-yarn 2.2.0

Hi,
I have a 3-node installation of hadoop 2.2.0 with yarn. I have installed 
spark-0.8.1 with YARN support enabled. I get the following errors when 
trying to run the examples:
SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop2.0.5-alpha.jar
 \
./spark-class org.apache.spark.deploy.yarn.Client \
  --jar 
examples/target/scala-2.9.3/spark-examples-assembly-0.8.0-incubating.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 3 \
  --master-memory 4g \
  --worker-memory 2g \
  --worker-cores 1

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/deploy/yarn/Client
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: org.apache.spark.deploy.yarn.Client. Program 
will exit.
spark-0.8.0 with hadoop 2.0.5-alpha works fine.
-- 
/Izhar 

Re: closure and ExceptionInInitializerError

2013-12-29 Thread Bao
Redocpot, I tried your 2 snippets with spark-shell and both work fine. I only
see a problem if the closure is not serializable.

scala> val rdd1 = sc.parallelize(List(1, 2, 3, 4))
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at
parallelize at <console>:12

scala> val a = 1
a: Int = 1

scala> val rdd2 = rdd1.map(_ + a)
rdd2: org.apache.spark.rdd.RDD[Int] = MappedRDD[5] at map at <console>:16

scala> rdd2.count
13/12/30 03:50:59 INFO SparkContext: Starting job: count at <console>:19
13/12/30 03:50:59 INFO DAGScheduler: Got job 4 (count at <console>:19) with
2 output partitions (allowLocal=false)
13/12/30 03:50:59 INFO DAGScheduler: Final stage: Stage 4 (count at
<console>:19)
13/12/30 03:50:59 INFO DAGScheduler: Parents of final stage: List()
13/12/30 03:50:59 INFO DAGScheduler: Missing parents: List()
13/12/30 03:50:59 INFO DAGScheduler: Submitting Stage 4 (MappedRDD[5] at map
at <console>:16), which has no missing parents
13/12/30 03:50:59 INFO DAGScheduler: Submitting 2 missing tasks from Stage 4
(MappedRDD[5] at map at <console>:16)
13/12/30 03:50:59 INFO ClusterScheduler: Adding task set 4.0 with 2 tasks
13/12/30 03:50:59 INFO ClusterTaskSetManager: Starting task 4.0:0 as TID 8
on executor 0: worker1 (PROCESS_LOCAL)
13/12/30 03:50:59 INFO ClusterTaskSetManager: Serialized task 4.0:0 as 1839
bytes in 1 ms
13/12/30 03:50:59 INFO ClusterTaskSetManager: Starting task 4.0:1 as TID 9
on executor 1: worker2 (PROCESS_LOCAL)
13/12/30 03:50:59 INFO ClusterTaskSetManager: Serialized task 4.0:1 as 1839
bytes in 1 ms
13/12/30 03:51:00 INFO ClusterTaskSetManager: Finished TID 8 in 152 ms on
worker1 (progress: 1/2)
13/12/30 03:51:00 INFO DAGScheduler: Completed ResultTask(4, 0)
13/12/30 03:51:00 INFO ClusterTaskSetManager: Finished TID 9 in 171 ms on
worker2 (progress: 2/2)
13/12/30 03:51:00 INFO ClusterScheduler: Remove TaskSet 4.0 from pool
13/12/30 03:51:00 INFO DAGScheduler: Completed ResultTask(4, 1)
13/12/30 03:51:00 INFO DAGScheduler: Stage 4 (count at <console>:19)
finished in 0.131 s
13/12/30 03:51:00 INFO SparkContext: Job finished: count at <console>:19,
took 0.212351498 s
res5: Long = 4
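[Editor's note: the failure mode described above can be reproduced without Spark at all. Task closures are shipped with plain Java serialization, so a closure that captures a non-serializable object fails in the same way a direct ObjectOutputStream write does. A minimal plain-Scala sketch; the class and method names here are illustrative, not from the original post:]

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stands in for a non-serializable member, e.g. a DB connection or logger.
class Helper { def bump(x: Int): Int = x + 1 }

object ClosureDemo {
  def serialize(obj: AnyRef): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try out.writeObject(obj) finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val helper = new Helper

    // This closure captures `helper`, which is not Serializable,
    // so serializing it fails just like a Spark task would.
    val bad: Int => Int = x => helper.bump(x)
    try serialize(bad)
    catch { case e: NotSerializableException => println(s"bad closure: $e") }

    // Fix: capture only a serializable local value, as in `rdd1.map(_ + a)`
    // in the transcript above, where `a` is a plain Int.
    val inc = 1
    val good: Int => Int = x => x + inc
    serialize(good)
    println("good closure serialized fine")
  }
}
```

The usual Spark-side fix has the same shape: copy any class field you need into a local val before referencing it inside the closure.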




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/closure-and-ExceptionInInitializerError-tp77p98.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.