Re: TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-09 Thread Xiangrui Meng
This is the worker log, not the executor log. The executor logs can be found in
folders like /newdisk2/rta/rtauser/workerdir/app-20150109182514-0001/0/.
-Xiangrui
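
For readers following along: in standalone mode each executor gets its own directory under the worker's work directory, laid out as <work dir>/<app id>/<executor id>/, and that directory holds the executor's stdout and stderr files. A minimal sketch of building those paths follows. The object and method names are made up for illustration, and the directory values are simply the ones from the message above.

```scala
import java.nio.file.{Path, Paths}

object ExecutorLogPaths {
  // Standalone-mode layout: <workerDir>/<appId>/<executorId>/{stdout,stderr}.
  // This only builds the paths; it does not check that the files exist.
  def executorLogs(workerDir: String, appId: String, executorId: Int): Seq[Path] = {
    val dir = Paths.get(workerDir, appId, executorId.toString)
    Seq("stdout", "stderr").map(name => dir.resolve(name))
  }

  def main(args: Array[String]): Unit = {
    // Values taken from the example folder named in the message.
    executorLogs("/newdisk2/rta/rtauser/workerdir", "app-20150109182514-0001", 0)
      .foreach(println)
  }
}
```

Task failures such as the NullPointerException reported in this thread would normally show up in the stderr file of the executor that ran the task.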

On Fri, Jan 9, 2015 at 5:03 AM, Priya Ch  wrote:
> Please find the attached worker log.
> I can see a "stream closed" exception in it.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-09 Thread Priya Ch
Please find the attached worker log.
I can see a "stream closed" exception in it.

spark-rtauser-org.apache.spark.deploy.worker.Worker-1-IMPETUS-DSRV02.out
Description: Binary data


Re: TF-IDF from spark-1.1.0 not working on cluster mode

2015-01-06 Thread Xiangrui Meng
Could you attach the executor log? That may help identify the root
cause. -Xiangrui

On Mon, Jan 5, 2015 at 11:12 PM, Priya Ch  wrote:
> Hi All,
>
> The Word2Vec and TF-IDF algorithms in Spark MLlib 1.1.0 work only in
> local mode, not in distributed mode, where a NullPointerException is
> thrown. Is this a bug in Spark 1.1.0?
>
> Following is the code:
>   def main(args: Array[String])
>   {
>     val conf = new SparkConf
>     val sc = new SparkContext(conf)
>     val documents = sc.textFile("hdfs://IMPETUS-DSRV02:9000/nlp/sampletext").map(_.split(" ").toSeq)
>     val hashingTF = new HashingTF()
>     val tf = hashingTF.transform(documents)
>     tf.cache()
>     val idf = new IDF().fit(tf)
>     val tfidf = idf.transform(tf)
>     val rdd = tfidf.map { vec => println("vector is" + vec)
>       (10)
>     }
>     rdd.saveAsTextFile("/home/padma/usecase")
>   }
>
> Exception thrown:
>
> 15/01/06 12:36:09 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
> 15/01/06 12:36:10 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@impetus-dsrv05.impetus.co.in:33898/user/Executor#-1525890167] with ID 0
> 15/01/06 12:36:10 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, IMPETUS-DSRV05.impetus.co.in, NODE_LOCAL, 1408 bytes)
> 15/01/06 12:36:10 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, IMPETUS-DSRV05.impetus.co.in, NODE_LOCAL, 1408 bytes)
> 15/01/06 12:36:10 INFO storage.BlockManagerMasterActor: Registering block manager IMPETUS-DSRV05.impetus.co.in:35130 with 2.1 GB RAM
> 15/01/06 12:36:12 INFO network.ConnectionManager: Accepted connection from [IMPETUS-DSRV05.impetus.co.in/192.168.145.195:46888]
> 15/01/06 12:36:12 INFO network.SendingConnection: Initiating connection to [IMPETUS-DSRV05.impetus.co.in/192.168.145.195:35130]
> 15/01/06 12:36:12 INFO network.SendingConnection: Connected to [IMPETUS-DSRV05.impetus.co.in/192.168.145.195:35130], 1 messages pending
> 15/01/06 12:36:12 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on IMPETUS-DSRV05.impetus.co.in:35130 (size: 2.1 KB, free: 2.1 GB)
> 15/01/06 12:36:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on IMPETUS-DSRV05.impetus.co.in:35130 (size: 10.1 KB, free: 2.1 GB)
> 15/01/06 12:36:13 INFO storage.BlockManagerInfo: Added rdd_3_1 in memory on IMPETUS-DSRV05.impetus.co.in:35130 (size: 280.0 B, free: 2.1 GB)
> 15/01/06 12:36:13 INFO storage.BlockManagerInfo: Added rdd_3_0 in memory on IMPETUS-DSRV05.impetus.co.in:35130 (size: 416.0 B, free: 2.1 GB)
> 15/01/06 12:36:13 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, IMPETUS-DSRV05.impetus.co.in): java.lang.NullPointerException:
>     org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>     org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>     org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>     org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>     org.apache.spark.scheduler.Task.run(Task.scala:54)
>     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     java.lang.Thread.run(Thread.java:722)
>
> Thanks,
> Padma Ch
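
A side note on the quoted program, separate from the NullPointerException: the closure passed to tfidf.map prints each vector and then evaluates to (10), so the mapped RDD holds the integer 10 once per document rather than the TF-IDF vectors, and that is what saveAsTextFile would write. On a cluster the println calls run on the executors, so their output lands in each executor's stdout, not on the driver console. The sketch below reproduces just the closure shape on a plain Scala Seq so that it runs without Spark. The object name, the method name and the sample strings are illustrative stand-ins, not part of the original post.

```scala
object ClosureResultSketch {
  // Same shape as the closure in the quoted code, applied to a plain Seq.
  // The strings stand in for the TF-IDF vectors.
  def mapLikeThePost(vectors: Seq[String]): Seq[Int] =
    vectors.map { vec =>
      println("vector is" + vec) // side effect only; its result is discarded
      (10)                       // last expression, so this is the mapped value
    }

  def main(args: Array[String]): Unit = {
    val result = mapLikeThePost(Seq("(3,[0],[1.0])", "(3,[1],[0.5])"))
    println(result) // prints List(10, 10)
  }
}
```

If the intent was to save the vectors themselves, calling saveAsTextFile on tfidf directly would presumably be closer to what was wanted.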
