Re: How to use FlumeInputDStream in spark cluster?

Ravi Hemnani Fri, 21 Mar 2014 05:32:03 -0700

Hey,


Even i am getting the same error. 

I am running, 

sudo ./run-example org.apache.spark.streaming.examples.FlumeEventCount
spark://<spark_master_hostname>:7077 <spark_master_hostname> 7781

and getting no events in the spark streaming. 

-------------------------------------------
Time: 1395395676000 ms
-------------------------------------------
Received 0 flume events.

14/03/21 09:54:36 INFO JobScheduler: Finished job streaming job
1395395676000 ms.0 from job set of time 1395395676000 ms
14/03/21 09:54:36 INFO JobScheduler: Total delay: 0.196 s for time
1395395676000 ms (execution: 0.111 s)
14/03/21 09:54:38 INFO NetworkInputTracker: Stream 0 received 0 blocks
14/03/21 09:54:38 INFO SparkContext: Starting job: take at DStream.scala:586
14/03/21 09:54:38 INFO JobScheduler: Starting job streaming job
1395395678000 ms.0 from job set of time 1395395678000 ms
14/03/21 09:54:38 INFO DAGScheduler: Registering RDD 73 (combineByKey at
ShuffledDStream.scala:42)
14/03/21 09:54:38 INFO DAGScheduler: Got job 16 (take at DStream.scala:586)
with 1 output partitions (allowLocal=true)
14/03/21 09:54:38 INFO DAGScheduler: Final stage: Stage 31 (take at
DStream.scala:586)
14/03/21 09:54:38 INFO DAGScheduler: Parents of final stage: List(Stage 32)
14/03/21 09:54:38 INFO JobScheduler: Added jobs for time 1395395678000 ms
14/03/21 09:54:38 INFO DAGScheduler: Missing parents: List(Stage 32)
14/03/21 09:54:38 INFO DAGScheduler: Submitting Stage 32
(MapPartitionsRDD[73] at combineByKey at ShuffledDStream.scala:42), which
has no missing parents
14/03/21 09:54:38 INFO DAGScheduler: Submitting 1 missing tasks from Stage
32 (MapPartitionsRDD[73] at combineByKey at ShuffledDStream.scala:42)
14/03/21 09:54:38 INFO TaskSchedulerImpl: Adding task set 32.0 with 1 tasks
14/03/21 09:54:38 INFO TaskSetManager: Starting task 32.0:0 as TID 92 on
executor 2: c8-data-store-4.srv.media.net (PROCESS_LOCAL)
14/03/21 09:54:38 INFO TaskSetManager: Serialized task 32.0:0 as 2971 bytes
in 1 ms
14/03/21 09:54:38 INFO TaskSetManager: Finished TID 92 in 41 ms on
c8-data-store-4.srv.media.net (progress: 0/1)
14/03/21 09:54:38 INFO TaskSchedulerImpl: Remove TaskSet 32.0 from pool 



Also on closer look, i got 

INFO SparkContext: Job finished: runJob at NetworkInputTracker.scala:182,
took 0.523621327 s
14/03/21 09:54:35 ERROR NetworkInputTracker: De-registered receiver for
network stream 0 with message org.jboss.netty.channel.ChannelException:
Failed to bind to: c8-data-store-1.srv.media.net/172.16.200.124:7781


I couldnt understand the NetworkInputTracker that you told about. Can you
elaborate that? 

I only understood that the master checks any one of the workers nodes for
the connection and stays on it till the program runs. Why is it not checking
on the <host> and <port> i am providing. Also, <host> and <port> should
necessarily any worker node? 




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-FlumeInputDStream-in-spark-cluster-tp1604p2987.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to use FlumeInputDStream in spark cluster?

Reply via email to