I am new to SparkStreaming, when tried to submit the Spark-Twitter streaming
job, getting the following error:
---
Lost task 0.0 in stage 0.0 (TID
0,sandbox.hortonworks.com):java.lang.NullPointerException
at org.apache.spark.util.Utils$.decodeFileNameInURI(Utils.scala:340)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:365)
at
org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:404)
at
org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:396)
at
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at
org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:396)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:192)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745
--

Here is the code snippet:
--
val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) =
args.take(4)
val filters = args.takeRight(args.length - 4)

System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
System.setProperty("twitter4j.oauth.accessToken", accessToken)
System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
val sparkConf = new SparkConf().setAppName("TwitterPopularTags")
val ssc = new StreamingContext(sparkConf,  Seconds(2))
val stream = TwitterUtils.createStream(ssc,None, filters)
val hashTags = stream.flatMap(status => status.getText.split("
").filter(_.startsWith("#")))
val topCounts60 = hashTags.map((_, 1)).reduceByKeyAndWindow(_ + _,
Seconds(60))
                 .map{case (topic, count) => (count, topic)}
                 .transform(_.sortByKey(false))
topCounts60.foreachRDD(rdd => {
  val topList = rdd.take(10)
  println("\nPopular topics in last 60 seconds (%s
total):".format(rdd.count()))
  topList.foreach{case (count, tag) => println("%s (%s tweets)".format(tag,
count))}
})
ssc.start()
ssc.awaitTermination()

--
command to submit the job
--
./bin/spark-submit --class
"org.apache.spark.examples.streaming.TwitterPopularTags"  --master
yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m
--executor-cores 1  --jars
/home/spark/.sbt/0.13/staging/0ff3ad537358b61f617c/twitterstream/target/scala-2.10/twitterstream-project_2.10-1.0.jar,/home/spark/.ivy2/cache/org.apache.spark/spark-streaming-twitter_2.10/jars/spark-streaming-twitter_2.10-1.6.1.jar,/home/spark/.ivy2/cache/org.twitter4j/twitter4j-core/jars/twitter4j-core-4.0.4.jar,/home/spark/.ivy2/cache/org.twitter4j/twitter4j-stream/jars/twitter4j-stream-4.0.4.jar,/home/spark/.ivy2/cache/org.apache.spark/spark-streaming_2.10/jars/spark-streaming_2.10-1.6.1.jar,/home/spark/.ivy2/cache/org.apache.spark/spark-core_2.10/jars/spark-core_2.10-1.6.1.jar
"sandbox.hortonworks.com:6667" xx xx xx xx 
--

Any clue why I am getting this NPE?? Any help on how to debug this further?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Getting-NPE-when-trying-to-do-spark-streaming-with-Twitter-tp26737.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to