Got it to work, thanks a lot for the help! I started a new cluster where
Spark has YARN as a dependency. I ran the script with local[2] and it
worked (the same script did not work with Spark in standalone mode).

A follow-up question: I have seen this posted around the internet quite a
few times, but very few people have received responses.

Instead of wordCounts.print(), I want to write the output to a text file or
a Hadoop file. The guide linked below says that saveAsTextFiles and
saveAsHadoopFiles are the appropriate output operations.

https://spark.apache.org/docs/1.1.0/streaming-programming-guide.html
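
For context, that guide describes both operations as writing one output
per batch interval, named "prefix-TIME_IN_MS[.suffix]". A minimal sketch of
that naming rule follows, to show what a given prefix and suffix would
produce. BatchFileName is a hypothetical helper written for illustration,
not part of Spark's API, and treating a null or empty suffix as "no suffix"
is an assumption.

```java
// Hypothetical helper, not part of Spark: mirrors the documented
// "prefix-TIME_IN_MS[.suffix]" naming pattern for per-batch output.
final class BatchFileName {
    private BatchFileName() {}

    // Builds the output name for one batch.
    // Assumption: a null or empty suffix means no ".suffix" part is appended.
    static String of(String prefix, String suffix, long batchTimeMs) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String name = prefix + "-" + batchTimeMs;
        if (suffix != null && !suffix.isEmpty()) {
            name = name + "." + suffix;
        }
        return name;
    }

    public static void main(String[] args) {
        long batchTimeMs = 1412000000000L; // arbitrary example batch time
        System.out.println(of("prefix", "txt", batchTimeMs));
        System.out.println(of("/user/test/", "abc", batchTimeMs));
    }
}
```

Under this rule, a prefix that ends in a slash, such as "/user/test/",
yields names of the form "/user/test/-TIME_IN_MS.abc", so the prefix
probably wants a trailing file-name stem rather than a bare directory.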

However, when I try saveAsTextFiles("prefix", "txt"), the package fails to
build, saying it doesn't recognize the method.

When I try saveAsHadoopFiles("hdfs://ip-10...:8020/user/test/", "abc") it
builds, but throws a runtime exception:

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class scala.runtime.Nothing$ not org.apache.hadoop.mapred.OutputFormat
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2079)
        at org.apache.hadoop.mapred.JobConf.getOutputFormat(JobConf.java:712)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1021)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:940)
        at org.apache.spark.streaming.dstream.PairDStreamFunctions$$anonfun$8.apply(PairDStreamFunctions.scala:632)
        at org.apache.spark.streaming.dstream.PairDStreamFunctions$$anonfun$8.apply(PairDStreamFunctions.scala:630)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:42)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:171)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.RuntimeException: class scala.runtime.Nothing$ not org.apache.hadoop.mapred.OutputFormat
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2073)
        ... 14 more

I have searched for this online, and it seems a lot of people have hit this
problem, but there doesn't seem to be an answer. Hopefully this will solve
the last of my problems. Thanks for the help!
