I shut down my first (working) cluster and brought up a fresh one... and it's been a bit of a horror, and I need to sleep now. Should I be worried about these errors? Or did I just have the old log4j.config tuned so that I didn't see them?

14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error running job streaming job 1402245172000 ms.2
scala.MatchError: 0101-01-10 (of class java.lang.String)
    at SimpleApp$$anonfun$6$$anonfun$apply$6.apply(SimpleApp.scala:218)
    at SimpleApp$$anonfun$6$$anonfun$apply$6.apply(SimpleApp.scala:217)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at SimpleApp$$anonfun$6.apply(SimpleApp.scala:217)
    at SimpleApp$$anonfun$6.apply(SimpleApp.scala:214)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

The error comes from this code, which seemed like a sensible way to match things (the "case cmd_plus(w)" statement is generating the error):

    val cmd_plus = """[+]([\w]+)""".r
    val cmd_minus = """[-]([\w]+)""".r

    // find command user tweets
    val commands = stream.map(
      status => ( status.getUser().getId(), status.getText() )
    ).foreachRDD(rdd => {
      rdd.join(superusers).map(
        x => x._2._1
      ).collect().foreach{ cmd => {
        cmd match {   // SimpleApp.scala:218 in the stack trace
          case cmd_plus(w) => {
            ...
          }
          case cmd_minus(w) => {
            ...
          }
        }
      }}
    })

It seems a bit excessive for Scala to throw exceptions because a regex didn't match. Something feels wrong.
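
For what it's worth, here is a minimal sketch of what I suspect is going on, stripped of Spark entirely. As far as I understand it, a `match` expression that has no case for its input throws scala.MatchError, so any tweet text that is neither a "+word" nor a "-word" command would blow up; a catch-all `case _` should avoid that. The object name `RegexMatchSketch`, the helper `classify`, and the returned strings are made up for illustration; they stand in for the elided bodies in my real code.

```scala
object RegexMatchSketch {
  // Same two patterns as in the post.
  val cmd_plus = """[+]([\w]+)""".r
  val cmd_minus = """[-]([\w]+)""".r

  // Hypothetical helper: returns a label instead of running the real (elided) actions.
  def classify(cmd: String): String = cmd match {
    case cmd_plus(w)  => s"plus:$w"
    case cmd_minus(w) => s"minus:$w"
    // Catch-all case. Without it, input matching neither pattern
    // (for example "0101-01-10") throws scala.MatchError.
    case _            => "ignored"
  }

  def main(args: Array[String]): Unit = {
    println(classify("+spark"))
    println(classify("-spark"))
    println(classify("0101-01-10"))
  }
}
```

Note that a regex used as a pattern has to match the whole string, so text such as "+two words" would also fall through to the catch-all case here.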