I shut down my first (working) cluster and brought up a fresh one... and
it's been a bit of a horror, and I need to sleep now. Should I be worried
about these errors? Or did I just have the old log4j.config tuned so that I
didn't see them?

14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error running job streaming job 1402245172000 ms.2
scala.MatchError: 0101-01-10 (of class java.lang.String)
        at SimpleApp$$anonfun$6$$anonfun$apply$6.apply(SimpleApp.scala:218)
        at SimpleApp$$anonfun$6$$anonfun$apply$6.apply(SimpleApp.scala:217)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at SimpleApp$$anonfun$6.apply(SimpleApp.scala:217)
        at SimpleApp$$anonfun$6.apply(SimpleApp.scala:214)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)


The error comes from this code, which seemed like a sensible way to match
things (the "case cmd_plus(w)" line, at SimpleApp.scala:218, is what
generates the error):

val cmd_plus = """[+]([\w]+)""".r
val cmd_minus = """[-]([\w]+)""".r

// find command user tweets
val commands = stream.map(
  status => (status.getUser().getId(), status.getText())
).foreachRDD(rdd => {
  rdd.join(superusers).map(
    x => x._2._1
  ).collect().foreach { cmd =>
    cmd match {  // line 218
      case cmd_plus(w) => {
        ...
      }
      case cmd_minus(w) => { ... }
    }
  }
})
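
For reference, the failure reproduces outside Spark entirely. Here is a
minimal standalone sketch of what I believe is happening (MatchRepro is
just a made-up name for this repro):

object MatchRepro {
  val cmd_plus = """[+]([\w]+)""".r
  val cmd_minus = """[-]([\w]+)""".r

  def main(args: Array[String]): Unit = {
    // "+foo" is handled by the first case, but "0101-01-10" matches
    // neither extractor, so the match runs out of cases and throws
    // scala.MatchError at runtime.
    Seq("+foo", "0101-01-10").foreach { cmd =>
      cmd match {
        case cmd_plus(w) => println("plus: " + w)
        case cmd_minus(w) => println("minus: " + w)
      }
    }
  }
}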

It seems a bit excessive for Scala to throw an exception just because a
regex didn't match. Something feels wrong.
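
From what I can tell, though, it isn't the regex that throws: an extractor
that doesn't match just means that case doesn't apply, and the MatchError
comes from the match itself running out of cases. If that's right, a
wildcard arm should make the match total, so non-command tweets get skipped
instead of killing the batch. A sketch, reusing the cmd_plus / cmd_minus
patterns from above (handle and the println bodies are placeholders):

def handle(cmd: String): Unit = cmd match {
  case cmd_plus(w) => println("plus command: " + w)   // placeholder action
  case cmd_minus(w) => println("minus command: " + w) // placeholder action
  case _ => () // not a command; ignore rather than throw scala.MatchError
}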
