Hi,
I’ve got a problem with Spark Streaming and tshark.
The code below runs without problems locally, but when I run it on an EC2 cluster I get the exception shown just below the code.

import scala.collection.mutable.MutableList
import scala.sys.process._

def dissection(s: String): Seq[String] = {
  try {
    // call hadoop to copy a file from S3 to the local filesystem
    Process("hadoop command to create ./localcopy.tmp").!
    // call tshark to turn the local copy into a sequence of strings
    val pb = Process("tshark … localcopy.tmp")
    pb.lines_!.toSeq
  } catch {
    case e: Exception =>
      System.err.println("ERROR")
      new MutableList[String]()
  }
}
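(As an aside, while writing this up I noticed that the exit status of the hadoop step is ignored, so a silent copy failure would go unnoticed. A small sketch of the check I have in mind, using the same placeholder command:)

  val exit = Process("hadoop command to create ./localcopy.tmp").!
  if (exit != 0)
    System.err.println("hadoop copy failed with exit code " + exit)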

(Line 2051 in the trace below points into the function “dissection”.)

WARN scheduler.TaskSetManager: Loss was due to java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
    at Main$$anonfun$11.apply(Main.scala:2051)
    at Main$$anonfun$11.apply(Main.scala:2051)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
    at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
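One thing I realised while writing this up: ExceptionInInitializerError extends Error, not Exception, so my “case e: Exception” clause can never intercept it, and the real failure thrown by the class/object initializer is attached as its cause. A minimal sketch of how I could surface that cause, assuming the failing initializer is actually triggered inside dissection (same placeholder commands as above):

  import scala.sys.process._

  def dissectionDebug(s: String): Seq[String] =
    try {
      Process("hadoop command to create ./localcopy.tmp").!
      Process("tshark … localcopy.tmp").lines_!.toSeq
    } catch {
      // ExceptionInInitializerError is an Error, so a catch on Exception
      // lets it escape; getCause holds the initializer's real failure
      case e: ExceptionInInitializerError =>
        System.err.println("initializer failed: " + e.getCause)
        Seq.empty[String]
    }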

Does anyone have an idea why this might happen? I’m pretty sure the hadoop call itself works fine.
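To double-check my assumptions, here is a quick probe I’m thinking of running to confirm tshark is actually installed on every worker and not just on the machine where the local run succeeded (sc is the SparkContext; this probe is just a sketch, not part of the real job):

  import scala.sys.process._

  // run `which tshark` across many partitions so every executor is hit
  // at least once; exit code 0 means the binary was found on that host
  val check = sc.parallelize(1 to 1000, 100).map { _ =>
    val host = java.net.InetAddress.getLocalHost.getHostName
    (host, Seq("which", "tshark").! == 0)
  }.distinct().collect()
  check.foreach(println)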

Thanks
Gianluca
