If you look closely at your program output, you can see the expected result:
Lines with a: 24, Lines with b: 15
The exception appears to happen during Spark's cleanup, after your code has
already finished. Try adding sc.stop() at the end of your program and see
whether the exception goes away.
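As a sketch, the tail of your main would look like this (assuming sc is the SparkContext created earlier in your program):

```scala
    // ... numAs and numBs computed as in your program ...
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

    // Stop the SparkContext explicitly so its listener-bus and cleaner
    // threads shut down cleanly before the JVM exits, instead of being
    // interrupted mid-flight when SBT tears the process down.
    sc.stop()
```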
On Wednesday, December 31, 2014 6:40 AM, Naveen Madhire
<[email protected]> wrote:
Hi All,
I am trying to run a sample Spark program using Scala SBT,
Below is the program,
import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]) {
    // Should be some file on your system
    val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md"
    val sc = new SparkContext("local", "Simple App",
      "E:/ApacheSpark/usb/usb/spark/bin",
      List("target/scala-2.10/sbt2_2.10-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
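For completeness, a minimal build.sbt for a project like this might look as follows; the project name, version, and Scala version are inferred from the jar name above, and the Spark version is illustrative, so adjust it to your installation:

```scala
name := "sbt2"

version := "1.0"

scalaVersion := "2.10.4"

// Spark 1.x matches the log output below; change to your installed version.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
```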
Below is the error log,
14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:0+673
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2032) called with curMem=34047, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_0 stored as values in memory (estimated size 2032.0 B, free 267.2 MB)
14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_0 in memory on zealot:61452 (size: 2032.0 B, free: 267.3 MB)
14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_0
14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 2300 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
14/12/30 23:20:21 INFO spark.CacheManager: Partition rdd_1_1 not found, computing it
14/12/30 23:20:21 INFO rdd.HadoopRDD: Input split: file:/E:/ApacheSpark/usb/usb/spark/bin/README.md:673+673
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3507 ms on localhost (1/2)
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(1912) called with curMem=36079, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block rdd_1_1 stored as values in memory (estimated size 1912.0 B, free 267.2 MB)
14/12/30 23:20:21 INFO storage.BlockManagerInfo: Added rdd_1_1 in memory on zealot:61452 (size: 1912.0 B, free: 267.3 MB)
14/12/30 23:20:21 INFO storage.BlockManagerMaster: Updated info of block rdd_1_1
14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 2300 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 261 ms on localhost (2/2)
14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Stage 0 (count at Test1.scala:19) finished in 3.811 s
14/12/30 23:20:21 INFO spark.SparkContext: Job finished: count at Test1.scala:19, took 3.997365232 s
14/12/30 23:20:21 INFO spark.SparkContext: Starting job: count at Test1.scala:20
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Got job 1 (count at Test1.scala:20) with 2 output partitions (allowLocal=false)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Final stage: Stage 1 (count at Test1.scala:20)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Missing parents: List()
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting Stage 1 (FilteredRDD[3] at filter at Test1.scala:20), which has no missing parents
14/12/30 23:20:21 INFO storage.MemoryStore: ensureFreeSpace(2600) called with curMem=37991, maxMem=280248975
14/12/30 23:20:21 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.5 KB, free 267.2 MB)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (FilteredRDD[3] at filter at Test1.scala:20)
14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, ANY, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 2)
14/12/30 23:20:21 INFO storage.BlockManager: Found block rdd_1_0 locally
14/12/30 23:20:21 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 1731 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, ANY, 1264 bytes)
14/12/30 23:20:21 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
14/12/30 23:20:21 INFO storage.BlockManager: Found block rdd_1_1 locally
14/12/30 23:20:21 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1731 bytes result sent to driver
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 7 ms on localhost (1/2)
14/12/30 23:20:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 16 ms on localhost (2/2)
14/12/30 23:20:21 INFO scheduler.DAGScheduler: Stage 1 (count at Test1.scala:20) finished in 0.016 s
14/12/30 23:20:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/12/30 23:20:21 INFO spark.SparkContext: Job finished: count at Test1.scala:20, took 0.041709824 s
Lines with a: 24, Lines with b: 15
14/12/30 23:20:21 ERROR util.Utils: Uncaught exception in thread SparkListenerBus
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1301)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:48)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
[success] Total time: 12 s, completed Dec 30, 2014 11:20:21 PM
14/12/30 23:20:21 ERROR spark.ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:136)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
    at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
14/12/30 23:20:21 INFO network.ConnectionManager: Selector thread was interrupted!
14/12/30 23:20:21 INFO storage.BlockManager: Removing broadcast 2
14/12/30 23:20:21 INFO storage.BlockManager: Removing block broadcast_2
14/12/30 23:20:21 INFO storage.MemoryStore: Block broadcast_2 of size 2600 dropped from memory (free 280210984)
14/12/30 23:20:21 INFO spark.ContextCleaner: Cleaned broadcast 2
Please let me know if you have any pointers for debugging this error.
Thanks a lot.