I have pasted the logs below:
PS F:\spark-0.9.1\codes\sentiment analysis> pyspark .\naive_bayes_analyser.py
Running python with PYTHONPATH=F:\spark-0.9.1\spark-0.9.1\bin\..\python;
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/F:/spark-0.9.1/spark-0.9.1/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/F:/spark-0.9.1/spark-0.9.1/tools/target/scala-2.10/spark-tools-assembly-0.9.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/07/09 00:57:25 INFO SparkEnv: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/07/09 00:57:25 INFO SparkEnv: Registering BlockManagerMaster
14/07/09 00:57:25 INFO DiskBlockManager: Created local directory at C:\Users\shawn\AppData\Local\Temp\spark-local-20140709005725-fe99
14/07/09 00:57:25 INFO MemoryStore: MemoryStore started with capacity 297.0 MB.
14/07/09 00:57:25 INFO ConnectionManager: Bound socket to port 51231 with id = ConnectionManagerId(shawn-PC,51231)
14/07/09 00:57:25 INFO BlockManagerMaster: Trying to register BlockManager
14/07/09 00:57:25 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager shawn-PC:51231 with 297.0 MB RAM
14/07/09 00:57:25 INFO BlockManagerMaster: Registered BlockManager
14/07/09 00:57:26 INFO HttpServer: Starting HTTP Server
14/07/09 00:57:26 INFO HttpBroadcast: Broadcast server started at http://192.168.1.100:51232
14/07/09 00:57:26 INFO SparkEnv: Registering MapOutputTracker
14/07/09 00:57:26 INFO HttpFileServer: HTTP File server directory is C:\Users\shawn\AppData\Local\Temp\spark-339491dd-68f4-4027-b661-00f2c5f95494
14/07/09 00:57:26 INFO HttpServer: Starting HTTP Server
14/07/09 00:57:26 INFO SparkUI: Started Spark Web UI at http://shawn-PC:4040
14/07/09 00:57:39 INFO SparkContext: Starting job: aggregate at NaiveBayes.scala:81
14/07/09 00:57:39 INFO DAGScheduler: Got job 0 (aggregate at NaiveBayes.scala:81) with 6 output partitions (allowLocal=false)
14/07/09 00:57:39 INFO DAGScheduler: Final stage: Stage 0 (aggregate at NaiveBayes.scala:81)
14/07/09 00:57:39 INFO DAGScheduler: Parents of final stage: List()
14/07/09 00:57:39 INFO DAGScheduler: Missing parents: List()
14/07/09 00:57:39 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[2] at map at PythonMLLibAPI.scala:190), which has no missing parents
14/07/09 00:57:39 INFO DAGScheduler: Submitting 6 missing tasks from Stage 0 (MappedRDD[2] at map at PythonMLLibAPI.scala:190)
14/07/09 00:57:39 INFO TaskSchedulerImpl: Adding task set 0.0 with 6 tasks
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:0 as 52792 bytes in 4 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:1 as 52792 bytes in 0 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:2 as 52792 bytes in 1 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:3 as TID 3 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:3 as 52792 bytes in 0 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:4 as TID 4 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:4 as 52792 bytes in 0 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:5 as TID 5 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:5 as 53011 bytes in 0 ms
14/07/09 00:57:39 INFO Executor: Running task ID 3
14/07/09 00:57:39 INFO Executor: Running task ID 1
14/07/09 00:57:39 INFO Executor: Running task ID 2
14/07/09 00:57:39 INFO Executor: Running task ID 5
14/07/09 00:57:39 INFO Executor: Running task ID 4
14/07/09 00:57:39 INFO Executor: Running task ID 0
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_4 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_0 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_5 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_2 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_1 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_3 not found, computing it
14/07/09 00:57:39 INFO PythonRDD: Times: total = 290, boot = 176, init = 102, finish = 12
14/07/09 00:57:39 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
14/07/09 00:57:39 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=0, maxMem=311387750
14/07/09 00:57:39 INFO MemoryStore: Block rdd_1_4 stored as values to memory (estimated size 56.1 KB, free 296.9 MB)
14/07/09 00:57:39 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_4 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.9 MB)
14/07/09 00:57:39 INFO BlockManagerMaster: Updated info of block rdd_1_4
14/07/09 00:57:39 INFO Executor: Serialized size of result for 4 is 967
14/07/09 00:57:39 INFO Executor: Sending result for 4 directly to driver
14/07/09 00:57:39 INFO Executor: Finished task ID 4
14/07/09 00:57:39 INFO TaskSetManager: Finished TID 4 in 438 ms on localhost (progress: 1/6)
14/07/09 00:57:39 INFO DAGScheduler: Completed ResultTask(0, 4)
14/07/09 00:57:39 INFO PythonRDD: Times: total = 457, boot = 334, init = 111, finish = 12
14/07/09 00:57:39 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=57465, maxMem=311387750
14/07/09 00:57:39 INFO MemoryStore: Block rdd_1_3 stored as values to memory (estimated size 56.1 KB, free 296.9 MB)
14/07/09 00:57:39 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_3 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.9 MB)
14/07/09 00:57:39 INFO BlockManagerMaster: Updated info of block rdd_1_3
14/07/09 00:57:39 INFO Executor: Serialized size of result for 3 is 967
14/07/09 00:57:39 INFO Executor: Sending result for 3 directly to driver
14/07/09 00:57:39 INFO Executor: Finished task ID 3
14/07/09 00:57:39 INFO DAGScheduler: Completed ResultTask(0, 3)
14/07/09 00:57:39 INFO TaskSetManager: Finished TID 3 in 522 ms on localhost (progress: 2/6)
14/07/09 00:57:39 INFO PythonRDD: Times: total = 622, boot = 513, init = 98, finish = 11
14/07/09 00:57:39 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=114930, maxMem=311387750
14/07/09 00:57:39 INFO MemoryStore: Block rdd_1_1 stored as values to memory (estimated size 56.1 KB, free 296.8 MB)
14/07/09 00:57:39 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_1 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.8 MB)
14/07/09 00:57:39 INFO BlockManagerMaster: Updated info of block rdd_1_1
14/07/09 00:57:39 INFO Executor: Serialized size of result for 1 is 967
14/07/09 00:57:39 INFO Executor: Sending result for 1 directly to driver
14/07/09 00:57:39 INFO Executor: Finished task ID 1
14/07/09 00:57:39 INFO DAGScheduler: Completed ResultTask(0, 1)
14/07/09 00:57:39 INFO TaskSetManager: Finished TID 1 in 677 ms on localhost (progress: 3/6)
14/07/09 00:57:39 INFO PythonRDD: Times: total = 787, boot = 678, init = 98, finish = 11
14/07/09 00:57:39 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=172395, maxMem=311387750
14/07/09 00:57:39 INFO MemoryStore: Block rdd_1_2 stored as values to memory (estimated size 56.1 KB, free 296.7 MB)
14/07/09 00:57:39 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_2 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.7 MB)
14/07/09 00:57:39 INFO BlockManagerMaster: Updated info of block rdd_1_2
14/07/09 00:57:39 INFO Executor: Serialized size of result for 2 is 967
14/07/09 00:57:39 INFO Executor: Sending result for 2 directly to driver
14/07/09 00:57:39 INFO Executor: Finished task ID 2
14/07/09 00:57:39 INFO DAGScheduler: Completed ResultTask(0, 2)
14/07/09 00:57:39 INFO TaskSetManager: Finished TID 2 in 838 ms on localhost (progress: 4/6)
14/07/09 00:57:40 INFO PythonRDD: Times: total = 950, boot = 842, init = 96, finish = 12
14/07/09 00:57:40 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=229860, maxMem=311387750
14/07/09 00:57:40 INFO MemoryStore: Block rdd_1_5 stored as values to memory (estimated size 56.1 KB, free 296.7 MB)
14/07/09 00:57:40 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_5 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.7 MB)
14/07/09 00:57:40 INFO BlockManagerMaster: Updated info of block rdd_1_5
14/07/09 00:57:40 INFO Executor: Serialized size of result for 5 is 967
14/07/09 00:57:40 INFO Executor: Sending result for 5 directly to driver
14/07/09 00:57:40 INFO Executor: Finished task ID 5
14/07/09 00:57:40 INFO TaskSetManager: Finished TID 5 in 995 ms on localhost (progress: 5/6)
14/07/09 00:57:40 INFO DAGScheduler: Completed ResultTask(0, 5)
14/07/09 00:57:40 INFO PythonRDD: Times: total = 1114, boot = 1004, init = 98, finish = 12
14/07/09 00:57:40 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=287325, maxMem=311387750
14/07/09 00:57:40 INFO MemoryStore: Block rdd_1_0 stored as values to memory (estimated size 56.1 KB, free 296.6 MB)
14/07/09 00:57:40 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_0 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.6 MB)
14/07/09 00:57:40 INFO BlockManagerMaster: Updated info of block rdd_1_0
14/07/09 00:57:40 INFO Executor: Serialized size of result for 0 is 967
14/07/09 00:57:40 INFO Executor: Sending result for 0 directly to driver
14/07/09 00:57:40 INFO Executor: Finished task ID 0
14/07/09 00:57:40 INFO DAGScheduler: Completed ResultTask(0, 0)
14/07/09 00:57:40 INFO TaskSetManager: Finished TID 0 in 1173 ms on localhost (progress: 6/6)
14/07/09 00:57:40 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/07/09 00:57:40 INFO DAGScheduler: Stage 0 (aggregate at NaiveBayes.scala:81) finished in 1.180 s
14/07/09 00:57:40 INFO SparkContext: Job finished: aggregate at NaiveBayes.scala:81, took 1.24974564 s
[0.0, 329.0, 231.0]
14/07/09 00:57:41 INFO SparkContext: Starting job: aggregate at NaiveBayes.scala:81
14/07/09 00:57:41 INFO DAGScheduler: Got job 1 (aggregate at NaiveBayes.scala:81) with 6 output partitions (allowLocal=false)
14/07/09 00:57:41 INFO DAGScheduler: Final stage: Stage 1 (aggregate at NaiveBayes.scala:81)
14/07/09 00:57:41 INFO DAGScheduler: Parents of final stage: List()
14/07/09 00:57:41 INFO DAGScheduler: Missing parents: List()
14/07/09 00:57:41 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[5] at map at PythonMLLibAPI.scala:190), which has no missing parents
14/07/09 00:57:41 INFO DAGScheduler: Submitting 6 missing tasks from Stage 1 (MappedRDD[5] at map at PythonMLLibAPI.scala:190)
14/07/09 00:57:41 INFO TaskSchedulerImpl: Adding task set 1.0 with 6 tasks
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:0 as TID 6 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:0 as 52790 bytes in 1 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:1 as TID 7 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:1 as 52790 bytes in 1 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:2 as TID 8 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:2 as 52790 bytes in 0 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:3 as TID 9 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:3 as 52790 bytes in 0 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:4 as TID 10 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:4 as 52790 bytes in 1 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:5 as TID 11 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:5 as 53060 bytes in 1 ms
14/07/09 00:57:41 INFO Executor: Running task ID 10
14/07/09 00:57:41 INFO Executor: Running task ID 6
14/07/09 00:57:41 INFO Executor: Running task ID 9
14/07/09 00:57:41 INFO Executor: Running task ID 8
14/07/09 00:57:41 INFO Executor: Running task ID 11
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_2 not found, computing it
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_4 not found, computing it
14/07/09 00:57:41 INFO Executor: Running task ID 7
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_5 not found, computing it
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_0 not found, computing it
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_3 not found, computing it
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_1 not found, computing it
14/07/09 00:57:42 INFO PythonRDD: Times: total = 268, boot = 157, init = 100, finish = 11
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=344790, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_2 stored as values to memory (estimated size 56.1 KB, free 296.6 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_2 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.6 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_2
14/07/09 00:57:42 INFO Executor: Serialized size of result for 8 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 8 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 8
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 8 in 289 ms on localhost (progress: 1/6)
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 2)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 425, boot = 316, init = 97, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=402255, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_1 stored as values to memory (estimated size 56.1 KB, free 296.5 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_1 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.5 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_1
14/07/09 00:57:42 INFO Executor: Serialized size of result for 7 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 7 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 7
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 7 in 452 ms on localhost (progress: 2/6)
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 1)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 589, boot = 479, init = 98, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=459720, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_3 stored as values to memory (estimated size 56.1 KB, free 296.5 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_3 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.5 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_3
14/07/09 00:57:42 INFO Executor: Serialized size of result for 9 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 9 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 9
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 9 in 612 ms on localhost (progress: 3/6)
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 3)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 752, boot = 642, init = 98, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=517185, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_0 stored as values to memory (estimated size 56.1 KB, free 296.4 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_0 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.4 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_0
14/07/09 00:57:42 INFO Executor: Serialized size of result for 6 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 6 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 6
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 6 in 777 ms on localhost (progress: 4/6)
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 0)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 919, boot = 806, init = 101, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=574650, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_5 stored as values to memory (estimated size 56.1 KB, free 296.4 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_5 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.4 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_5
14/07/09 00:57:42 INFO Executor: Serialized size of result for 11 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 11 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 11
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 5)
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 11 in 938 ms on localhost (progress: 5/6)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 1079, boot = 972, init = 95, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=632115, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_4 stored as values to memory (estimated size 56.1 KB, free 296.3 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_4 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.3 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_4
14/07/09 00:57:42 INFO Executor: Serialized size of result for 10 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 10 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 10
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 10 in 1098 ms on localhost (progress: 6/6)
14/07/09 00:57:42 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 4)
14/07/09 00:57:42 INFO DAGScheduler: Stage 1 (aggregate at NaiveBayes.scala:81) finished in 1.106 s
14/07/09 00:57:42 INFO SparkContext: Job finished: aggregate at NaiveBayes.scala:81, took 1.114280889 s
Exception in thread "delete Spark temp dir C:\Users\shawn\AppData\Local\Temp\spark-45a2e1f4-8229-4614-a428-593886dac0c6" java.io.IOException: Failed to delete: C:\Users\shawn\AppData\Local\Temp\spark-45a2e1f4-8229-4614-a428-593886dac0c6\tmpd7r1h0 at
org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:483)
        at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
        at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
        at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:212)
################################

These are the logs. Can you suggest something after looking at them?

On Wed, Jul 9, 2014 at 1:10 AM, Rahul Bhojwani <rahulbhojwani2...@gmail.com> wrote:

> Here I am adding my code. If you can have a look to help me out.
> Thanks
> #######################
>
> import tokenizer
> import gettingWordLists as gl
> from pyspark.mllib.classification import NaiveBayes
> from numpy import array
> from pyspark import SparkContext, SparkConf
>
> conf = (SparkConf().setMaster("local[6]").setAppName("My app").set("spark.executor.memory", "1g"))
>
> sc = SparkContext(conf=conf)
>
> # Getting the positive dict:
> pos_list = []
> pos_list = gl.getPositiveList()
> neg_list = gl.getNegativeList()
>
> # print neg_list
> tok = tokenizer.Tokenizer(preserve_case=False)
> train_data = []
>
> with open("training_file_coach.csv", "r") as train_file:
>     for line in train_file:
>         tokens = line.split("######")
>         msg = tokens[0]
>         sentiment = tokens[1]
>         pos_count = 0
>         neg_count = 0
>         # print sentiment + "\n\n"
>         # print msg
>         tokens = set(tok.tokenize(msg))
>         for i in tokens:
>             if i.encode('utf-8') in pos_list:
>                 pos_count += 1
>             if i.encode('utf-8') in neg_list:
>                 neg_count += 1
>         if sentiment.__contains__('NEG'):
>             label = 0.0
>         else:
>             label = 1.0
>
>         feature = []
>         feature.append(label)
>         feature.append(float(pos_count))
>         feature.append(float(neg_count))
>         train_data.append(feature)
> train_file.close()
>
> model = NaiveBayes.train(sc.parallelize(array(train_data)))
>
> file_predicted = open("predicted_file_coach.csv", "w")
>
> with open("prediction_file_coach.csv", "r") as predict_file:
>     for line in predict_file:
>         msg = line[0:-1]
>         pos_count = 0
>         neg_count = 0
>         # print sentiment + "\n\n"
>         # print msg
>         tokens = set(tok.tokenize(msg))
>         for i in tokens:
>             if i.encode('utf-8') in pos_list:
>                 pos_count += 1
>             if i.encode('utf-8') in neg_list:
>                 neg_count += 1
>         prediction = model.predict(array([float(pos_count), float(neg_count)]))
>         if prediction == 0:
>             sentiment = "NEG"
>         elif prediction == 1:
>             sentiment = "POS"
>         else:
>             print "ERROR\n\n\n\n\n\n\nERROR"
>
>         feature = []
>         feature.append(float(prediction))
>         feature.append(float(pos_count))
>         feature.append(float(neg_count))
>         print feature
>         train_data.append(feature)
>         model = NaiveBayes.train(sc.parallelize(array(train_data)))
>         file_predicted.write(msg + "######" + sentiment + "\n")
>
> file_predicted.close()
> ###################
>
> If you can have a look at the code and help me out, it would be great.
>
> Thanks
>
>
> On Wed, Jul 9, 2014 at 12:54 AM, Rahul Bhojwani <rahulbhojwani2...@gmail.com> wrote:
>
>> Hi Marcelo.
>> Thanks for the quick reply. Can you suggest how to increase the memory
>> limits or how to tackle this problem? I am a novice. If you want, I can
>> post my code here.
>>
>> Thanks
>>
>> On Wed, Jul 9, 2014 at 12:50 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>>> This is generally a side effect of your executor being killed. For
>>> example, Yarn will do that if you're going over the requested memory
>>> limits.
>>>
>>> On Tue, Jul 8, 2014 at 12:17 PM, Rahul Bhojwani <rahulbhojwani2...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I am getting this error. Can anyone help explain why this error is
>>> > coming?
>>> >
>>> > ########
>>> >
>>> > Exception in thread "delete Spark temp dir
>>> > C:\Users\shawn\AppData\Local\Temp\spark-27f60467-36d4-4081-aaf5-d0ad42dda560"
>>> > java.io.IOException: Failed to delete:
>>> > C:\Users\shawn\AppData\Local\Temp\spark-27f60467-36d4-4081-aaf5-d0ad42dda560\tmpcmenlp
>>> >         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:483)
>>> >         at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
>>> >         at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
>>> >         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>> >         at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>>> >         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
>>> >         at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:212)
>>> > PS>
>>> > ############
>>> >
>>> > Thanks in advance
>>> > --
>>> > Rahul K Bhojwani
>>> > 3rd Year B.Tech
>>> > Computer Science and Engineering
>>> > National Institute of Technology, Karnataka
>>>
>>> --
>>> Marcelo
>>
>> --
>> Rahul K Bhojwani
>> 3rd Year B.Tech
>> Computer Science and Engineering
>> National Institute of Technology, Karnataka
>
> --
> Rahul K Bhojwani
> 3rd Year B.Tech
> Computer Science and Engineering
> National Institute of Technology, Karnataka

--
Rahul K Bhojwani
3rd Year B.Tech
Computer Science and Engineering
National Institute of Technology, Karnataka
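[Editor's note for later readers of this thread: the script quoted above builds, for each message, a three-element row [label, pos_count, neg_count] and feeds the rows to MLlib's NaiveBayes. A Spark-free sketch of just that feature-extraction step is below; the word lists and the whitespace tokenizer are hypothetical stand-ins for the poster's gettingWordLists and tokenizer modules, which are not shown in the thread.]

```python
# Minimal sketch of the [label, pos_count, neg_count] row construction.
# POS_WORDS / NEG_WORDS are invented examples, not the poster's real lists.

POS_WORDS = {"good", "great", "love"}   # hypothetical positive word list
NEG_WORDS = {"bad", "awful", "hate"}    # hypothetical negative word list

def make_row(msg, sentiment):
    """Build one training row: label, then the two count features."""
    tokens = set(msg.lower().split())   # crude stand-in for tok.tokenize(msg)
    pos_count = sum(1 for t in tokens if t in POS_WORDS)
    neg_count = sum(1 for t in tokens if t in NEG_WORDS)
    label = 0.0 if "NEG" in sentiment else 1.0
    return [label, float(pos_count), float(neg_count)]

train_data = [
    make_row("I love this great coach", "POS"),   # -> [1.0, 2.0, 0.0]
    make_row("awful experience bad coach", "NEG"), # -> [0.0, 0.0, 2.0]
]
print(train_data)
```

[Such rows would then go to NaiveBayes.train once, up front; retraining inside the prediction loop, as the quoted script does, multiplies the work per predicted line.]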