I have pasted the logs below:

PS F:\spark-0.9.1\codes\sentiment analysis> pyspark .\naive_bayes_analyser.py
Running python with PYTHONPATH=F:\spark-0.9.1\spark-0.9.1\bin\..\python;
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/F:/spark-0.9.1/spark-0.9.1/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop1.0.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/F:/spark-0.9.1/spark-0.9.1/tools/target/scala-2.10/spark-tools-assembly-0.9.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/07/09 00:57:25 INFO SparkEnv: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/07/09 00:57:25 INFO SparkEnv: Registering BlockManagerMaster
14/07/09 00:57:25 INFO DiskBlockManager: Created local directory at C:\Users\shawn\AppData\Local\Temp\spark-local-20140709005725-fe99
14/07/09 00:57:25 INFO MemoryStore: MemoryStore started with capacity 297.0 MB.
14/07/09 00:57:25 INFO ConnectionManager: Bound socket to port 51231 with id = ConnectionManagerId(shawn-PC,51231)
14/07/09 00:57:25 INFO BlockManagerMaster: Trying to register BlockManager
14/07/09 00:57:25 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager shawn-PC:51231 with 297.0 MB RAM
14/07/09 00:57:25 INFO BlockManagerMaster: Registered BlockManager
14/07/09 00:57:26 INFO HttpServer: Starting HTTP Server
14/07/09 00:57:26 INFO HttpBroadcast: Broadcast server started at http://192.168.1.100:51232
14/07/09 00:57:26 INFO SparkEnv: Registering MapOutputTracker
14/07/09 00:57:26 INFO HttpFileServer: HTTP File server directory is C:\Users\shawn\AppData\Local\Temp\spark-339491dd-68f4-4027-b661-00f2c5f95494
14/07/09 00:57:26 INFO HttpServer: Starting HTTP Server
14/07/09 00:57:26 INFO SparkUI: Started Spark Web UI at http://shawn-PC:4040
14/07/09 00:57:39 INFO SparkContext: Starting job: aggregate at NaiveBayes.scala:81
14/07/09 00:57:39 INFO DAGScheduler: Got job 0 (aggregate at NaiveBayes.scala:81) with 6 output partitions (allowLocal=false)
14/07/09 00:57:39 INFO DAGScheduler: Final stage: Stage 0 (aggregate at NaiveBayes.scala:81)
14/07/09 00:57:39 INFO DAGScheduler: Parents of final stage: List()
14/07/09 00:57:39 INFO DAGScheduler: Missing parents: List()
14/07/09 00:57:39 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[2] at map at PythonMLLibAPI.scala:190), which has no missing parents
14/07/09 00:57:39 INFO DAGScheduler: Submitting 6 missing tasks from Stage 0 (MappedRDD[2] at map at PythonMLLibAPI.scala:190)
14/07/09 00:57:39 INFO TaskSchedulerImpl: Adding task set 0.0 with 6 tasks
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:0 as 52792 bytes in 4 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:1 as 52792 bytes in 0 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:2 as 52792 bytes in 1 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:3 as TID 3 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:3 as 52792 bytes in 0 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:4 as TID 4 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:4 as 52792 bytes in 0 ms
14/07/09 00:57:39 INFO TaskSetManager: Starting task 0.0:5 as TID 5 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:39 INFO TaskSetManager: Serialized task 0.0:5 as 53011 bytes in 0 ms
14/07/09 00:57:39 INFO Executor: Running task ID 3
14/07/09 00:57:39 INFO Executor: Running task ID 1
14/07/09 00:57:39 INFO Executor: Running task ID 2
14/07/09 00:57:39 INFO Executor: Running task ID 5
14/07/09 00:57:39 INFO Executor: Running task ID 4
14/07/09 00:57:39 INFO Executor: Running task ID 0
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_4 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_0 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_5 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_2 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_1 not found, computing it
14/07/09 00:57:39 INFO CacheManager: Partition rdd_1_3 not found, computing it
14/07/09 00:57:39 INFO PythonRDD: Times: total = 290, boot = 176, init = 102, finish = 12
14/07/09 00:57:39 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
14/07/09 00:57:39 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=0, maxMem=311387750
14/07/09 00:57:39 INFO MemoryStore: Block rdd_1_4 stored as values to memory (estimated size 56.1 KB, free 296.9 MB)
14/07/09 00:57:39 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_4 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.9 MB)
14/07/09 00:57:39 INFO BlockManagerMaster: Updated info of block rdd_1_4
14/07/09 00:57:39 INFO Executor: Serialized size of result for 4 is 967
14/07/09 00:57:39 INFO Executor: Sending result for 4 directly to driver
14/07/09 00:57:39 INFO Executor: Finished task ID 4
14/07/09 00:57:39 INFO TaskSetManager: Finished TID 4 in 438 ms on localhost (progress: 1/6)
14/07/09 00:57:39 INFO DAGScheduler: Completed ResultTask(0, 4)
14/07/09 00:57:39 INFO PythonRDD: Times: total = 457, boot = 334, init = 111, finish = 12
14/07/09 00:57:39 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=57465, maxMem=311387750
14/07/09 00:57:39 INFO MemoryStore: Block rdd_1_3 stored as values to memory (estimated size 56.1 KB, free 296.9 MB)
14/07/09 00:57:39 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_3 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.9 MB)
14/07/09 00:57:39 INFO BlockManagerMaster: Updated info of block rdd_1_3
14/07/09 00:57:39 INFO Executor: Serialized size of result for 3 is 967
14/07/09 00:57:39 INFO Executor: Sending result for 3 directly to driver
14/07/09 00:57:39 INFO Executor: Finished task ID 3
14/07/09 00:57:39 INFO DAGScheduler: Completed ResultTask(0, 3)
14/07/09 00:57:39 INFO TaskSetManager: Finished TID 3 in 522 ms on localhost (progress: 2/6)
14/07/09 00:57:39 INFO PythonRDD: Times: total = 622, boot = 513, init = 98, finish = 11
14/07/09 00:57:39 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=114930, maxMem=311387750
14/07/09 00:57:39 INFO MemoryStore: Block rdd_1_1 stored as values to memory (estimated size 56.1 KB, free 296.8 MB)
14/07/09 00:57:39 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_1 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.8 MB)
14/07/09 00:57:39 INFO BlockManagerMaster: Updated info of block rdd_1_1
14/07/09 00:57:39 INFO Executor: Serialized size of result for 1 is 967
14/07/09 00:57:39 INFO Executor: Sending result for 1 directly to driver
14/07/09 00:57:39 INFO Executor: Finished task ID 1
14/07/09 00:57:39 INFO DAGScheduler: Completed ResultTask(0, 1)
14/07/09 00:57:39 INFO TaskSetManager: Finished TID 1 in 677 ms on localhost (progress: 3/6)
14/07/09 00:57:39 INFO PythonRDD: Times: total = 787, boot = 678, init = 98, finish = 11
14/07/09 00:57:39 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=172395, maxMem=311387750
14/07/09 00:57:39 INFO MemoryStore: Block rdd_1_2 stored as values to memory (estimated size 56.1 KB, free 296.7 MB)
14/07/09 00:57:39 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_2 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.7 MB)
14/07/09 00:57:39 INFO BlockManagerMaster: Updated info of block rdd_1_2
14/07/09 00:57:39 INFO Executor: Serialized size of result for 2 is 967
14/07/09 00:57:39 INFO Executor: Sending result for 2 directly to driver
14/07/09 00:57:39 INFO Executor: Finished task ID 2
14/07/09 00:57:39 INFO DAGScheduler: Completed ResultTask(0, 2)
14/07/09 00:57:39 INFO TaskSetManager: Finished TID 2 in 838 ms on localhost (progress: 4/6)
14/07/09 00:57:40 INFO PythonRDD: Times: total = 950, boot = 842, init = 96, finish = 12
14/07/09 00:57:40 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=229860, maxMem=311387750
14/07/09 00:57:40 INFO MemoryStore: Block rdd_1_5 stored as values to memory (estimated size 56.1 KB, free 296.7 MB)
14/07/09 00:57:40 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_5 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.7 MB)
14/07/09 00:57:40 INFO BlockManagerMaster: Updated info of block rdd_1_5
14/07/09 00:57:40 INFO Executor: Serialized size of result for 5 is 967
14/07/09 00:57:40 INFO Executor: Sending result for 5 directly to driver
14/07/09 00:57:40 INFO Executor: Finished task ID 5
14/07/09 00:57:40 INFO TaskSetManager: Finished TID 5 in 995 ms on localhost (progress: 5/6)
14/07/09 00:57:40 INFO DAGScheduler: Completed ResultTask(0, 5)
14/07/09 00:57:40 INFO PythonRDD: Times: total = 1114, boot = 1004, init = 98, finish = 12
14/07/09 00:57:40 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=287325, maxMem=311387750
14/07/09 00:57:40 INFO MemoryStore: Block rdd_1_0 stored as values to memory (estimated size 56.1 KB, free 296.6 MB)
14/07/09 00:57:40 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_0 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.6 MB)
14/07/09 00:57:40 INFO BlockManagerMaster: Updated info of block rdd_1_0
14/07/09 00:57:40 INFO Executor: Serialized size of result for 0 is 967
14/07/09 00:57:40 INFO Executor: Sending result for 0 directly to driver
14/07/09 00:57:40 INFO Executor: Finished task ID 0
14/07/09 00:57:40 INFO TaskSetManager: Finished TID 0 in 1173 ms on localhost (progress: 6/6)
14/07/09 00:57:40 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/07/09 00:57:40 INFO DAGScheduler: Stage 0 (aggregate at NaiveBayes.scala:81) finished in 1.180 s
14/07/09 00:57:40 INFO SparkContext: Job finished: aggregate at NaiveBayes.scala:81, took 1.24974564 s
[0.0, 329.0, 231.0]
14/07/09 00:57:41 INFO SparkContext: Starting job: aggregate at NaiveBayes.scala:81
14/07/09 00:57:41 INFO DAGScheduler: Got job 1 (aggregate at NaiveBayes.scala:81) with 6 output partitions (allowLocal=false)
14/07/09 00:57:41 INFO DAGScheduler: Final stage: Stage 1 (aggregate at NaiveBayes.scala:81)
14/07/09 00:57:41 INFO DAGScheduler: Parents of final stage: List()
14/07/09 00:57:41 INFO DAGScheduler: Missing parents: List()
14/07/09 00:57:41 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[5] at map at PythonMLLibAPI.scala:190), which has no missing parents
14/07/09 00:57:41 INFO DAGScheduler: Submitting 6 missing tasks from Stage 1 (MappedRDD[5] at map at PythonMLLibAPI.scala:190)
14/07/09 00:57:41 INFO TaskSchedulerImpl: Adding task set 1.0 with 6 tasks
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:0 as TID 6 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:0 as 52790 bytes in 1 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:1 as TID 7 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:1 as 52790 bytes in 1 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:2 as TID 8 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:2 as 52790 bytes in 0 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:3 as TID 9 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:3 as 52790 bytes in 0 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:4 as TID 10 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:4 as 52790 bytes in 1 ms
14/07/09 00:57:41 INFO TaskSetManager: Starting task 1.0:5 as TID 11 on executor localhost: localhost (PROCESS_LOCAL)
14/07/09 00:57:41 INFO TaskSetManager: Serialized task 1.0:5 as 53060 bytes in 1 ms
14/07/09 00:57:41 INFO Executor: Running task ID 10
14/07/09 00:57:41 INFO Executor: Running task ID 6
14/07/09 00:57:41 INFO Executor: Running task ID 9
14/07/09 00:57:41 INFO Executor: Running task ID 8
14/07/09 00:57:41 INFO Executor: Running task ID 11
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_2 not found, computing it
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_4 not found, computing it
14/07/09 00:57:41 INFO Executor: Running task ID 7
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_5 not found, computing it
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_0 not found, computing it
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_3 not found, computing it
14/07/09 00:57:41 INFO CacheManager: Partition rdd_4_1 not found, computing it
14/07/09 00:57:42 INFO PythonRDD: Times: total = 268, boot = 157, init = 100, finish = 11
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=344790, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_2 stored as values to memory (estimated size 56.1 KB, free 296.6 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_2 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.6 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_2
14/07/09 00:57:42 INFO Executor: Serialized size of result for 8 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 8 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 8
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 8 in 289 ms on localhost (progress: 1/6)
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 2)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 425, boot = 316, init = 97, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=402255, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_1 stored as values to memory (estimated size 56.1 KB, free 296.5 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_1 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.5 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_1
14/07/09 00:57:42 INFO Executor: Serialized size of result for 7 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 7 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 7
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 7 in 452 ms on localhost (progress: 2/6)
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 1)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 589, boot = 479, init = 98, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=459720, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_3 stored as values to memory (estimated size 56.1 KB, free 296.5 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_3 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.5 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_3
14/07/09 00:57:42 INFO Executor: Serialized size of result for 9 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 9 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 9
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 9 in 612 ms on localhost (progress: 3/6)
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 3)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 752, boot = 642, init = 98, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=517185, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_0 stored as values to memory (estimated size 56.1 KB, free 296.4 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_0 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.4 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_0
14/07/09 00:57:42 INFO Executor: Serialized size of result for 6 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 6 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 6
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 6 in 777 ms on localhost (progress: 4/6)
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 0)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 919, boot = 806, init = 101, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=574650, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_5 stored as values to memory (estimated size 56.1 KB, free 296.4 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_5 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.4 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_5
14/07/09 00:57:42 INFO Executor: Serialized size of result for 11 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 11 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 11
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 5)
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 11 in 938 ms on localhost (progress: 5/6)
14/07/09 00:57:42 INFO PythonRDD: Times: total = 1079, boot = 972, init = 95, finish = 12
14/07/09 00:57:42 INFO MemoryStore: ensureFreeSpace(57465) called with curMem=632115, maxMem=311387750
14/07/09 00:57:42 INFO MemoryStore: Block rdd_4_4 stored as values to memory (estimated size 56.1 KB, free 296.3 MB)
14/07/09 00:57:42 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_4_4 in memory on shawn-PC:51231 (size: 56.1 KB, free: 296.3 MB)
14/07/09 00:57:42 INFO BlockManagerMaster: Updated info of block rdd_4_4
14/07/09 00:57:42 INFO Executor: Serialized size of result for 10 is 967
14/07/09 00:57:42 INFO Executor: Sending result for 10 directly to driver
14/07/09 00:57:42 INFO Executor: Finished task ID 10
14/07/09 00:57:42 INFO TaskSetManager: Finished TID 10 in 1098 ms on localhost (progress: 6/6)
14/07/09 00:57:42 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/07/09 00:57:42 INFO DAGScheduler: Completed ResultTask(1, 4)
14/07/09 00:57:42 INFO DAGScheduler: Stage 1 (aggregate at NaiveBayes.scala:81) finished in 1.106 s
14/07/09 00:57:42 INFO SparkContext: Job finished: aggregate at NaiveBayes.scala:81, took 1.114280889 s
Exception in thread "delete Spark temp dir C:\Users\shawn\AppData\Local\Temp\spark-45a2e1f4-8229-4614-a428-593886dac0c6"
java.io.IOException: Failed to delete: C:\Users\shawn\AppData\Local\Temp\spark-45a2e1f4-8229-4614-a428-593886dac0c6\tmpd7r1h0
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:483)
        at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
        at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
        at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:212)

################################
These are the logs. Can you suggest something after looking at them?
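One thing I notice from the trace: the thread that dies is Spark's shutdown hook, which recursively deletes its temp directory, and on Windows that delete commonly fails because some file inside is still held open. Just to illustrate the failure mode (a toy sketch, not Spark's actual code; `retry_rmtree` is a made-up helper name), a recursive delete that retries instead of throwing would look like:

```python
import os
import shutil
import tempfile
import time


def retry_rmtree(path, attempts=3, delay=0.5):
    """Recursively delete `path`, retrying a few times in case another
    process still holds a handle on a file inside it (common on Windows)."""
    for _ in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except OSError:
            time.sleep(delay)
    # Report whether the directory is actually gone after all attempts.
    return not os.path.exists(path)


# Demo: create a throwaway directory with one file in it, then remove it.
demo_dir = tempfile.mkdtemp(prefix="spark-demo-")
with open(os.path.join(demo_dir, "tmpfile"), "w") as f:
    f.write("x")
print(retry_rmtree(demo_dir))
```

Since both Spark jobs finish before the exception, the error is only about cleanup; making sure the script closes all its files before exiting may be enough to avoid it.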



On Wed, Jul 9, 2014 at 1:10 AM, Rahul Bhojwani <rahulbhojwani2...@gmail.com>
wrote:

> Here I am adding my code, in case you can have a look and help me out.
> Thanks
> #######################
>
> import tokenizer
> import gettingWordLists as gl
> from pyspark.mllib.classification import NaiveBayes
> from numpy import array
> from pyspark import SparkContext, SparkConf
>
> conf = (SparkConf().setMaster("local[6]").setAppName("My app").set("spark.executor.memory", "1g"))
>
> sc=SparkContext(conf = conf)
> # Getting the positive dict:
>
> pos_list = []
> pos_list = gl.getPositiveList()
> neg_list = gl.getNegativeList()
>
> #print neg_list
> tok = tokenizer.Tokenizer(preserve_case=False)
> train_data  = []
>
> with open("training_file_coach.csv","r") as train_file:
>     for line in train_file:
>         tokens = line.split("######")
>         msg = tokens[0]
>         sentiment = tokens[1]
>         pos_count = 0
>         neg_count = 0
> #        print sentiment + "\n\n"
> #        print msg
>         tokens = set(tok.tokenize(msg))
>         for i in tokens:
>             if i.encode('utf-8') in pos_list:
>                 pos_count+=1
>             if i.encode('utf-8') in neg_list:
>                 neg_count+=1
>         if 'NEG' in sentiment:
>             label = 0.0
>         else:
>             label = 1.0
>
>         feature = []
>         feature.append(label)
>         feature.append(float(pos_count))
>         feature.append(float(neg_count))
>         train_data.append(feature)
>
> model = NaiveBayes.train(sc.parallelize(array(train_data)))
>
>
> file_predicted = open("predicted_file_coach.csv","w")
>
> with open("prediction_file_coach.csv","r") as predict_file:
>     for line in predict_file:
>         msg = line[0:-1]
>         pos_count = 0
>         neg_count = 0
> #        print sentiment + "\n\n"
> #        print msg
>         tokens = set(tok.tokenize(msg))
>         for i in tokens:
>             if i.encode('utf-8') in pos_list:
>                 pos_count+=1
>             if i.encode('utf-8') in neg_list:
>                 neg_count+=1
>         prediction = model.predict(array([float(pos_count),float(neg_count)]))
>         if prediction == 0:
>             sentiment = "NEG"
>         elif prediction == 1:
>             sentiment = "POS"
>         else:
>             print "ERROR\n\n\n\n\n\n\nERROR"
>
>         feature = []
>         feature.append(float(prediction))
>         feature.append(float(pos_count))
>         feature.append(float(neg_count))
>         print feature
>         train_data.append(feature)
>         model = NaiveBayes.train(sc.parallelize(array(train_data)))
>         file_predicted.write(msg + "######" + sentiment + "\n")
>
> file_predicted.close()
> ###################
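(An aside on the feature extraction above: each message is reduced to [label, positive-token count, negative-token count]. A minimal self-contained sketch of just that counting step, using toy lexicons and a plain whitespace tokenizer instead of the script's `tokenizer` module, is:)

```python
def lexicon_counts(msg, pos_list, neg_list):
    """Count how many distinct tokens of `msg` appear in each lexicon.

    A stand-in for the tokenizer-based loop in the script:
    lowercase, whitespace-split, dedupe via set().
    """
    tokens = set(msg.lower().split())
    pos_count = sum(1 for t in tokens if t in pos_list)
    neg_count = sum(1 for t in tokens if t in neg_list)
    return pos_count, neg_count


# Toy lexicons for illustration only; the real ones come from gettingWordLists.
pos_list = {"good", "great", "love"}
neg_list = {"bad", "terrible", "hate"}

print(lexicon_counts("I love this great coach but hate the seats", pos_list, neg_list))
```

(With the toy lexicons above, the sample message has two positive hits and one negative hit.)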
>
> If you can have a look at the code and help me out, it would be great.
>
> Thanks
>
>
> On Wed, Jul 9, 2014 at 12:54 AM, Rahul Bhojwani <
> rahulbhojwani2...@gmail.com> wrote:
>
>> Hi Marcelo,
>> Thanks for the quick reply. Can you suggest how to increase the memory
>> limits, or how else to tackle this problem? I am a novice. If you want, I can
>> post my code here.
>>
>>
>> Thanks
>>
>>
>> On Wed, Jul 9, 2014 at 12:50 AM, Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>>
>>> This is generally a side effect of your executor being killed. For
>>> example, Yarn will do that if you're going over the requested memory
>>> limits.
>>>
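(Side note on the memory limits Marcelo mentions: in this script the setting already goes through SparkConf, so raising it is just a configuration change made before the SparkContext is created. This is a config sketch only; the "2g" value is illustrative, and since the run uses local mode, the driver JVM's own heap may matter more than the executor setting.)

```python
from pyspark import SparkConf, SparkContext

# Illustrative values only; adjust to the machine's available memory.
conf = (SparkConf()
        .setMaster("local[6]")
        .setAppName("My app")
        .set("spark.executor.memory", "2g"))  # the script currently uses "1g"
sc = SparkContext(conf=conf)
```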
>>> On Tue, Jul 8, 2014 at 12:17 PM, Rahul Bhojwani
>>> <rahulbhojwani2...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I am getting this error. Can anyone explain why it is occurring?
>>> >
>>> > ########
>>> >
>>> > Exception in thread "delete Spark temp dir C:\Users\shawn\AppData\Local\Temp\spark-27f60467-36d4-4081-aaf5-d0ad42dda560"
>>> > java.io.IOException: Failed to delete: C:\Users\shawn\AppData\Local\Temp\spark-27f60467-36d4-4081-aaf5-d0ad42dda560\tmpcmenlp
>>> >         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:483)
>>> >         at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
>>> >         at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
>>> >         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>> >         at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>>> >         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
>>> >         at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:212)
>>> > PS>
>>> > ############
>>> >
>>> >
>>> >
>>> >
>>> > Thanks in advance
>>> > --
>>> > Rahul K Bhojwani
>>> > 3rd Year B.Tech
>>> > Computer Science and Engineering
>>> > National Institute of Technology, Karnataka
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>
>>
>>
>>
>
>
>
>



