Hi,

I have successfully reduced my data and stored it in a JavaDStream&lt;BSONObject&gt;.
Now I want to save this data to MongoDB, which is why I chose the BSONObject type.
But when I try to save it, an exception is thrown.
I also tried saving it simply with saveAsTextFile, but I get the same exception.

Error log: the full log file is attached. Excerpt from the log file:
2015-08-11 11:18:52,663 INFO (org.apache.spark.storage.BlockManagerMaster:59) - Updated info of block broadcast_4_piece0
2015-08-11 11:18:52,664 INFO (org.apache.spark.SparkContext:59) - Created broadcast 4 from broadcast at DAGScheduler.scala:839
2015-08-11 11:18:52,664 INFO (org.apache.spark.scheduler.DAGScheduler:59) - Submitting 2 missing tasks from Stage 7 (MapPartitionsRDD[5] at foreach at DirectStream.java:167)
2015-08-11 11:18:52,664 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59) - Adding task set 7.0 with 2 tasks
2015-08-11 11:18:52,665 INFO (org.apache.spark.scheduler.TaskSetManager:59) - Starting task 0.0 in stage 7.0 (TID 5, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,666 INFO (org.apache.spark.scheduler.TaskSetManager:59) - Starting task 1.0 in stage 7.0 (TID 6, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,666 INFO (org.apache.spark.executor.Executor:59) - Running task 0.0 in stage 7.0 (TID 5)
2015-08-11 11:18:52,666 INFO (org.apache.spark.executor.Executor:59) - Running task 1.0 in stage 7.0 (TID 6)
2015-08-11 11:18:52,827 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty blocks out of 2 blocks
2015-08-11 11:18:52,828 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote fetches in 1 ms
2015-08-11 11:18:52,846 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty blocks out of 2 blocks
2015-08-11 11:18:52,847 INFO (org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote fetches in 1 ms
2015-08-11 11:18:52,965 ERROR (org.apache.spark.executor.Executor:96) - Exception in task 1.0 in stage 7.0 (TID 6)
java.lang.NullPointerException
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
        at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-08-11 11:18:52,965 ERROR (org.apache.spark.executor.Executor:96) - Exception in task 0.0 in stage 7.0 (TID 5)
java.lang.NullPointerException
Code for saving the output:

// for MongoDB
Configuration outputConfig = new Configuration();
outputConfig.set("mongo.output.uri", "mongodb://localhost:27017/test.spark");
outputConfig.set("mongo.output.format", "com.mongodb.hadoop.MongoOutputFormat");

JavaDStream<BSONObject> suspectedStream; // the reduced stream from earlier

suspectedStream.foreach(new Function<JavaRDD<BSONObject>, Void>() {
    private static final long serialVersionUID = 4414703053334523053L;

    @Override
    public Void call(JavaRDD<BSONObject> rdd) throws Exception {
        logger.info(rdd.first());
        rdd.saveAsTextFile("E://");
        rdd.saveAsNewAPIHadoopFile("", Object.class, BSONObject.class,
                MongoOutputFormat.class, outputConfig);
        return null;
    }
});
Regards,
Deepesh
2015-08-11 11:18:52,265 INFO
(org.apache.spark.streaming.scheduler.JobScheduler:59) - Finished job streaming
job 1439272130000 ms.1 from job set of time 1439272130000 ms
2015-08-11 11:18:52,265 INFO
(org.apache.spark.streaming.scheduler.JobScheduler:59) - Starting job streaming
job 1439272130000 ms.2 from job set of time 1439272130000 ms
2015-08-11 11:18:52,271 INFO (org.apache.spark.SparkContext:59) - Starting
job: foreach at DirectStream.java:167
2015-08-11 11:18:52,274 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Got job 2 (foreach at DirectStream.java:167) with 1 output partitions
(allowLocal=true)
2015-08-11 11:18:52,274 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Final stage: Stage 5(foreach at DirectStream.java:167)
2015-08-11 11:18:52,274 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Parents of final stage: List(Stage 4)
2015-08-11 11:18:52,276 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Missing parents: List()
2015-08-11 11:18:52,276 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Submitting Stage 5 (MapPartitionsRDD[4] at map at DirectStream.java:119), which
has no missing parents
2015-08-11 11:18:52,278 INFO (org.apache.spark.storage.MemoryStore:59) -
ensureFreeSpace(3152) called with curMem=16781, maxMem=1018932756
2015-08-11 11:18:52,279 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_3 stored as values in memory (estimated size 3.1 KB, free 971.7 MB)
2015-08-11 11:18:52,281 INFO (org.apache.spark.storage.MemoryStore:59) -
ensureFreeSpace(2261) called with curMem=19933, maxMem=1018932756
2015-08-11 11:18:52,281 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_3_piece0 stored as bytes in memory (estimated size 2.2 KB, free 971.7
MB)
2015-08-11 11:18:52,283 INFO (org.apache.spark.storage.BlockManagerInfo:59) -
Added broadcast_3_piece0 in memory on localhost:65012 (size: 2.2 KB, free:
971.7 MB)
2015-08-11 11:18:52,284 INFO (org.apache.spark.storage.BlockManagerMaster:59)
- Updated info of block broadcast_3_piece0
2015-08-11 11:18:52,284 INFO (org.apache.spark.SparkContext:59) - Created
broadcast 3 from broadcast at DAGScheduler.scala:839
2015-08-11 11:18:52,285 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Submitting 1 missing tasks from Stage 5 (MapPartitionsRDD[4] at map at
DirectStream.java:119)
2015-08-11 11:18:52,285 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59)
- Adding task set 5.0 with 1 tasks
2015-08-11 11:18:52,286 INFO (org.apache.spark.scheduler.TaskSetManager:59) -
Starting task 0.0 in stage 5.0 (TID 4, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,287 INFO (org.apache.spark.executor.Executor:59) - Running
task 0.0 in stage 5.0 (TID 4)
2015-08-11 11:18:52,290 INFO
(org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty
blocks out of 2 blocks
2015-08-11 11:18:52,290 INFO
(org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote
fetches in 0 ms
2015-08-11 11:18:52,307 INFO (com.spark.DirectStream:123) -
(articleId_488691_host_luxpresso.com,1)
2015-08-11 11:18:52,308 INFO (com.spark.DirectStream:130) - { "articleId" :
"488691" , "host" : "luxpresso.com" , "count" : 1}
2015-08-11 11:18:52,309 INFO (org.apache.spark.executor.Executor:59) -
Finished task 0.0 in stage 5.0 (TID 4). 1248 bytes result sent to driver
2015-08-11 11:18:52,311 INFO (org.apache.spark.scheduler.TaskSetManager:59) -
Finished task 0.0 in stage 5.0 (TID 4) in 24 ms on localhost (1/1)
2015-08-11 11:18:52,311 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59)
- Removed TaskSet 5.0, whose tasks have all completed, from pool
2015-08-11 11:18:52,312 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Stage 5 (foreach at DirectStream.java:167) finished in 0.026 s
2015-08-11 11:18:52,313 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Job 2 finished: foreach at DirectStream.java:167, took 0.040501 s
2015-08-11 11:18:52,313 INFO (com.spark.DirectStream:174) - { "articleId" :
"488691" , "host" : "luxpresso.com" , "count" : 1}
2015-08-11 11:18:52,349 INFO (org.apache.spark.storage.BlockManager:59) -
Removing broadcast 3
2015-08-11 11:18:52,351 INFO (org.apache.spark.storage.BlockManager:59) -
Removing block broadcast_3_piece0
2015-08-11 11:18:52,352 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_3_piece0 of size 2261 dropped from memory (free 1018912823)
2015-08-11 11:18:52,355 INFO (org.apache.spark.storage.BlockManagerInfo:59) -
Removed broadcast_3_piece0 on localhost:65012 in memory (size: 2.2 KB, free:
971.7 MB)
2015-08-11 11:18:52,383 INFO (org.apache.spark.storage.BlockManagerMaster:59)
- Updated info of block broadcast_3_piece0
2015-08-11 11:18:52,384 INFO (org.apache.spark.storage.BlockManager:59) -
Removing block broadcast_3
2015-08-11 11:18:52,384 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_3 of size 3152 dropped from memory (free 1018915975)
2015-08-11 11:18:52,408 INFO (org.apache.spark.ContextCleaner:59) - Cleaned
broadcast 3
2015-08-11 11:18:52,414 INFO (org.apache.spark.storage.BlockManager:59) -
Removing broadcast 2
2015-08-11 11:18:52,416 INFO (org.apache.spark.storage.BlockManager:59) -
Removing block broadcast_2_piece0
2015-08-11 11:18:52,416 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_2_piece0 of size 2261 dropped from memory (free 1018918236)
2015-08-11 11:18:52,418 INFO (org.apache.spark.storage.BlockManagerInfo:59) -
Removed broadcast_2_piece0 on localhost:65012 in memory (size: 2.2 KB, free:
971.7 MB)
2015-08-11 11:18:52,419 INFO (org.apache.spark.storage.BlockManagerMaster:59)
- Updated info of block broadcast_2_piece0
2015-08-11 11:18:52,419 INFO (org.apache.spark.storage.BlockManager:59) -
Removing block broadcast_2
2015-08-11 11:18:52,419 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_2 of size 3152 dropped from memory (free 1018921388)
2015-08-11 11:18:52,420 INFO (org.apache.spark.ContextCleaner:59) - Cleaned
broadcast 2
2015-08-11 11:18:52,421 INFO (org.apache.spark.storage.BlockManager:59) -
Removing broadcast 1
2015-08-11 11:18:52,422 INFO (org.apache.spark.storage.BlockManager:59) -
Removing block broadcast_1
2015-08-11 11:18:52,422 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_1 of size 2264 dropped from memory (free 1018923652)
2015-08-11 11:18:52,422 INFO (org.apache.spark.storage.BlockManager:59) -
Removing block broadcast_1_piece0
2015-08-11 11:18:52,423 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_1_piece0 of size 1675 dropped from memory (free 1018925327)
2015-08-11 11:18:52,424 INFO (org.apache.spark.storage.BlockManagerInfo:59) -
Removed broadcast_1_piece0 on localhost:65012 in memory (size: 1675.0 B, free:
971.7 MB)
2015-08-11 11:18:52,424 INFO (org.apache.spark.storage.BlockManagerMaster:59)
- Updated info of block broadcast_1_piece0
2015-08-11 11:18:52,425 INFO (org.apache.spark.ContextCleaner:59) - Cleaned
broadcast 1
2015-08-11 11:18:52,426 INFO (org.apache.spark.storage.BlockManager:59) -
Removing broadcast 0
2015-08-11 11:18:52,426 INFO (org.apache.spark.storage.BlockManager:59) -
Removing block broadcast_0
2015-08-11 11:18:52,427 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_0 of size 4328 dropped from memory (free 1018929655)
2015-08-11 11:18:52,427 INFO (org.apache.spark.storage.BlockManager:59) -
Removing block broadcast_0_piece0
2015-08-11 11:18:52,427 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_0_piece0 of size 3101 dropped from memory (free 1018932756)
2015-08-11 11:18:52,429 INFO (org.apache.spark.storage.BlockManagerInfo:59) -
Removed broadcast_0_piece0 on localhost:65012 in memory (size: 3.0 KB, free:
971.7 MB)
2015-08-11 11:18:52,429 INFO (org.apache.spark.storage.BlockManagerMaster:59)
- Updated info of block broadcast_0_piece0
2015-08-11 11:18:52,432 INFO (org.apache.spark.ContextCleaner:59) - Cleaned
broadcast 0
2015-08-11 11:18:52,569 INFO
(org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.tip.id is
deprecated. Instead, use mapreduce.task.id
2015-08-11 11:18:52,569 INFO
(org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.task.id is
deprecated. Instead, use mapreduce.task.attempt.id
2015-08-11 11:18:52,569 INFO
(org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.task.is.map is
deprecated. Instead, use mapreduce.task.ismap
2015-08-11 11:18:52,570 INFO
(org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.task.partition
is deprecated. Instead, use mapreduce.task.partition
2015-08-11 11:18:52,570 INFO
(org.apache.hadoop.conf.Configuration.deprecation:1009) - mapred.job.id is
deprecated. Instead, use mapreduce.job.id
2015-08-11 11:18:52,630 INFO (org.apache.spark.SparkContext:59) - Starting
job: foreach at DirectStream.java:167
2015-08-11 11:18:52,631 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Got job 3 (foreach at DirectStream.java:167) with 2 output partitions
(allowLocal=false)
2015-08-11 11:18:52,632 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Final stage: Stage 7(foreach at DirectStream.java:167)
2015-08-11 11:18:52,632 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Parents of final stage: List(Stage 6)
2015-08-11 11:18:52,633 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Missing parents: List()
2015-08-11 11:18:52,633 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Submitting Stage 7 (MapPartitionsRDD[5] at foreach at DirectStream.java:167),
which has no missing parents
2015-08-11 11:18:52,659 INFO (org.apache.spark.storage.MemoryStore:59) -
ensureFreeSpace(107792) called with curMem=0, maxMem=1018932756
2015-08-11 11:18:52,660 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_4 stored as values in memory (estimated size 105.3 KB, free 971.6 MB)
2015-08-11 11:18:52,662 INFO (org.apache.spark.storage.MemoryStore:59) -
ensureFreeSpace(64018) called with curMem=107792, maxMem=1018932756
2015-08-11 11:18:52,663 INFO (org.apache.spark.storage.MemoryStore:59) - Block
broadcast_4_piece0 stored as bytes in memory (estimated size 62.5 KB, free
971.6 MB)
2015-08-11 11:18:52,663 INFO (org.apache.spark.storage.BlockManagerInfo:59) -
Added broadcast_4_piece0 in memory on localhost:65012 (size: 62.5 KB, free:
971.7 MB)
2015-08-11 11:18:52,663 INFO (org.apache.spark.storage.BlockManagerMaster:59)
- Updated info of block broadcast_4_piece0
2015-08-11 11:18:52,664 INFO (org.apache.spark.SparkContext:59) - Created
broadcast 4 from broadcast at DAGScheduler.scala:839
2015-08-11 11:18:52,664 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Submitting 2 missing tasks from Stage 7 (MapPartitionsRDD[5] at foreach at
DirectStream.java:167)
2015-08-11 11:18:52,664 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59)
- Adding task set 7.0 with 2 tasks
2015-08-11 11:18:52,665 INFO (org.apache.spark.scheduler.TaskSetManager:59) -
Starting task 0.0 in stage 7.0 (TID 5, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,666 INFO (org.apache.spark.scheduler.TaskSetManager:59) -
Starting task 1.0 in stage 7.0 (TID 6, localhost, PROCESS_LOCAL, 1056 bytes)
2015-08-11 11:18:52,666 INFO (org.apache.spark.executor.Executor:59) - Running
task 0.0 in stage 7.0 (TID 5)
2015-08-11 11:18:52,666 INFO (org.apache.spark.executor.Executor:59) - Running
task 1.0 in stage 7.0 (TID 6)
2015-08-11 11:18:52,827 INFO
(org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty
blocks out of 2 blocks
2015-08-11 11:18:52,828 INFO
(org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote
fetches in 1 ms
2015-08-11 11:18:52,846 INFO
(org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Getting 2 non-empty
blocks out of 2 blocks
2015-08-11 11:18:52,847 INFO
(org.apache.spark.storage.ShuffleBlockFetcherIterator:59) - Started 0 remote
fetches in 1 ms
2015-08-11 11:18:52,965 ERROR (org.apache.spark.executor.Executor:96) -
Exception in task 1.0 in stage 7.0 (TID 6)
java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at
org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:467)
at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
at
org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2015-08-11 11:18:52,965 ERROR (org.apache.spark.executor.Executor:96) -
Exception in task 0.0 in stage 7.0 (TID 5)
java.lang.NullPointerException
        (stack trace identical to the one for TID 6 above)
2015-08-11 11:18:52,975 WARN (org.apache.spark.scheduler.TaskSetManager:71) -
Lost task 0.0 in stage 7.0 (TID 5, localhost): java.lang.NullPointerException
        (stack trace identical to the one above)
2015-08-11 11:18:52,978 ERROR (org.apache.spark.scheduler.TaskSetManager:75) -
Task 0 in stage 7.0 failed 1 times; aborting job
2015-08-11 11:18:52,980 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59)
- Removed TaskSet 7.0, whose tasks have all completed, from pool
2015-08-11 11:18:52,981 INFO (org.apache.spark.scheduler.TaskSetManager:59) -
Lost task 1.0 in stage 7.0 (TID 6) on executor localhost:
java.lang.NullPointerException (null) [duplicate 1]
2015-08-11 11:18:52,981 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59)
- Removed TaskSet 7.0, whose tasks have all completed, from pool
2015-08-11 11:18:52,983 INFO (org.apache.spark.scheduler.TaskSchedulerImpl:59)
- Cancelling stage 7
2015-08-11 11:18:52,987 INFO (org.apache.spark.scheduler.DAGScheduler:59) -
Job 3 failed: foreach at DirectStream.java:167, took 0.356095 s
2015-08-11 11:18:52,989 ERROR
(org.apache.spark.streaming.scheduler.JobScheduler:96) - Error running job
streaming job 1439272130000 ms.2
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID
5, localhost): java.lang.NullPointerException
        (stack trace identical to the one above)
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to
stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost
task 0.0 in stage 7.0 (TID 5, localhost): java.lang.NullPointerException
        (same NullPointerException stack trace and driver stacktrace as above)