Hello,

  I have a Spark app that I run with master "local[3]". Without any persist 
calls it seems to work fine, but as soon as I add persist calls (at the default 
storage level), it fails at the first persist with the message below. 
Unfortunately, I can't post the code. Polling the JVM memory stats while the 
app is running suggests that the JVM has not yet grown to its maximum heap 
size.
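  One possibly useful observation: the negative size in the log below is 
exactly Long.MIN_VALUE + 12,304,664, i.e. about 12 MB past the point where a 
64-bit counter wraps around. That pattern would be consistent with a long 
overflow somewhere in Spark's size estimation rather than with the JVM 
actually running out of memory, which matches the memory stats above. This is 
only a guess, not something confirmed from the Spark source; the arithmetic 
itself can be checked with a few lines of plain Java:

```java
// Sanity check on the negative sizeInBytes from the log.
// The interpretation (a long overflow in size estimation) is a guess;
// only the arithmetic below is certain.
public class NegativeSizeCheck {
    public static void main(String[] args) {
        long reported = -9223372036842471144L;  // sizeInBytes from the log
        long delta = reported - Long.MIN_VALUE; // distance past the wrap point
        System.out.println("delta = " + delta); // prints "delta = 12304664"
    }
}
```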

  Any advice? Thanks!

     Best, Oliver

14/10/28 10:51:30 INFO storage.MemoryStore: 
ensureFreeSpace(-9223372036842471144) called with curMem=1760, maxMem=3523372646
14/10/28 10:51:30 INFO storage.MemoryStore: Block rdd_1_2 stored as values in 
memory (estimated size -9223372036842471400.0 B, free -9223372033343709200.0 B)
14/10/28 10:51:30 ERROR executor.Executor: Exception in task 2.0 in stage 0.0 
(TID 2)
java.lang.IllegalArgumentException: requirement failed: sizeInBytes was 
negative: -9223372036842471144
       at scala.Predef$.require(Predef.scala:233)
       at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:767)
       at org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)
       at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)
       at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
       at org.apache.spark.scheduler.Task.run(Task.scala:54)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
       at java.lang.Thread.run(Unknown Source)
14/10/28 10:51:30 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 
(TID 3, localhost, PROCESS_LOCAL, 3961 bytes)
14/10/28 10:51:30 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
14/10/28 10:51:30 INFO spark.CacheManager: Partition rdd_1_3 not found, 
computing it
14/10/28 10:51:30 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 
(TID 2, localhost): java.lang.IllegalArgumentException: requirement failed: 
sizeInBytes was negative: -9223372036842471144
        scala.Predef$.require(Predef.scala:233)
        org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
        org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:767)
        org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)
        org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)
        org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        java.lang.Thread.run(Unknown Source)
14/10/28 10:51:30 ERROR scheduler.TaskSetManager: Task 2 in stage 0.0 failed 1 
times; aborting job
14/10/28 10:51:30 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
14/10/28 10:51:30 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
14/10/28 10:51:30 INFO executor.Executor: Executor is trying to kill task 0.0 
in stage 0.0 (TID 0)
14/10/28 10:51:30 INFO executor.Executor: Executor is trying to kill task 1.0 
in stage 0.0 (TID 1)
14/10/28 10:51:30 INFO executor.Executor: Executor is trying to kill task 3.0 
in stage 0.0 (TID 3)
14/10/28 10:51:30 INFO scheduler.DAGScheduler: Failed to run count at XXXXX
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost 
task 2.0 in stage 0.0 (TID 2, localhost): java.lang.IllegalArgumentException: 
requirement failed: sizeInBytes was negative: -9223372036842471144
        scala.Predef$.require(Predef.scala:233)
        org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
        org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:767)
        org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)
        org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)
        org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        java.lang.Thread.run(Unknown Source)
Driver stacktrace:
       at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
       at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
       at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
       at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
       at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
       at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
       at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
       at scala.Option.foreach(Option.scala:236)
       at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
       at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
       at akka.actor.ActorCell.invoke(ActorCell.scala:456)
       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
       at akka.dispatch.Mailbox.run(Mailbox.scala:219)
       at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
       at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
       at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
       at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Oliver Ruebenacker | Solutions Architect

Altisource(tm)
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
oliver.ruebenac...@altisource.com | www.Altisource.com

