[ https://issues.apache.org/jira/browse/SPARK-22184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188306#comment-16188306 ]
Sergey Zhemzhitsky edited comment on SPARK-22184 at 10/2/17 5:27 PM:
---------------------------------------------------------------------

Hi [~sowen],

Would you mind if I reopen this issue until it is clear whether it is better to include the GraphX fixes in SPARK-22150 or to keep the changes separate? Could you please advise whether I should merge the changes of the PR that fixes this issue into SPARK-22150 and provide a single PR that fixes the standard checkpointers as well as the GraphX ones? I'm asking because in the case of GraphX there are also changes in Pregel, and I believe that the PR for SPARK-22150 (with just the PeriodicCheckpointer changes, without GraphX) can be reviewed and merged into master faster than one that includes all the changes. What do you think?

was (Author: szhemzhitsky):
Hi [~sowen],

Should I merge the changes of the PR that fixes this issue into SPARK-22150 and provide a single PR that fixes the standard checkpointers as well as the GraphX ones? I'm asking because in the case of GraphX there are also changes in Pregel, and I believe that a PR with just the PeriodicCheckpointer changes (without GraphX) can be reviewed and merged into master faster than one that includes all the changes. What do you think?


> GraphX fails in case of insufficient memory and checkpoints enabled
> -------------------------------------------------------------------
>
>                 Key: SPARK-22184
>                 URL: https://issues.apache.org/jira/browse/SPARK-22184
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 2.2.0
>        Environment: spark 2.2.0
>                     scala 2.11
>            Reporter: Sergey Zhemzhitsky
>
> GraphX fails with a FileNotFoundException in case of insufficient memory
> when checkpoints are enabled.
> Here is the stacktrace:
> {code}
> Job aborted due to stage failure: Task creation failed:
> java.io.FileNotFoundException: File file:/tmp/spark-90119695-a126-47b5-b047-d656fee10c17/9b16e2a9-6c80-45eb-8736-bbb6eb840146/rdd-28/part-00000 does not exist
> java.io.FileNotFoundException: File file:/tmp/spark-90119695-a126-47b5-b047-d656fee10c17/9b16e2a9-6c80-45eb-8736-bbb6eb840146/rdd-28/part-00000 does not exist
>   at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:539)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:752)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:529)
>   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
>   at org.apache.spark.rdd.ReliableCheckpointRDD.getPreferredLocations(ReliableCheckpointRDD.scala:89)
>   at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$1.apply(RDD.scala:274)
>   at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$1.apply(RDD.scala:274)
>   at scala.Option.map(Option.scala:146)
>   at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:274)
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1697)
>   ...
> {code}
> As GraphX uses cached RDDs intensively, the issue is only reproducible when
> previously cached and checkpointed Vertex and Edge RDDs are evicted from
> memory and forced to be read from disk.
> For testing purposes the following parameters may be set to emulate a
> low-memory environment:
> {code}
> val sparkConf = new SparkConf()
>   .set("spark.graphx.pregel.checkpointInterval", "2")
>   // set testing memory low enough to evict cached RDDs and force
>   // reading the checkpointed RDDs from disk
>   .set("spark.testing.reservedMemory", "128")
>   .set("spark.testing.memory", "256")
> {code}
> This issue also includes SPARK-22150 and cannot be fixed until SPARK-22150
> is fixed too.
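For anyone trying to trigger the failure locally, here is a minimal sketch of the scenario described above. It is not taken from the ticket: the local master, checkpoint directory path, and graph size are illustrative assumptions, and connectedComponents() is used only because it iterates via Pregel internally.

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.util.GraphGenerators

object Spark22184Repro {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setMaster("local[2]")                               // assumption: local run
      .setAppName("SPARK-22184-repro")
      .set("spark.graphx.pregel.checkpointInterval", "2")  // checkpoint every 2 Pregel iterations
      // tiny testing memory pools (values from the description) so cached
      // RDDs get evicted and the checkpointed copies must be read from disk
      .set("spark.testing.reservedMemory", "128")
      .set("spark.testing.memory", "256")

    val sc = new SparkContext(sparkConf)
    // the Pregel checkpoint interval only takes effect when a checkpoint
    // directory is configured (this path is an assumption)
    sc.setCheckpointDir("/tmp/spark-22184-checkpoints")

    // synthetic graph; 10000 vertices is a guess at a size that will not
    // fit into the shrunken memory pool
    val graph = GraphGenerators.logNormalGraph(sc, numVertices = 10000)

    // connectedComponents() runs Pregel, so with the settings above it
    // exercises the cached and checkpointed vertex/edge RDDs whose files
    // go missing in the reported failure
    println(graph.connectedComponents().vertices.count())

    sc.stop()
  }
}
{code}

Whether the FileNotFoundException actually surfaces depends on eviction timing, so the vertex count and memory settings may need tuning on a given machine.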