[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2015-04-04 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395599#comment-14395599
 ] 

Sean Owen commented on SPARK-3625:
--

Why would this change be necessary in order to use checkpointing? See the
discussion above.

 In some cases, the RDD.checkpoint does not work
 ---

 Key: SPARK-3625
 URL: https://issues.apache.org/jira/browse/SPARK-3625
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Guoqiang Li
Assignee: Guoqiang Li

 The reproduce code:
 {code}
 sc.setCheckpointDir(checkpointDir)
 val c = sc.parallelize((1 to 1000)).map(_ + 1)
 c.count
 val dep = c.dependencies.head.rdd
 c.checkpoint()
 c.count
 assert(dep != c.dependencies.head.rdd)
 {code}
 This limit is too strict; it makes it difficult to implement SPARK-3623.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2015-04-04 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395613#comment-14395613
 ] 

Guoqiang Li commented on SPARK-3625:


Sometimes we cannot determine whether to call RDD.checkpoint before any job has
been executed on the RDD, as in
[PeriodicGraphCheckpointer|https://github.com/apache/spark/blob/branch-1.3/mllib/src/main/scala/org/apache/spark/mllib/impl/PeriodicGraphCheckpointer.scala].
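
A minimal sketch of the pattern described above (hypothetical structure and names; the real PeriodicGraphCheckpointer code differs). The decision to checkpoint is made inside an iterative loop, after jobs have already run on the RDD, so the caller cannot satisfy the javadoc's "call before any job" requirement:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch, not the actual PeriodicGraphCheckpointer code.
object DeferredCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sketch").setMaster("local[2]"))
    sc.setCheckpointDir("/tmp/checkpoint")  // assumed writable directory

    var rdd: RDD[Int] = sc.parallelize(1 to 1000)
    for (iter <- 1 to 10) {
      rdd = rdd.map(_ + 1).cache()
      rdd.count()            // a job runs on this RDD every iteration
      if (iter % 3 == 0) {
        // The decision to checkpoint is made only here, after the job
        // above has already executed on this RDD, which the javadoc of
        // RDD.checkpoint says must not happen.
        rdd.checkpoint()
      }
    }
    sc.stop()
  }
}
```

This is the situation the reporter describes: in iterative algorithms the caller cannot always arrange for checkpoint() to precede the first action on the RDD.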
 


[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2015-04-03 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395492#comment-14395492
 ] 

Guoqiang Li commented on SPARK-3625:


When we run machine learning and graph algorithms, this feature is very
necessary. I think we should merge PR 2480 to master.


[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-10-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156663#comment-14156663
 ] 

Apache Spark commented on SPARK-3625:
-

User 'witgo' has created a pull request for this issue:
https://github.com/apache/spark/pull/2631


[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143153#comment-14143153
 ] 

Sean Owen commented on SPARK-3625:
--

This prints 1000 both times for me, which is correct. When you say it doesn't
work, could you please elaborate? A different count? An exception? What is your
environment?

 In some cases, the RDD.checkpoint does not work
 ---

 Key: SPARK-3625
 URL: https://issues.apache.org/jira/browse/SPARK-3625
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Guoqiang Li
Assignee: Guoqiang Li
Priority: Blocker

 The reproduce code:
 {code}
 sc.setCheckpointDir(checkpointDir)
 val c = sc.parallelize((1 to 1000))
 c.count
 c.checkpoint()
 c.count
 {code}




[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143342#comment-14143342
 ] 

Sean Owen commented on SPARK-3625:
--

It still prints 1000 both times, which is correct. Your assertion is about
something different. The assertion fails, but the behavior you are asserting
is not what the javadoc suggests:

{quote}
Mark this RDD for checkpointing. It will be saved to a file inside the 
checkpoint
directory set with SparkContext.setCheckpointDir() and all references to its 
parent
RDDs will be removed. This function must be called before any job has been
executed on this RDD. It is strongly recommended that this RDD is persisted in
memory, otherwise saving it on a file will require recomputation.
{quote}

This example calls count() before checkpoint(). If you don't, I think you get
the expected behavior, since the dependency becomes a CheckpointRDD. This does
not look like a bug.
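
As a hedged illustration of the ordering described above (a minimal sketch; assumes a local SparkContext and a writable checkpoint directory), calling checkpoint() before the first action lets the next job replace the parent dependency with a CheckpointRDD:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: mark the RDD for checkpointing BEFORE any job runs on it.
object CheckpointBeforeActionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sketch").setMaster("local[2]"))
    sc.setCheckpointDir("/tmp/checkpoint")  // assumed writable directory

    val c = sc.parallelize(1 to 1000).map(_ + 1)
    val dep = c.dependencies.head.rdd
    c.checkpoint()  // before the first action on c
    c.count()       // this job materializes c and writes the checkpoint
    // The parent dependency has now been replaced by a CheckpointRDD,
    // so the assertion from the issue description holds in this ordering.
    assert(dep != c.dependencies.head.rdd)
    sc.stop()
  }
}
```

With this ordering the assertion from the issue description passes; it is only when an action runs before checkpoint() is called, as in the reported repro, that the lineage is left untruncated.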

 In some cases, the RDD.checkpoint does not work
 ---

 Key: SPARK-3625
 URL: https://issues.apache.org/jira/browse/SPARK-3625
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Guoqiang Li
Assignee: Guoqiang Li
Priority: Blocker

 The reproduce code:
 {code}
 sc.setCheckpointDir(checkpointDir)
 val c = sc.parallelize((1 to 1000)).map(_ + 1)
 c.count
 val dep = c.dependencies.head.rdd
 c.checkpoint()
 c.count
 assert(dep != c.dependencies.head.rdd)
 {code}




[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143410#comment-14143410
 ] 

Guoqiang Li commented on SPARK-3625:


OK, I have changed the issue type to Improvement. This limit is too strict;
SPARK-3623 relies on relaxing it.

 In some cases, the RDD.checkpoint does not work
 ---

 Key: SPARK-3625
 URL: https://issues.apache.org/jira/browse/SPARK-3625
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Guoqiang Li
Assignee: Guoqiang Li

 The reproduce code:
 {code}
 sc.setCheckpointDir(checkpointDir)
 val c = sc.parallelize((1 to 1000)).map(_ + 1)
 c.count
 val dep = c.dependencies.head.rdd
 c.checkpoint()
 c.count
 assert(dep != c.dependencies.head.rdd)
 {code}




[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142471#comment-14142471
 ] 

Apache Spark commented on SPARK-3625:
-

User 'witgo' has created a pull request for this issue:
https://github.com/apache/spark/pull/2480
