[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2017-01-27 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15396#discussion_r98218538
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
* This is introduced as an alias for `isCheckpointed` to clarify the 
semantics of the
* return value. Exposed for testing.
*/
-  private[spark] def isCheckpointedAndMaterialized: Boolean = 
isCheckpointed
+  private[spark] def isCheckpointedAndMaterialized: Boolean =
+checkpointData.exists(_.isCheckpointed)
--- End diff --

looks good, but you forgot to make this final


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2017-01-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15396


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2017-01-25 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15396#discussion_r97884515
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
* This is introduced as an alias for `isCheckpointed` to clarify the 
semantics of the
* return value. Exposed for testing.
*/
-  private[spark] def isCheckpointedAndMaterialized: Boolean = 
isCheckpointed
+  private[spark] def isCheckpointedAndMaterialized: Boolean =
+checkpointData.exists(_.isCheckpointed)
--- End diff --

Addressed it. Can you check again?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2016-10-18 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15396#discussion_r83968123
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
* This is introduced as an alias for `isCheckpointed` to clarify the 
semantics of the
* return value. Exposed for testing.
*/
-  private[spark] def isCheckpointedAndMaterialized: Boolean = 
isCheckpointed
+  private[spark] def isCheckpointedAndMaterialized: Boolean =
+checkpointData.exists(_.isCheckpointed)
--- End diff --

what do you mean. if isCheckpointedAndMaterialized = isCheckpointed, and 
you override isCheckpointed, then that would change the behavior of 
isCheckpointedAndMaterialized as well. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2016-10-12 Thread apivovarov
Github user apivovarov commented on a diff in the pull request:

https://github.com/apache/spark/pull/15396#discussion_r83111395
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
* This is introduced as an alias for `isCheckpointed` to clarify the 
semantics of the
--- End diff --

This method left as is in this PR - 
https://github.com/apache/spark/pull/15447


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2016-10-12 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15396#discussion_r83043442
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
* This is introduced as an alias for `isCheckpointed` to clarify the 
semantics of the
* return value. Exposed for testing.
*/
-  private[spark] def isCheckpointedAndMaterialized: Boolean = 
isCheckpointed
+  private[spark] def isCheckpointedAndMaterialized: Boolean =
+checkpointData.exists(_.isCheckpointed)
--- End diff --

One way to fix the duplicate code is to make this one final and have 
`isCheckpointed` call this one, then implementations of RDD can feel free to 
override `isCheckpointed` without affecting our internal implementation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2016-10-12 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15396#discussion_r83042522
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
* This is introduced as an alias for `isCheckpointed` to clarify the 
semantics of the
--- End diff --

oh wait I just realized this is exactly the same as `isCheckpointed`. It's 
kind of confusing why we have two methods that do the same thing. Is it because 
VertexRDD or something overrides one but not the other?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...

2016-10-07 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/15396

[SPARK-14804][Spark][Graphx] Fix checkpointing of VertexRDD/EdgeRDD

## What changes were proposed in this pull request?
EdgeRDD/VertexRDD overrides checkpoint() and isCheckpointed() to forward 
these to the internal partitionRDD. So when checkpoint() is called on them, its 
the partitionRDD that actually gets checkpointed. However since 
isCheckpointed() also overridden to call partitionRDD.isCheckpointed, 
EdgeRDD/VertexRDD.isCheckpointed returns true even though this RDD is actually 
not checkpointed.

This would have been fine except the RDD's internal logic for computing the 
RDD depends on isCheckpointed(). So for VertexRDD/EdgeRDD, since isCheckpointed 
is true, when computing Spark tries to read checkpoint data of 
VertexRDD/EdgeRDD even though they are not actually checkpointed. Through a 
crazy sequence of call forwarding, it reads checkpoint data of partitionsRDD 
and tries to cast it to types in Vertex/EdgeRDD. This leads to 
ClassCastException.

The minimal fix that does not change any public behavior is to modify RDD 
internal to not use public override-able API for internal logic. 

## How was this patch tested?
New unit tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-14804

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15396.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15396


commit 22d16b46e8ea18ec7a1b585103aa72f77a7e78f7
Author: Tathagata Das 
Date:   2016-10-07T22:04:31Z

Fixed checkpointing




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org