[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-04-24 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r113021061
  
--- Diff: docs/graphx-programming-guide.md ---
@@ -709,7 +709,8 @@ messages remaining.
 
 The following is the type signature of the [Pregel operator][GraphOps.pregel] as well as a *sketch*
 of its implementation (note: to avoid StackOverflowError due to long lineage chains, Pregel supports periodically
-checkpointing the graph and messages by setting "spark.graphx.pregel.checkpointInterval"):
+checkpointing the graph and messages by setting "spark.graphx.pregel.checkpointInterval" to a positive number,
+say 10. Set the checkpoint directory as well, using SparkContext.setCheckpointDir(directory: String)):
--- End diff --

I referenced the value used in LDA and other ML algorithms in Spark; by default their checkpointInterval is set to 10.
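
For illustration, a minimal sketch of the usage described in this diff; the directory path and the interval value are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Enable periodic Pregel checkpointing as the doc change describes.
    val conf = new SparkConf()
      .setAppName("pregel-checkpoint-sketch")
      .setMaster("local[*]")
      .set("spark.graphx.pregel.checkpointInterval", "10") // positive value turns it on
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("/tmp/checkpoints") // must be set before any checkpoint happens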


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-22 Thread dding3
Github user dding3 commented on the issue:

https://github.com/apache/spark/pull/15125
  
OK, agreed. If the user hasn't set a checkpoint directory while we turn on 
checkpointing in Pregel by default, there may be an exception. I will change the 
default value of spark.graphx.pregel.checkpointInterval to -1.
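
A minimal sketch of the resulting opt-in behavior (assumed wiring, not the exact merged code):

    import org.apache.spark.SparkContext

    // With -1 as the default, checkpointing stays off unless the user sets both a
    // positive interval and a checkpoint directory, avoiding the exception above.
    def pregelCheckpointingEnabled(sc: SparkContext): Boolean = {
      val interval = sc.getConf.getInt("spark.graphx.pregel.checkpointInterval", -1)
      interval > 0 && sc.getCheckpointDir.isDefined
    }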





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-28 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r103516676
  
--- Diff: docs/graphx-programming-guide.md ---
@@ -708,7 +708,9 @@ messages remaining.
 > messaging function.  These constraints allow additional optimization 
within GraphX.
 
 The following is the type signature of the [Pregel operator][GraphOps.pregel] as well as a *sketch*
-of its implementation (note calls to graph.cache have been removed):
+of its implementation (note: to avoid StackOverflowError due to long lineage chains, graph and 
--- End diff --

OK. I have removed the references to checkpointing from the sketch and documented 
the config property in the Spark configuration document in the latest update.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-21 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r102294384
  
--- Diff: docs/graphx-programming-guide.md ---
@@ -708,7 +708,9 @@ messages remaining.
 > messaging function.  These constraints allow additional optimization 
within GraphX.
 
 The following is the type signature of the [Pregel operator][GraphOps.pregel] as well as a *sketch*
-of its implementation (note calls to graph.cache have been removed):
+of its implementation (note: to avoid StackOverflowError due to long lineage chains, graph and 
--- End diff --

About documenting this config property in the Spark configuration document: is it 
OK if I add a new GraphX section to hold the config, or should I just add it under 
an existing section, say "Execution Behavior"?





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-20 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r102067500
  
--- Diff: docs/graphx-programming-guide.md ---
@@ -708,7 +708,9 @@ messages remaining.
 > messaging function.  These constraints allow additional optimization 
within GraphX.
 
 The following is the type signature of the [Pregel operator][GraphOps.pregel] as well as a *sketch*
-of its implementation (note calls to graph.cache have been removed):
+of its implementation (note: to avoid StackOverflowError due to long lineage chains, graph and 
--- End diff --

OK. I was thinking that in the original graphx-programming-guide, only cache is 
called. After we add checkpointing and note in the sketch that we do so, do we 
still need to show that information in the sketch? If there is agreement on 
removing all references to checkpointing, I will revert the changes.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-20 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r102065803
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala ---
@@ -154,7 +169,9 @@ object Pregel extends Logging {
   // count the iteration
   i += 1
 }
-messages.unpersist(blocking = false)
+messageCheckpointer.unpersistDataSet()
--- End diff --

I think the point is that we use messageCheckpointer.update to do the caching; to 
make a pair, we can use it to unpersist the data as well. Please correct me if I 
misunderstand.
I think it's fine to add this new method: since there is already a public method 
to cache data in PersistQueue, we should provide a public method to clean the 
queue.
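
A simplified, hypothetical sketch of the pairing being discussed (the real checkpointer classes are Spark-internal; the names here are illustrative):

    import scala.collection.mutable
    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    // A queue that persists what it is handed and can clean up after itself:
    // the public unpersist method mirrors the public update/cache method.
    class SimplePersistQueue[T] {
      private val persisted = mutable.Queue.empty[RDD[T]]

      /** Persist the latest RDD and remember it for later cleanup. */
      def update(rdd: RDD[T]): Unit = {
        if (rdd.getStorageLevel == StorageLevel.NONE) rdd.persist()
        persisted.enqueue(rdd)
      }

      /** Counterpart to update(): unpersist everything this queue cached. */
      def unpersistDataSet(): Unit = {
        while (persisted.nonEmpty) persisted.dequeue().unpersist(blocking = false)
      }
    }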





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-19 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r101952678
  
--- Diff: docs/graphx-programming-guide.md ---
@@ -720,25 +722,53 @@ class GraphOps[VD, ED] {
sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
mergeMsg: (A, A) => A)
 : Graph[VD, ED] = {
-// Receive the initial message at each vertex
-var g = mapVertices( (vid, vdata) => vprog(vid, vdata, initialMsg) ).cache()
+val checkpointInterval = graph.vertices.sparkContext.getConf
--- End diff --

OK. I will change it back then.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-19 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r101938997
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala ---
@@ -122,27 +125,39 @@ object Pregel extends Logging {
 require(maxIterations > 0, s"Maximum number of iterations must be greater than 0," +
   s" but got ${maxIterations}")
 
-var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
+val checkpointInterval = graph.vertices.sparkContext.getConf
+  .getInt("spark.graphx.pregel.checkpointInterval", 10)
--- End diff --

I think so. Currently I added the documentation in graphx-programming-guide.md, 
but I am not sure that's the right place; please let me know if there is a better 
place for it.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-17 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r101838127
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala ---
@@ -155,6 +169,8 @@ object Pregel extends Logging {
   i += 1
 }
 messages.unpersist(blocking = false)
--- End diff --

It looks like unpersist is a protected method and we cannot access it from 
Pregel. I added a new public method to unpersist the dataset to work around this.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-17 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r101828986
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala ---
@@ -362,12 +362,14 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Seriali
   def pregel[A: ClassTag](
   initialMsg: A,
   maxIterations: Int = Int.MaxValue,
+  checkpointInterval: Int = 25,
--- End diff --

About the default value: I think we should set it to a positive value so that the 
checkpoint operation is turned on by default, to avoid the stack-overflow 
exception. To align with other implementations in Spark, I would like to set 10 as 
the default value. Please let me know if there are any thoughts on this.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-17 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r101827588
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala ---
@@ -362,12 +362,14 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Seriali
   def pregel[A: ClassTag](
   initialMsg: A,
   maxIterations: Int = Int.MaxValue,
+  checkpointInterval: Int = 25,
--- End diff --

Agree with @mallman; we don't need to change the API interface if we use a config 
value to control the checkpoint interval. I will update the PR soon.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-17 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r101823330
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala ---
@@ -155,6 +169,8 @@ object Pregel extends Logging {
   i += 1
 }
 messages.unpersist(blocking = false)
+graphCheckpointer.deleteAllCheckpoints()
+messageCheckpointer.deleteAllCheckpoints()
--- End diff --

I think that when there is an exception during training, keeping the checkpoints 
gives the user a chance to recover from it. I checked RandomForest/GBT in Spark; 
it looks like they only delete the checkpoints when training finishes 
successfully.
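
A sketch of the delete-on-success pattern referred to here (the trait is illustrative; the real PeriodicCheckpointer is Spark-internal):

    // Checkpoint files are removed only after the run completes; on an exception
    // they stay on disk so the user can recover, as in RandomForest/GBT.
    trait CheckpointerSketch { def deleteAllCheckpoints(): Unit }

    def runThenCleanUp[A](checkpointer: CheckpointerSketch)(body: => A): A = {
      val result = body                    // a thrown exception skips the cleanup below
      checkpointer.deleteAllCheckpoints()  // reached only on success
      result
    }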





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-17 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r101822138
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala ---
@@ -362,12 +362,14 @@ class GraphOps[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]) extends Seriali
   def pregel[A: ClassTag](
   initialMsg: A,
   maxIterations: Int = Int.MaxValue,
+  checkpointInterval: Int = 25,
--- End diff --

25 is the value I used in my test. I checked this value in LDA/ALS; it looks like 
their default value is 10. Shall I change it to 10?





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-16 Thread dding3
Github user dding3 commented on the issue:

https://github.com/apache/spark/pull/15125
  
Both are OK with me. Please let me know if any update is needed from me.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-11 Thread dding3
Github user dding3 commented on the issue:

https://github.com/apache/spark/pull/15125
  
Thank you guys for reviewing the code. I have updated it based on the 
comments.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-11 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r100680651
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala ---
@@ -123,16 +127,25 @@ object Pregel extends Logging {
   s" but got ${maxIterations}")
 
 var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
+val graphCheckpointer = new PeriodicGraphCheckpointer[VD, ED](
+  checkpointInterval, graph.vertices.sparkContext)
+graphCheckpointer.update(g)
+
 // compute the messages
-var messages = GraphXUtils.mapReduceTriplets(g, sendMsg, mergeMsg)
+var messages = GraphXUtils.mapReduceTriplets(g, sendMsg, mergeMsg).cache()
+val messageCheckpointer = new PeriodicRDDCheckpointer[(VertexId, A)](
+  checkpointInterval, graph.vertices.sparkContext)
+messageCheckpointer.update(messages.asInstanceOf[RDD[(VertexId, A)]])
 var activeMessages = messages.count()
+
 // Loop
 var prevG: Graph[VD, ED] = null
 var i = 0
 while (activeMessages > 0 && i < maxIterations) {
   // Receive the messages and update the vertices.
   prevG = g
   g = g.joinVertices(messages)(vprog).cache()
--- End diff --

OK





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-11 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r100680648
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala ---
@@ -123,16 +127,25 @@ object Pregel extends Logging {
   s" but got ${maxIterations}")
 
 var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
+val graphCheckpointer = new PeriodicGraphCheckpointer[VD, ED](
+  checkpointInterval, graph.vertices.sparkContext)
+graphCheckpointer.update(g)
+
 // compute the messages
-var messages = GraphXUtils.mapReduceTriplets(g, sendMsg, mergeMsg)
+var messages = GraphXUtils.mapReduceTriplets(g, sendMsg, mergeMsg).cache()
+val messageCheckpointer = new PeriodicRDDCheckpointer[(VertexId, A)](
+  checkpointInterval, graph.vertices.sparkContext)
+messageCheckpointer.update(messages.asInstanceOf[RDD[(VertexId, A)]])
--- End diff --

I think we need to cache graph/messages here so they don't have to be computed 
again in the loop. I agree with you; I will keep the checkpointer update calls and 
remove all the .cache calls.
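
A minimal sketch of why the explicit .cache() calls become redundant (assumed behavior of the periodic checkpointers, not the exact Spark code):

    import org.apache.spark.graphx.Graph
    import org.apache.spark.storage.StorageLevel

    // update() persists its argument if needed (replacing .cache()) and
    // checkpoints every checkpointInterval-th call to truncate the lineage.
    class GraphCheckpointerSketch[VD, ED](checkpointInterval: Int) {
      private var updateCount = 0

      def update(g: Graph[VD, ED]): Unit = {
        updateCount += 1
        if (g.vertices.getStorageLevel == StorageLevel.NONE) g.persist()
        if (checkpointInterval > 0 && updateCount % checkpointInterval == 0) {
          g.checkpoint() // cuts the lineage chain that causes StackOverflowError
        }
      }
    }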





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-01-17 Thread dding3
Github user dding3 commented on the issue:

https://github.com/apache/spark/pull/15125
  
Sure. I will work on the rebase and update the PR soon.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2016-09-16 Thread dding3
GitHub user dding3 opened a pull request:

https://github.com/apache/spark/pull/15125

[SPARK-5484][GraphX] Periodically do checkpoint in Pregel 

## What changes were proposed in this pull request?
Pregel-based iterative algorithms with more than ~50 iterations begin to 
slow down and eventually fail with a StackOverflowError due to Spark's lack of 
support for long lineage chains.

This PR causes Pregel to checkpoint the graph periodically if the checkpoint 
directory is set.
It moves PeriodicGraphCheckpointer.scala from mllib to graphx, and moves 
PeriodicRDDCheckpointer.scala and PeriodicCheckpointer.scala from mllib to core.
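
For illustration, a generic sketch of the failure mode and the fix using plain Spark core APIs (the path and counts are placeholders, not code from this PR):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("lineage-sketch").setMaster("local[*]"))
    sc.setCheckpointDir("/tmp/checkpoints") // placeholder directory

    // Each iteration extends the RDD lineage; a periodic checkpoint truncates it,
    // which is what keeps long Pregel runs from a StackOverflowError.
    var rdd = sc.parallelize(1 to 1000)
    for (i <- 1 to 100) {
      rdd = rdd.map(_ + 1).cache()
      if (i % 10 == 0) {
        rdd.checkpoint() // marked for checkpointing...
        rdd.count()      // ...and materialized by the next action
      }
    }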


## How was this patch tested?
unit tests, manual tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dding3/spark cp2_pregel

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15125


commit 3268ca2144519a1d84484c116e16e6064553a6c1
Author: ding <ding@localhost.localdomain>
Date:   2016-09-16T17:51:58Z

test

commit ffeadbe96b7fdb491962e065f97ece09fd1a1282
Author: ding <ding@localhost.localdomain>
Date:   2016-09-16T18:13:22Z

period do checkpoint in pregel

commit 720a741db037a94ab86750fb3c9d5a54732da5e1
Author: ding <ding@localhost.localdomain>
Date:   2016-09-16T18:37:50Z

remove unused code







[GitHub] spark pull request #15124: [SPARK-17559][MLLIB]persist edges if their storag...

2016-09-16 Thread dding3
GitHub user dding3 opened a pull request:

https://github.com/apache/spark/pull/15124

[SPARK-17559][MLLIB] persist edges if their storage level is none in PeriodicGraphCheckpointer

## What changes were proposed in this pull request?
When using PeriodicGraphCheckpointer to persist a graph, sometimes the edges 
aren't persisted, because currently the graph is only persisted when the vertices' 
storage level is none. However, there is a chance the vertices' storage level is 
not none while the edges' is none, e.g. in a graph created by an outerJoinVertices 
operation the vertices are automatically cached while the edges are not. In that 
case the edges will not be persisted when we use PeriodicGraphCheckpointer to 
persist. We need to check the edges' storage level separately and persist them if 
it is none.
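
A sketch of the check described above (assumed shape of the fix, not the exact patch):

    import org.apache.spark.graphx.Graph
    import org.apache.spark.storage.StorageLevel

    // Check vertices and edges independently instead of gating on vertices alone,
    // so e.g. a graph from outerJoinVertices gets its edges persisted too.
    def persistGraphParts[VD, ED](g: Graph[VD, ED]): Unit = {
      if (g.vertices.getStorageLevel == StorageLevel.NONE) g.vertices.persist()
      if (g.edges.getStorageLevel == StorageLevel.NONE) g.edges.persist()
    }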


## How was this patch tested?
 manual tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dding3/spark spark-persisitEdge

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15124


commit bf94b4dbcc4e8e0602715dce92f5053608674b43
Author: ding <ding@localhost.localdomain>
Date:   2016-09-16T17:04:27Z

persist edges if their storage level is non in PeriodicGraphCheckpointer







[GitHub] spark issue #15116: [SPARK-17559][MLLIB]persist edges if their storage level...

2016-09-16 Thread dding3
Github user dding3 commented on the issue:

https://github.com/apache/spark/pull/15116
  
Closing the PR since it was mixed with another unrelated commit.





[GitHub] spark pull request #15116: [SPARK-17559][MLLIB]persist edges if their storag...

2016-09-16 Thread dding3
Github user dding3 closed the pull request at:

https://github.com/apache/spark/pull/15116





[GitHub] spark pull request #15116: [SPARK-17559][MLLIB]persist edges if their storag...

2016-09-15 Thread dding3
GitHub user dding3 opened a pull request:

https://github.com/apache/spark/pull/15116

[SPARK-17559][MLLIB] persist edges if their storage level is none in PeriodicGraphCheckpointer

## What changes were proposed in this pull request?
When using PeriodicGraphCheckpointer to persist a graph, sometimes the edges 
aren't persisted, because currently the graph is only persisted when the vertices' 
storage level is none. However, there is a chance the vertices' storage level is 
not none while the edges' is none, e.g. in a graph created by an outerJoinVertices 
operation the vertices are automatically cached while the edges are not. In that 
case the edges will not be persisted when we use PeriodicGraphCheckpointer to 
persist. We need to check the edges' storage level separately and persist them if 
it is none.


## How was this patch tested?
manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dding3/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15116.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15116


commit ad29af46b34d2d156078aba48b8e0427136fc6dd
Author: ding <ding@localhost.localdomain>
Date:   2016-09-15T21:39:10Z

persist edges if their storage level is none







[GitHub] spark pull request: [SPARK-15562][ML] Delete temp directory after ...

2016-05-27 Thread dding3
Github user dding3 commented on the pull request:

https://github.com/apache/spark/pull/13328#issuecomment-222070319
  
Jenkins retest this please





[GitHub] spark pull request: [SPARK-15562][ML] Delete temp directory after ...

2016-05-26 Thread dding3
Github user dding3 commented on the pull request:

https://github.com/apache/spark/pull/13328#issuecomment-222039995
  
Thanks for pointing that out. I think you are right: deleting the checkpoint 
directory after it's created is on purpose. I will add "delete checkpoint 
directory" back. Besides, the temp directory will be deleted after the test 
finishes, as it was registered at creation time for deletion when the VM shuts 
down. I have verified that the temp checkpoint directory is not deleted after the 
test finishes with the previous code, while it is deleted with the new code.
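
A sketch of the underlying pitfall, using plain JDK APIs rather than the test code itself:

    import java.io.File
    import java.nio.file.Files

    // File.deleteOnExit() removes a directory at JVM exit only if it is empty,
    // so a temp directory that still holds files needs a recursive delete.
    val tempDir: File = Files.createTempDirectory("checkpoint-example").toFile
    sys.addShutdownHook {
      def deleteRecursively(f: File): Unit = {
        Option(f.listFiles).foreach(_.foreach(deleteRecursively))
        f.delete()
      }
      deleteRecursively(tempDir)
    }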





[GitHub] spark pull request: [SPARK-15562][ML] Delete temp directory after ...

2016-05-26 Thread dding3
GitHub user dding3 opened a pull request:

https://github.com/apache/spark/pull/13328

[SPARK-15562][ML] Delete temp directory after program exit in DataFrameExample

## What changes were proposed in this pull request?
The temp directory used to save records is not deleted after program exit in 
DataFrameExample. Although deleteOnExit is called, it doesn't work because the 
directory is not empty. A similar thing happened in ContextCleanerSuite. This 
updates the code to make sure the temp directory is deleted after program exit.


## How was this patch tested?

unit tests and local build.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dding3/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13328.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13328









[GitHub] spark pull request: [SPARK-15172][ML] Explicitly tell user initial...

2016-05-08 Thread dding3
Github user dding3 commented on the pull request:

https://github.com/apache/spark/pull/12948#issuecomment-217716898
  
Jenkins retest this please





[GitHub] spark pull request: [SPARK-15172][ML] Improve LogisticRegression w...

2016-05-06 Thread dding3
GitHub user dding3 opened a pull request:

https://github.com/apache/spark/pull/12948

[SPARK-15172][ML] Improve LogisticRegression warning message

## What changes were proposed in this pull request?
Explicitly tell the user that the initial coefficients are ignored if their size 
doesn't match the expected size in LogisticRegression.
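
A hypothetical sketch of the kind of warning intended (the names and the logging call are illustrative, not the exact LogisticRegression code):

    // Warn instead of silently ignoring a mismatched initial model.
    def checkInitialCoefficients(initialCoefficients: Array[Double], numFeatures: Int): Unit = {
      if (initialCoefficients.length != numFeatures) {
        println(s"WARN: Initial coefficients will be ignored: their size " +
          s"(${initialCoefficients.length}) did not match the expected size ($numFeatures).")
      }
    }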


## How was this patch tested?
local build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dding3/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12948.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12948


commit 54cfaed5d307513cfd94a5807cf26a16695313ef
Author: dding3 <dingd...@dingding-ubuntu.sh.intel.com>
Date:   2016-05-06T06:15:10Z

Improve LogisticRegression warning message







[GitHub] spark pull request: [SPARK-14969][MLLib] Remove duplicate implemen...

2016-04-28 Thread dding3
Github user dding3 commented on the pull request:

https://github.com/apache/spark/pull/12747#issuecomment-215609146
  
@srowen Thanks for your review. I have removed it in ANNGradient. Besides, I 
checked all subclasses of Gradient; it looks like there is no duplicate 
implementation now.
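
A simplified sketch of why the subclass overrides were redundant, modeled on the shape of MLlib's Gradient base class (Array[Double] stands in for Vector to keep it self-contained):

    // The allocating compute() is written once in terms of the in-place variant,
    // so subclasses only need to implement the in-place version.
    abstract class GradientSketch {
      /** In-place variant: accumulate into cumGradient and return the loss. */
      def compute(data: Array[Double], label: Double, weights: Array[Double],
                  cumGradient: Array[Double]): Double

      /** Allocating variant, shared by all subclasses. */
      def compute(data: Array[Double], label: Double,
                  weights: Array[Double]): (Array[Double], Double) = {
        val gradient = new Array[Double](weights.length)
        val loss = compute(data, label, weights, gradient)
        (gradient, loss)
      }
    }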





[GitHub] spark pull request: [SPARK-14969][MLLib] Remove duplicate implemen...

2016-04-27 Thread dding3
Github user dding3 commented on the pull request:

https://github.com/apache/spark/pull/12747#issuecomment-215280895
  
Thanks for your comments. Removed the PR description.





[GitHub] spark pull request: Remove duplicate implementation of compute in ...

2016-04-27 Thread dding3
GitHub user dding3 opened a pull request:

https://github.com/apache/spark/pull/12747

Remove duplicate implementation of compute in LogisticGradient

## What changes were proposed in this pull request?


This PR removes the duplicate implementation of compute in the LogisticGradient 
class.


## How was this patch tested?


unit tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dding3/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12747.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12747


commit a88b7f40eb6e363de8e530a2e377d6027cb91b16
Author: dding3 <dingd...@dingding-ubuntu.sh.intel.com>
Date:   2016-04-26T08:31:28Z

remove unnecessary compute method in LogisticGRadient

commit 3695a52b7e3d7e95cbc0ec30ea8a76da53d59a70
Author: dding3 <dingd...@dingding-ubuntu.sh.intel.com>
Date:   2016-04-26T08:33:08Z

Merge branch 'upstream_master'







[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...

2014-12-23 Thread dding3
Github user dding3 commented on the pull request:

https://github.com/apache/spark/pull/2956#issuecomment-67931182
  
We have tested the patch in the scenarios below and found that it works:
1. Checkpoint applied: the RDD is flushed to disk as expected.
2. Checkpoint not applied: there is no performance degradation; our app (PageRank) 
only spent 1 more second compared to Spark without the patch.

