[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19373
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-10-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19373
  
Can one of the admins verify this patch?


[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19373
  
Can one of the admins verify this patch?


[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19373
  
Can one of the admins verify this patch?





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-09-30 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19373
  
Hello @sujithjay, @felixcheung, @jkbradley, @mengxr, more than a year has 
passed since this pull request was opened. 
I'm wondering whether there is any chance of this PR being reviewed by 
someone and either rejected or merged (I understand that all of you have 
little or no time, given your own more important activities).





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-06-25 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19373
  
Just a kind reminder...





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19373
  
Can one of the admins verify this patch?





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19373
  
Can one of the admins verify this patch?





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-04-02 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19373
  
> so the fix might be to change to call checkpoint() to checkpoint(eager: 
true) - this ensures by the time checkpoint call is returned the checkpointing 
is completed.

Even if the checkpoint is completed, `PeriodicRDDCheckpointer` later removes 
the files of checkpointed and materialized RDDs, so another RDD may still 
depend on files that have already been removed.





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-03-29 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19373
  
@felixcheung, 
Unfortunately, RDDs, which `PeriodicRDDCheckpointer` is based on, do not have 
`checkpoint(eager: true)` yet. 
That functionality belongs to Datasets.

I've experimented with a similar method for RDDs ...

```scala
def checkpoint(eager: Boolean): RDD[T] = {
  checkpoint()
  if (eager) {
    count()
  }
  this
}
```

... and it does not work for `PeriodicRDDCheckpointer` in some scenarios.
Please consider the following example:

```scala
val checkpointInterval = 2

val checkpointer = new PeriodicRDDCheckpointer[(Int, Int)](checkpointInterval, sc)
val rdd1 = sc.makeRDD((0 until 10).map(i => i -> i))

// rdd1 is not materialized yet, checkpointer(update=1, checkpointInterval=2)
checkpointer.update(rdd1)
// rdd2 depends on rdd1
val rdd2 = rdd1.filter(_ => true)

// rdd1 is materialized, checkpointer(update=2, checkpointInterval=2)
checkpointer.update(rdd1)
// rdd3 depends on rdd1
val rdd3 = rdd1.filter(_ => true)

// rdd3 is not materialized yet, checkpointer(update=3, checkpointInterval=2)
checkpointer.update(rdd3)
// rdd3 is materialized, rdd1's files are removed, checkpointer(update=4, checkpointInterval=2)
checkpointer.update(rdd3)

// fails with FileNotFoundException because
// rdd1's files were removed on the previous step and
// rdd2 depends on rdd1
rdd2.count()
```
It fails with `FileNotFoundException` even with `eager` checkpointing, and 
passes when parent checkpointed RDDs are preserved, as is done in this PR.
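
As a rough illustration of the failure mode in the example above, here is a toy model in plain Scala with no Spark dependency. `Node`, `ToyFs`, `checkpointAndClean` and the `preserveParents` flag are hypothetical names made up for this sketch; they are not Spark APIs and not the code of this PR. The sketch assumes that a checkpointed node is always read from its file, and that "preserving parent checkpointed RDDs" means skipping deletion for ancestors of the newest checkpointed node.

```scala
import scala.collection.mutable

object ToyCheckpointCleanup {
  // Toy stand-in for an RDD: an optional parent and an optional checkpoint file.
  final class Node(val name: String, val parent: Option[Node]) {
    var checkpointFile: Option[String] = None
  }

  // Toy stand-in for the checkpoint directory.
  final class ToyFs {
    private val files = mutable.Set.empty[String]
    def write(path: String): Unit = files += path
    def delete(path: String): Unit = files -= path
    def exists(path: String): Boolean = files.contains(path)
  }

  // A checkpointed node is read from its file; otherwise it is recomputed from its parent.
  def canRead(node: Node, fs: ToyFs): Boolean = node.checkpointFile match {
    case Some(path) => fs.exists(path)
    case None       => node.parent.forall(canRead(_, fs))
  }

  // All nodes reachable by following parent links.
  def ancestors(node: Node): List[Node] =
    node.parent.map(p => p :: ancestors(p)).getOrElse(Nil)

  // Checkpoint `newest`, then delete the files of older checkpoints.
  // With preserveParents = true, ancestors of `newest` keep their files (assumed policy).
  def checkpointAndClean(newest: Node, older: Seq[Node], fs: ToyFs, preserveParents: Boolean): Unit = {
    val path = s"/ckpt/${newest.name}"
    fs.write(path)
    newest.checkpointFile = Some(path)
    val keep: Set[Node] = if (preserveParents) ancestors(newest).toSet else Set.empty[Node]
    older.filterNot(keep.contains).foreach(n => n.checkpointFile.foreach(fs.delete))
  }

  // Mirrors the scenario above: rdd2 and rdd3 both depend on rdd1; rdd1 and rdd3 are checkpointed.
  def run(preserveParents: Boolean): Boolean = {
    val fs = new ToyFs
    val rdd1 = new Node("rdd1", None)
    val rdd2 = new Node("rdd2", Some(rdd1))
    val rdd3 = new Node("rdd3", Some(rdd1))
    checkpointAndClean(rdd1, Seq.empty, fs, preserveParents)
    checkpointAndClean(rdd3, Seq(rdd1), fs, preserveParents)
    canRead(rdd2, fs) // can rdd2 still be computed?
  }
}
```

In this model `ToyCheckpointCleanup.run(preserveParents = false)` reports that `rdd2` can no longer be read, while `run(preserveParents = true)` reports that it can, because the file of `rdd1` is kept.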





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-03-28 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19373
  
to clarify, what I mean is that the issue is caused by checkpointing being 
lazy, so if you remove the previous checkpoint before the new checkpoint has 
started or completed, this fails.

so the fix might be to change to call `checkpoint()` to `checkpoint(eager: 
true)` - this ensures by the time checkpoint call is returned the checkpointing 
is completed.






[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-03-27 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19373
  
BTW, what do you think, guys: maybe it would be better to merge the changes 
from #19410 into this one? 
#19410 addresses almost the same issue and fixes the described behaviour 
for GraphX.





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-03-27 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19373
  
@felixcheung

> It is deleting earlier checkpoint after the current checkpoint is called 
though?

Currently, `PeriodicCheckpointer` can fail when checkpointing RDDs that 
depend on each other, as in the sample below.
```scala
// create a periodic checkpointer with interval of 2
val checkpointer = new PeriodicRDDCheckpointer[Double](2, sc)
val rdd1 = createRDD(sc)

// rdd2 depends on rdd1
val rdd2 = rdd1.filter(_ => true)
checkpointer.update(rdd2)
// on the second update rdd2 is checkpointed and checkpoint files of rdd1 are deleted
checkpointer.update(rdd2)
// on action it's necessary to read already removed checkpoint files of rdd1
rdd2.count()
```
The problem is that the files of an already checkpointed and materialized 
RDD are deleted while another RDD still depends on it.

If RDDs are cached before checkpointing (as is often recommended), this 
issue is unlikely to be visible, because the checkpointed RDD will be read 
from the cache and not from the materialized files. 

A good example of such behaviour is described in PR #19410, where GraphX 
fails with `FileNotFoundException` when memory is insufficient: cached blocks 
of checkpointed and materialized RDDs are evicted from memory, causing them to 
be read from already deleted files.
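
The masking effect of caching can be sketched with another toy model in plain Scala. `CheckpointedData`, `read` and `demo` are hypothetical names for illustration only; the sketch assumes a read is served from the cache when cached data is present and falls back to the checkpoint file otherwise, and it is not how Spark's block manager is actually implemented.

```scala
import scala.collection.mutable

object ToyCacheMasking {
  // Hypothetical stand-in for a checkpointed RDD whose data may also be cached.
  final class CheckpointedData(val file: String) {
    var cached: Option[Seq[Int]] = None
  }

  // Read from the cache when present, otherwise fall back to the checkpoint file.
  def read(data: CheckpointedData, files: mutable.Map[String, Seq[Int]]): Either[String, Seq[Int]] =
    data.cached match {
      case Some(values) => Right(values)
      case None         => files.get(data.file).toRight(s"file not found: ${data.file}")
    }

  // Returns (read succeeds while cached, read succeeds after eviction).
  def demo(): (Boolean, Boolean) = {
    val files = mutable.Map("/ckpt/rdd1" -> Seq(1, 2, 3))
    val rdd1 = new CheckpointedData("/ckpt/rdd1")
    rdd1.cached = Some(Seq(1, 2, 3))
    files.remove("/ckpt/rdd1")               // checkpoint file deleted by the cleanup
    val whileCached = read(rdd1, files).isRight
    rdd1.cached = None                       // cached data evicted under memory pressure
    val afterEviction = read(rdd1, files).isRight
    (whileCached, afterEviction)
  }
}
```

Here the first read succeeds only because the cache hides the missing file; once the cached data is dropped, the same read fails.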

> is this just an issue with DataSet.checkpoint(eager = true)?

This PR does not modify the Dataset API; it mainly affects 
`PeriodicCheckpointer` and `PeriodicRDDCheckpointer`. 
It was created as a preliminary PR to #19410 (where GraphX fails when reading 
cached RDDs already evicted from memory).






[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-03-26 Thread sujithjay
Github user sujithjay commented on the issue:

https://github.com/apache/spark/pull/19373
  
cc: @felixcheung @jkbradley @mengxr 
Could you please review this PR?





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-03-26 Thread sujithjay
Github user sujithjay commented on the issue:

https://github.com/apache/spark/pull/19373
  
Hi @szhem, you could consider identifying contributors who have worked on 
the code being changed and reaching out to them for review.





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19373
  
Can one of the admins verify this patch?


[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2017-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19373
  
Can one of the admins verify this patch?





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2017-10-16 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19373
  
I would be happy if anyone could take a look at this PR.





[GitHub] spark issue #19373: [SPARK-22150][CORE] PeriodicCheckpointer fails in case o...

2017-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19373
  
Can one of the admins verify this patch?

