Re: Nested Iterations Outlook

2015-07-22 Thread Stephan Ewen
Unfortunately, the two pull requests do not go all the way. They cover only
the runtime; the API integration part is still missing.

On Mon, Jul 20, 2015 at 5:53 PM, Maximilian Michels m...@apache.org wrote:

 You could do that but you might run into merge conflicts. Also keep in
 mind that it is work in progress :)

 On Mon, Jul 20, 2015 at 4:15 PM, Maximilian Alber 
 alber.maximil...@gmail.com wrote:

 Thanks!

 Ok, cool. If I want to test it, do I just need to merge those two pull
 requests into my current branch?

 Cheers,
 Max

 On Mon, Jul 20, 2015 at 4:02 PM, Maximilian Michels m...@apache.org
 wrote:

 Now that makes more sense :) I thought by nested iterations you meant
 iterations in Flink that can be nested, i.e. starting an iteration inside
 an iteration.

 The caching/pinning of intermediate results is still a work in progress
 in Flink. It is actually in a state where it could be merged but some
 pending pull requests got delayed because priorities changed a bit.

 Essentially, we need to merge these two pull requests:

 https://github.com/apache/flink/pull/858
 This introduces session management, which allows keeping the
 ExecutionGraph for the session.

 https://github.com/apache/flink/pull/640
 Implements the actual backtracking and caching of the results.

 Once these are in, we can change the Java/Scala API to support
 backtracking. I don't know exactly how Spark's API does it, but essentially
 it should then work by just creating new operations on an existing DataSet
 and submitting to the cluster again.
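
 As a rough illustration of the backtracking and caching idea described above, here is a toy model in plain Python. It is only a sketch under stated assumptions and says nothing about how the pull requests implement it: `Node`, `pin`, and `execute` are hypothetical names, not Flink APIs. A job backtracks from its sink and stops at any operator whose result was pinned by an earlier submission.

```python
class Node:
    """One operator in a toy dataflow graph (hypothetical, not Flink's ExecutionGraph)."""

    def __init__(self, name, fn, parents=()):
        self.name = name
        self.fn = fn                  # combines the parents' results into this node's result
        self.parents = tuple(parents)
        self.pinned = False           # keep the result after the job finishes?
        self.cached = None            # result kept from an earlier submission
        self.runs = 0                 # how often this operator actually executed

    def pin(self):
        """Mark this node's result as worth keeping across submissions."""
        self.pinned = True
        return self


def execute(sink):
    """Backtrack from the sink; stop at any node whose result is already cached."""
    if sink.cached is not None:
        return sink.cached
    inputs = [execute(parent) for parent in sink.parents]
    result = sink.fn(*inputs)
    sink.runs += 1
    if sink.pinned:
        sink.cached = result
    return result


# First "submission": source -> squares (pinned) -> total
source = Node("source", lambda: [1, 2, 3, 4])
squares = Node("squares", lambda xs: [x * x for x in xs], [source]).pin()
total = Node("total", lambda xs: sum(xs), [squares])
first = execute(total)

# Second "submission": a new operation on the existing, pinned intermediate result.
# Backtracking stops at `squares`, so neither it nor `source` runs again.
largest = Node("largest", lambda xs: max(xs), [squares])
second = execute(largest)
```

 In this toy version only pinned results survive between calls to `execute`; an unpinned operator shared by two consumers would simply run twice.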

 Cheers,
 Max

 On Mon, Jul 20, 2015 at 3:31 PM, Maximilian Alber 
 alber.maximil...@gmail.com wrote:

 Oh sorry, my fault. When I wrote it, I had iterations in mind.

 What I actually wanted to ask is how resuming from intermediate
 results will work with (non-nested) non-Flink iterations. By
 iterations I mean something like this:

 while(...):
   - change params
   - submit to cluster

 where the executed Flink program is more or less the same in each
 iteration, but with changing input sets that are reused between
 different loop iterations.
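
 The client-side loop sketched above could look roughly like the following. This is a minimal, self-contained Python sketch, not Flink code: `FakeCluster` and its `submit` method are hypothetical stand-ins for building and submitting a job, and the parameter update is invented for the example. Each round resubmits essentially the same program with a changed parameter, and without pinning/caching the shared input set is loaded again on every submission.

```python
class FakeCluster:
    """Hypothetical stand-in for a cluster; it only counts input loads."""

    def __init__(self, data):
        self._data = data
        self.input_loads = 0

    def submit(self, step_size, estimate):
        """One 'job': move the estimate toward the mean of the input set."""
        data = list(self._data)       # without caching, every job re-reads the input
        self.input_loads += 1
        mean = sum(data) / len(data)
        return estimate + step_size * (mean - estimate)


def driver_loop(cluster, max_rounds=50, tolerance=1e-6):
    """Client-side loop: change params, submit to cluster, repeat until converged."""
    estimate, step_size = 0.0, 0.5
    for round_no in range(1, max_rounds + 1):
        new_estimate = cluster.submit(step_size, estimate)
        if abs(new_estimate - estimate) < tolerance:
            return new_estimate, round_no
        estimate = new_estimate       # the changed parameter for the next submission
    return estimate, max_rounds


if __name__ == "__main__":
    cluster = FakeCluster([1.0, 2.0, 3.0, 4.0])
    result, rounds = driver_loop(cluster)
    print(f"estimate {result:.4f} after {rounds} submissions, "
          f"input loaded {cluster.input_loads} times")
```

 The number of input loads equals the number of submissions here, which is the cost that keeping intermediate results on the cluster is meant to avoid.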

 I might have gotten something wrong, because in our group we mentioned
 caching à la Spark for Flink and someone suggested that pinning will do
 that. Is that somewhat right?

 Thanks and Cheers,
 Max

 On Mon, Jul 20, 2015 at 1:06 PM, Maximilian Michels m...@apache.org
 wrote:

  So it is up for debate what the support for resuming from intermediate
 results will look like. - What's the current state of that debate?

 Since there is no support for nested iterations that I know of, the
 debate about how intermediate results would be integrated has not started
 yet.


 Intermediate results are not produced within the iteration cycles.
 - Ok, if there are none, what does it have to do with that debate? :-)


 I was referring to the existing support for intermediate results
 within iterations. If we were to implement nested iterations, this could
 (possibly) change. This is all very theoretical because there are no plans
 to support nested iterations.

 Hope this clarifies. Otherwise, please restate your question because I
 might have misunderstood.

 Cheers,
 Max


 On Mon, Jul 20, 2015 at 12:11 PM, Maximilian Alber 
 alber.maximil...@gmail.com wrote:

 Thanks for the answer! But I need some clarification:

 So it is up for debate what the support for resuming from intermediate
 results will look like. - What's the current state of that debate?
 Intermediate results are not produced within the iteration cycles.
 - Ok, if there are none, what does it have to do with that debate? :-)

 Cheers,
 Max

 On Mon, Jul 20, 2015 at 10:50 AM, Maximilian Michels m...@apache.org
 wrote:

 Hi Max,

 You are right, there is no support for nested iterations yet. As far
 as I know, there are no concrete plans to add support for it. So it is
 up for debate what the support for resuming from intermediate results
 will look like. Intermediate results are not produced within the
 iteration cycles. The same would be true for nested iterations. So the
 behavior for resuming from intermediate results should be the same for
 nested iterations.

 Cheers,
 Max

 On Fri, Jul 17, 2015 at 4:26 PM, Maximilian Alber 
 alber.maximil...@gmail.com wrote:

 Hi Flinksters,

 as far as I know, there is still no support for nested iterations
 planned. Am I right?

 So my question is how such use cases should be handled in the
 future. More specifically: when pinning/caching becomes available, do you
 suggest using that feature and programming in Spark style? Or is there
 some other, more flexible mechanism planned for loops?

 Cheers,
 Max

Re: Nested Iterations Outlook

2015-07-22 Thread Maximilian Alber
Thanks.
Yes, I got that.

Cheers

On Wed, Jul 22, 2015 at 2:46 PM, Maximilian Michels m...@apache.org wrote:

 I mentioned that. @Max: you should only try it out if you want to
 experiment/work with the changes.

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Michels
 So it is up for debate what the support for resuming from intermediate
results will look like. - What's the current state of that debate?

Since there is no support for nested iterations that I know of, the debate
about how intermediate results would be integrated has not started yet.


 Intermediate results are not produced within the iteration cycles. -
 Ok, if there are none, what does it have to do with that debate? :-)


I was referring to the existing support for intermediate results within
iterations. If we were to implement nested iterations, this could
(possibly) change. This is all very theoretical because there are no plans
to support nested iterations.

Hope this clarifies. Otherwise, please restate your question because I
might have misunderstood.

Cheers,
Max

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Alber
Thanks!

Ok, cool. If I want to test it, do I just need to merge those two pull
requests into my current branch?

Cheers,
Max

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Michels
You could do that but you might run into merge conflicts. Also keep in mind
that it is work in progress :)

Nested Iterations Outlook

2015-07-17 Thread Maximilian Alber
Hi Flinksters,

as far as I know, there is still no support for nested iterations planned.
Am I right?

So my question is how such use cases should be handled in the future.
More specifically: when pinning/caching becomes available, do you suggest
using that feature and programming in Spark style? Or is there some other,
more flexible mechanism planned for loops?

Cheers,
Max