I mentioned that. @Max: you should only try it out if you want to experiment/work with the changes.
On Wed, Jul 22, 2015 at 2:20 PM, Stephan Ewen <se...@apache.org> wrote:

> The two pull requests do not go all the way, unfortunately. They cover
> only the runtime; the API integration part is still missing.
>
> On Mon, Jul 20, 2015 at 5:53 PM, Maximilian Michels <m...@apache.org> wrote:
>
>> You could do that, but you might run into merge conflicts. Also keep in
>> mind that it is a work in progress :)
>>
>> On Mon, Jul 20, 2015 at 4:15 PM, Maximilian Alber <alber.maximil...@gmail.com> wrote:
>>
>>> Thanks!
>>>
>>> OK, cool. If I wanted to test it, would I just need to merge those two
>>> pull requests into my current branch?
>>>
>>> Cheers,
>>> Max
>>>
>>> On Mon, Jul 20, 2015 at 4:02 PM, Maximilian Michels <m...@apache.org> wrote:
>>>
>>>> Now that makes more sense :) I thought that by "nested iterations" you meant
>>>> iterations in Flink that can be nested, i.e. starting an iteration inside
>>>> an iteration.
>>>>
>>>> The caching/pinning of intermediate results is still a work in progress
>>>> in Flink. It is actually in a state where it could be merged, but some
>>>> pending pull requests got delayed because priorities changed a bit.
>>>>
>>>> Essentially, we need to merge these two pull requests:
>>>>
>>>> https://github.com/apache/flink/pull/858
>>>> This introduces session management, which allows keeping the
>>>> ExecutionGraph around for the session.
>>>>
>>>> https://github.com/apache/flink/pull/640
>>>> This implements the actual backtracking and caching of the results.
>>>>
>>>> Once these are in, we can change the Java/Scala API to support
>>>> backtracking. I don't know exactly how Spark's API does it, but essentially
>>>> it should then work by just creating new operations on an existing DataSet
>>>> and submitting to the cluster again.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Mon, Jul 20, 2015 at 3:31 PM, Maximilian Alber <alber.maximil...@gmail.com> wrote:
>>>>
>>>>> Oh sorry, my fault. When I wrote it, I had iterations in mind.
>>>>>
>>>>> What I actually wanted to ask: how will "resuming from intermediate
>>>>> results" work with (non-nested) "non-Flink" iterations? By
>>>>> iterations I mean something like this:
>>>>>
>>>>> while(...):
>>>>>   - change params
>>>>>   - submit to cluster
>>>>>
>>>>> where the executed Flink program is more or less the same at each
>>>>> iteration, but with changing input sets, which are reused between
>>>>> different loop iterations.
>>>>>
>>>>> I might have gotten something wrong: in our group we discussed caching
>>>>> à la Spark for Flink, and someone suggested that "pinning" would do that. Is
>>>>> that somewhat right?
>>>>>
>>>>> Thanks and cheers,
>>>>> Max
>>>>>
>>>>> On Mon, Jul 20, 2015 at 1:06 PM, Maximilian Michels <m...@apache.org> wrote:
>>>>>
>>>>>> "So it is up to debate how the support for resuming from
>>>>>> intermediate results will look like." -> What's the current state of that
>>>>>> debate?
>>>>>>
>>>>>> Since there is no support for nested iterations that I know of, the
>>>>>> debate on how intermediate results are integrated has not started yet.
>>>>>>
>>>>>>> "Intermediate results are not produced within the iterations
>>>>>>> cycles." -> OK, if there are none, what does it have to do with that
>>>>>>> debate? :-)
>>>>>>
>>>>>> I was referring to the existing support for intermediate results
>>>>>> within iterations. If we were to implement nested iterations, this could
>>>>>> (possibly) change. This is all very theoretical because there are no plans
>>>>>> to support nested iterations.
>>>>>>
>>>>>> Hope this clarifies things. Otherwise, please restate your question,
>>>>>> because I might have misunderstood.
>>>>>>
>>>>>> Cheers,
>>>>>> Max
>>>>>>
>>>>>> On Mon, Jul 20, 2015 at 12:11 PM, Maximilian Alber <alber.maximil...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the answer! But I need some clarification:
>>>>>>>
>>>>>>> "So it is up to debate how the support for resuming from
>>>>>>> intermediate results will look like." -> What's the current state of that
>>>>>>> debate?
>>>>>>> "Intermediate results are not produced within the iterations
>>>>>>> cycles." -> OK, if there are none, what does it have to do with that
>>>>>>> debate? :-)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Max
>>>>>>>
>>>>>>> On Mon, Jul 20, 2015 at 10:50 AM, Maximilian Michels <m...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Max,
>>>>>>>>
>>>>>>>> You are right, there is no support for nested iterations yet. As
>>>>>>>> far as I know, there are no concrete plans to add support for it. So
>>>>>>>> it is up to debate how the support for resuming from intermediate
>>>>>>>> results will look like. Intermediate results are not produced within
>>>>>>>> the iterations cycles. The same would be true for nested iterations,
>>>>>>>> so the behavior for resuming from intermediate results should be the
>>>>>>>> same for nested iterations.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Max
>>>>>>>>
>>>>>>>> On Fri, Jul 17, 2015 at 4:26 PM, Maximilian Alber <alber.maximil...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Flinksters,
>>>>>>>>>
>>>>>>>>> as far as I know, there is still no support for nested iterations
>>>>>>>>> planned. Am I right?
>>>>>>>>>
>>>>>>>>> So my question is how such use cases should be handled in the
>>>>>>>>> future. More specifically: once pinning/caching becomes available, do
>>>>>>>>> you suggest using that feature and programming in "Spark" style? Or is
>>>>>>>>> some other, more flexible, mechanism planned for loops?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Max
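For readers following along: the driver-style loop discussed in the thread (change params, resubmit, reuse the input set) can be sketched abstractly. The following is a toy model in plain Python, not Flink API code; `Session`, `pin`, and `submit` are hypothetical stand-ins for the session management (PR #858) and result caching/backtracking (PR #640) the thread refers to.

```python
# Toy model of "Spark-style" driver loops over a cached/pinned data set.
# Not Flink code: Session, pin, and submit are hypothetical stand-ins
# for the session management and result caching discussed in the thread.

class Session:
    """Keeps intermediate results alive between job submissions."""
    def __init__(self):
        self.cache = {}          # pinned intermediate results by name
        self.recomputations = 0  # how often a pinned input was rebuilt

    def pin(self, name, produce):
        """Return the cached result, or produce and cache it once."""
        if name not in self.cache:
            self.recomputations += 1
            self.cache[name] = produce()
        return self.cache[name]

def submit(session, param):
    """One 'job': reuse the pinned input set, apply new operations."""
    data = session.pin("input", lambda: list(range(1000)))  # expensive, done once
    return sum(x * param for x in data)                     # cheap per-iteration part

session = Session()
results = []
for param in [1, 2, 3]:  # while(...): change params, submit to cluster
    results.append(submit(session, param))

# The input set was produced once, then reused across all three submissions.
assert session.recomputations == 1
```

The point of the sketch is only the control flow: the per-iteration job graph changes (new `param`), while the pinned input survives across submissions, which is what resuming from intermediate results would buy over re-reading the input on every loop iteration.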