Thaks Stephan for the answer.
As I told to Fabian we need to apply some transformation to datasets
interactively.
For the moment I will use livy + spark[1] but I'll prefer to stick with
Flink if possible.
So, if there's any effor in this direction just let me know and I'll be
happy to contribute.

Best,
Flavio

[1]
http://gethue.com/how-to-use-the-livy-spark-rest-job-server-for-interactive-spark-2-2/?cm_mc_uid=94745512088214727369424&cm_mc_sid_50200000=1476102041

On Mon, Oct 10, 2016 at 3:15 PM, Stephan Ewen <se...@apache.org> wrote:

> There is still quite a bit needed to do this properly:
>   (1) incremental recovery
>   (2) network stack caching
>
> (1) will probably happen quite soon, I am not aware of any committer
> having concrete plans for (2).
>
> Best,
> Stephan
>
>
> On Sat, Oct 8, 2016 at 4:41 PM, Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
>> Any progress in this direction?how mich effort do you think it's required
>> in order to implement this feature?
>>
>> On 2 Dec 2015 16:29, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>>
>>> Do you think it is possible to push ahead this thing? I need to
>>> implement this interactive feature of Datasets. Do you think it is possible
>>> to implement the persist() method in Flink (similar to Spark)? If you want
>>> I can work on it with some instructions..
>>>
>>> On Wed, Dec 2, 2015 at 3:05 PM, Maximilian Michels <m...@apache.org>
>>> wrote:
>>>
>>>> Hi Flavio,
>>>>
>>>> I was working on this some time ago but it didn't make it in yet and
>>>> priorities shifted a bit. The pull request is here:
>>>> https://github.com/apache/flink/pull/640
>>>>
>>>> The basic idea is to remove Flink's ResultPartition buffers in memory
>>>> lazily, i.e. keep them as long as enough memory is available. When a
>>>> new job is resumed, it picks up the old results again. The pull
>>>> request needs some overhaul now and the API integration is not there
>>>> yet.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Mon, Nov 30, 2015 at 5:35 PM, Flavio Pompermaier
>>>> <pomperma...@okkam.it> wrote:
>>>> > I think that with some support I could try to implement it...actually
>>>> I just
>>>> > need to add a persist(StorageLevel.OFF_HEAP) method to the Dataset
>>>> APIs
>>>> > (similar to what Spark does..) and output it to a tachyon directory
>>>> > configured in the flink-conf.yml and then re-read that dataset using
>>>> its
>>>> > generated name on tachyon. Do you have other suggestions?
>>>> >
>>>> >
>>>> > On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhue...@gmail.com>
>>>> wrote:
>>>> >>
>>>> >> The basic building blocks are there but I am not aware of any
>>>> efforts to
>>>> >> implement caching and add it to the API.
>>>> >>
>>>> >> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it
>>>> >:
>>>> >>>
>>>> >>> Is there any effort in this direction? maybe I could achieve
>>>> something
>>>> >>> like that using Tachyon in some way...?
>>>> >>>
>>>> >>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhue...@gmail.com>
>>>> wrote:
>>>> >>>>
>>>> >>>> Hi Flavio,
>>>> >>>>
>>>> >>>> Flink does not support caching of data sets in memory yet.
>>>> >>>>
>>>> >>>> Best, Fabian
>>>> >>>>
>>>> >>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <
>>>> pomperma...@okkam.it>:
>>>> >>>>>
>>>> >>>>> Hi to all,
>>>> >>>>> I was wondering if Flink could fit a use case where a user load a
>>>> >>>>> dataset in memory and then he/she wants to explore it
>>>> interactively. Let's
>>>> >>>>> say I want to load a csv, then filter out the rows where the
>>>> column value
>>>> >>>>> match some criteria, then apply another criteria after seeing the
>>>> results of
>>>> >>>>> the first filter.
>>>> >>>>> Is there a way to keep the dataset in memory and modify it
>>>> >>>>> interactively without re-reading all the dataset every time I
>>>> want to chain
>>>> >>>>> another operation to my dataset?
>>>> >>>>>
>>>> >>>>> Best,
>>>> >>>>> Flavio
>>>> >>>>
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>> >
>>>>
>>>
>>>
>

Reply via email to