Thanks Stephan for the answer. As I told Fabian, we need to apply some transformations to datasets interactively. For the moment I will use Livy + Spark [1], but I'd prefer to stick with Flink if possible. So, if there's any effort in this direction, just let me know and I'll be happy to contribute.
Best,
Flavio

[1] http://gethue.com/how-to-use-the-livy-spark-rest-job-server-for-interactive-spark-2-2/

On Mon, Oct 10, 2016 at 3:15 PM, Stephan Ewen <se...@apache.org> wrote:

> There is still quite a bit needed to do this properly:
> (1) incremental recovery
> (2) network stack caching
>
> (1) will probably happen quite soon; I am not aware of any committer
> having concrete plans for (2).
>
> Best,
> Stephan
>
> On Sat, Oct 8, 2016 at 4:41 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
>> Any progress in this direction? How much effort do you think is
>> required to implement this feature?
>>
>> On 2 Dec 2015 16:29, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>>
>>> Do you think it is possible to push this ahead? I need to implement
>>> this interactive exploration of DataSets. Do you think it is possible
>>> to implement a persist() method in Flink (similar to Spark's)? If you
>>> want, I can work on it with some guidance.
>>>
>>> On Wed, Dec 2, 2015 at 3:05 PM, Maximilian Michels <m...@apache.org> wrote:
>>>
>>>> Hi Flavio,
>>>>
>>>> I was working on this some time ago, but it didn't make it in yet and
>>>> priorities shifted a bit. The pull request is here:
>>>> https://github.com/apache/flink/pull/640
>>>>
>>>> The basic idea is to release Flink's in-memory ResultPartition buffers
>>>> lazily, i.e. keep them as long as enough memory is available. When a
>>>> new job is resumed, it picks up the old results again. The pull
>>>> request needs some overhaul now, and the API integration is not there
>>>> yet.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Mon, Nov 30, 2015 at 5:35 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>
>>>>> I think that with some support I could try to implement it...
>>>>> Actually, I just need to add a persist(StorageLevel.OFF_HEAP) method
>>>>> to the DataSet API (similar to what Spark does), output the dataset
>>>>> to a Tachyon directory configured in flink-conf.yaml, and then
>>>>> re-read that dataset using its generated name on Tachyon. Do you
>>>>> have other suggestions?
>>>>>
>>>>> On Mon, Nov 30, 2015 at 4:58 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>>
>>>>>> The basic building blocks are there, but I am not aware of any
>>>>>> efforts to implement caching and add it to the API.
>>>>>>
>>>>>> 2015-11-30 16:55 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>>>
>>>>>>> Is there any effort in this direction? Maybe I could achieve
>>>>>>> something like that using Tachyon in some way...?
>>>>>>>
>>>>>>> On Mon, Nov 30, 2015 at 4:52 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Flavio,
>>>>>>>>
>>>>>>>> Flink does not support caching of data sets in memory yet.
>>>>>>>>
>>>>>>>> Best, Fabian
>>>>>>>>
>>>>>>>> 2015-11-30 16:45 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>>>>>
>>>>>>>>> Hi to all,
>>>>>>>>> I was wondering whether Flink could fit a use case where a user
>>>>>>>>> loads a dataset into memory and then wants to explore it
>>>>>>>>> interactively. Let's say I want to load a CSV, then filter out
>>>>>>>>> the rows where a column value matches some criterion, then apply
>>>>>>>>> another criterion after seeing the results of the first filter.
>>>>>>>>> Is there a way to keep the dataset in memory and modify it
>>>>>>>>> interactively, without re-reading the whole dataset every time I
>>>>>>>>> want to chain another operation onto it?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Flavio
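For readers following the thread: the semantics everyone is after — materialize a dataset once, then chain further filters interactively without re-reading the source — can be sketched outside any engine. This is a minimal plain-Python illustration of what a Spark-style persist() buys; the Dataset/persist/filter names here are illustrative, not Flink or Spark API:

```python
# Plain-Python sketch of persist()/cache semantics. A lazy dataset
# re-scans its source on every evaluation; a persisted one materializes
# the rows once and serves later operators from memory.

class Dataset:
    def __init__(self, rows_fn):
        self.rows_fn = rows_fn   # zero-arg callable yielding rows
        self._cache = None       # filled by persist()

    def persist(self):
        """Materialize the rows once; later operators read from memory."""
        if self._cache is None:
            self._cache = list(self.rows_fn())
        return self

    def filter(self, pred):
        src = self
        def run():
            rows = src._cache if src._cache is not None else src.rows_fn()
            return (r for r in rows if pred(r))
        return Dataset(run)

    def collect(self):
        if self._cache is not None:
            return list(self._cache)
        return list(self.rows_fn())

reads = {"n": 0}
def read_csv():
    reads["n"] += 1              # count how often the "file" is scanned
    yield from [1, 2, 3, 4, 5, 6]

ds = Dataset(read_csv).persist()          # scan the source exactly once
evens = ds.filter(lambda x: x % 2 == 0)   # first interactive filter
small = evens.filter(lambda x: x < 5)     # refine after seeing results
print(evens.collect())  # [2, 4, 6]
print(small.collect())  # [2, 4]
print(reads["n"])       # 1 -- without persist(), each collect() rescans
```

Without the persist() call, every collect() would bump the read counter, which is exactly the "re-reading all the dataset every time" cost raised at the bottom of the thread.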
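Flavio's interim workaround quoted above — persist by writing the intermediate result to a shared store (Tachyon) under a generated name, then re-read it later — reduces to the following pattern. A hedged sketch using a local temp directory as a stand-in for the Tachyon mount; the helper names persist_rows/read_rows are made up for illustration:

```python
import csv
import tempfile
import uuid
from pathlib import Path

def persist_rows(rows, store_dir):
    """Write rows under a generated name (stand-in for a Tachyon dir)."""
    path = Path(store_dir) / f"dataset-{uuid.uuid4().hex}.csv"
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path

def read_rows(path):
    """Re-read a persisted dataset instead of the original source."""
    with Path(path).open(newline="") as f:
        return [[int(c) for c in row] for row in csv.reader(f)]

store = tempfile.mkdtemp()

# "Expensive" first pipeline: filter once, persist the intermediate result.
filtered = [r for r in [[1, 9], [2, 4], [3, 7]] if r[1] > 5]
path = persist_rows(filtered, store)

# Later interactive step: pick up the persisted dataset by its generated
# name; the original source is never re-scanned.
again = read_rows(path)
print(again)   # [[1, 9], [3, 7]]
```

The trade-off versus Max's in-memory ResultPartition approach is the extra serialization round-trip through the store, in exchange for results that survive the job that produced them.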