Re: Re: Bootstrapping the state

2018-07-23 Thread Henri Heiskanen
y plan about this, I would try to submit this idea > to the community. > > And about "how to bootstrap a state", what does that mean? Can you explain > this? > > Thanks, vino > > > On 2018-07-20 20:00, Henri Heiskanen wrote: > > Hi, > > Thanks. Just to

Re: Bootstrapping the state

2018-07-20 Thread Henri Heiskanen
ent SourceFunction interface). > > For your requirement, you can track how long the source has been idle with no more data and, once that idle time expires, > exit the source so that the job finally stops. > > You can also refer to the implementation of other source connectors. > > Thanks, vino. > > 2018-07-19 19:52 GMT+08:00 Henri

Bootstrapping the state

2018-07-19 Thread Henri Heiskanen
Hi, I've been looking into how to initialise large state and especially checked this presentation by Lyft referenced in this group as well: https://www.youtube.com/watch?v=WdMcyN5QZZQ In our use case we would like to load roughly 4 billion entries into this state and I believe loading this data

Re: deduplication with streaming sql

2018-02-06 Thread Henri Heiskanen
lem is that SELECT DISTINCT is not officially supported nor >> tested. I opened an issue for this [1]. >> >> Until this issue is fixed I would recommend implementing a custom >> aggregate function that keeps track of the values seen so far [2]. >> >> Regards, >

deduplication with streaming sql

2018-02-06 Thread Henri Heiskanen
Hi, I have a use case where I would like to find distinct rows over a certain period of time. The requirement is that a new row is emitted as soon as possible. Otherwise the requirement is mainly just to filter out data so that downstream has a smaller dataset. I noticed that SELECT DISTINCT and state retention time of

starting query server when running flink embedded

2017-09-28 Thread Henri Heiskanen
Hi, I would like to test queryable state just by running Flink embedded from my IDE. What is the easiest way to start it properly? If I run the code below, I cannot see the query server listening at the given port. I found something about this, but it was about copying some base classes and post

Re: difference between checkpoints & savepoints

2017-08-10 Thread Henri Heiskanen
uture, because you can > optimize checkpoints in some cases if this feature is dropped. > > Right now, externalized checkpoints should offer all that you want. > > Best, > Stefan > > Am 10.08.2017 um 11:46 schrieb Henri Heiskanen <henri.heiska...@gmail.com > >

Re: difference between checkpoints & savepoints

2017-08-10 Thread Henri Heiskanen
Hi, It would be super helpful if Flink provided out-of-the-box functionality for writing automatic savepoints and then starting from the latest savepoint. If external checkpoints supported rescaling, the first requirement would be met, but one would still need to e.g. find the latest checkpoint

Re: Flink streaming questions

2017-01-09 Thread Henri Heiskanen
ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#windowfunction-with-incremental-aggregation > > 2017-01-03 12:32 GMT+01:00 Henri Heiskanen <henri.heiska...@gmail.com>: > >> Hi, >> >> Actually it seems "Fold cannot be used with a merg

Re: Are heterogeneous DataStreams possible?

2017-01-09 Thread Henri Heiskanen
Hi, We have been using HashMap and it has been working fine so far. Br, Henkka On Mon, Jan 9, 2017 at 5:35 PM, Aljoscha Krettek wrote: > You could try using JSON for all your data; this might be slow, however. > The other route, which I would suggest, is to have your own

Re: Kafka 09 consumer does not commit offsets

2017-01-09 Thread Henri Heiskanen
Hi, We had the same problem when running the 0.9 consumer against Kafka 0.10. Upgrading the Flink Kafka connector to 0.10 fixed our issue. Br, Henkka On Mon, Jan 9, 2017 at 5:39 PM, Tzu-Li (Gordon) Tai wrote: > Hi, > > Not sure what might be going on here. I’m pretty certain that

Re: Flink streaming questions

2017-01-03 Thread Henri Heiskanen
: T = { >> if(!initialized){ >> doInitStuff() >> initialized = true >> } >> >> doNormalStuff() >> } >> } > > > #3 - One way to do this is as you've said which is to attach the profile > information to the event, us

Flink streaming questions

2017-01-02 Thread Henri Heiskanen
Hi, I have a few questions related to Flink streaming. I am on 1.2-SNAPSHOT and what I would like to accomplish is to have a stream that reads data from multiple Kafka topics, identifies user sessions, uses an external user profile to enrich the data, evaluates a script to produce session