Re: where does flink store the intermediate results of a join and what is the key?

2020-01-28 Thread Arvid Heise
Yes, the default is writing to an external system. Especially if you want SQL, there is currently no way around it. The drawbacks of writing to an external system are the maintenance of an additional system and higher latency. On Tue, Jan 28, 2020 at 11:49 AM kant kodali wrote: > Hi

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-28 Thread kant kodali
Hi Arvid, I am trying to understand your statement. I am new to Flink, so excuse me if I don't know something I should. A ProcessFunction just processes the records, right? If so, how is it better than writing to an external system? At the end of the day I want to be able to query it

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-28 Thread Arvid Heise
Hi Kant, just wanted to mention the obvious. If you add a ProcessFunction right after the join, you could maintain a user state with the same result. That will of course blow up the data volume by a factor of 2, but may still be better than writing to an external system. On Mon, Jan 27, 2020 at

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-27 Thread Benoît Paris
Dang, what a massive PR: 2,118 files changed, +104,104 −29,161 lines changed. Thanks for the details, Jark! On Mon, Jan 27, 2020 at 4:07 PM Jark Wu wrote: > Hi Kant, > Having a custom state backend is very difficult and is not recommended. > > Hi Benoît, > Yes, the "Query on the intermediate

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-27 Thread Jark Wu
Hi Kant, Having a custom state backend is very difficult and is not recommended. Hi Benoît, Yes, the "Query on the intermediate state is on the roadmap" I mentioned refers to integrating the Table API & SQL with Queryable State. We also have an early issue, FLINK-6968, to track this. Best, Jark

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-23 Thread Benoît Paris
Hi all! @Jark, out of curiosity, would you be so kind as to expand a bit on "Query on the intermediate state is on the roadmap"? Are you referring to working on QueryableStateStream/QueryableStateClient [1], or around "FOR SYSTEM_TIME AS OF" [2], or on other APIs/concepts (is there a FLIP?)?

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-22 Thread kant kodali
Is it common practice to have a custom state backend? If so, what would be a popular custom backend? Can I use Elasticsearch as a state backend? Thanks! On Wed, Jan 22, 2020 at 1:42 AM Jark Wu wrote: > Hi Kant, > > 1) List of row is also sufficient in this case. Using a MapState is in >

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-22 Thread Jark Wu
Hi Kant, 1) A list of rows is also sufficient in this case. A MapState is used so that a row can be retracted faster and to reduce the storage size. 2) The State Processor API is usually used to process savepoints. I'm afraid its performance is not good enough for querying. On the other hand, AFAIK, State

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-21 Thread kant kodali
Hi Jark, 1) Shouldn't it be col1 to a List of rows? Multiple rows can have the same join key, right? 2) Can I use the State Processor API from an external application to query the intermediate results in near real-time? I

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-21 Thread Jark Wu
Hi Kant, 1) Yes, it will be stored in the RocksDB state backend. 2) In the old planner, the left state has the same structure as the right state; both are of the form `<col1, MapState<row, ...>>`. It is a 2-level map structure, where `col1` is the join key and the first-level key of the state. The key of the MapState is the input row,
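
The 2-level map described in this thread (join key first, then the input row as the key of the inner map) can be modelled in plain Java to see why retraction is cheap. This is only a sketch: `JoinSideState` is a hypothetical stand-in and not a Flink class, it uses `HashMap` rather than Flink's keyed `MapState`, and the value stored per row is assumed here to be a count of identical rows, which the truncated message above does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one side of a join's state; not Flink API.
class JoinSideState {
    // First level: join key (col1). Second level: row -> number of identical copies
    // (the count as the value is an assumption made for this sketch).
    private final Map<String, Map<List<Object>, Integer>> state = new HashMap<>();

    // Store one more copy of the row under its join key.
    void add(String joinKey, List<Object> row) {
        state.computeIfAbsent(joinKey, k -> new HashMap<>()).merge(row, 1, Integer::sum);
    }

    // Remove one copy of the row with a hash lookup instead of scanning a list.
    // Returns false if the row was not stored.
    boolean retract(String joinKey, List<Object> row) {
        Map<List<Object>, Integer> rows = state.get(joinKey);
        if (rows == null || !rows.containsKey(row)) {
            return false;
        }
        int remaining = rows.get(row) - 1;
        if (remaining == 0) {
            rows.remove(row);
        } else {
            rows.put(row, remaining);
        }
        if (rows.isEmpty()) {
            state.remove(joinKey);
        }
        return true;
    }

    // Number of stored copies of this exact row under the join key.
    int count(String joinKey, List<Object> row) {
        Map<List<Object>, Integer> rows = state.get(joinKey);
        if (rows == null) {
            return 0;
        }
        return rows.getOrDefault(row, 0);
    }

    // All rows for a join key, with duplicates expanded (what a probe from the other side would see).
    List<List<Object>> rowsFor(String joinKey) {
        List<List<Object>> result = new ArrayList<>();
        Map<List<Object>, Integer> rows = state.get(joinKey);
        if (rows == null) {
            return result;
        }
        for (Map.Entry<List<Object>, Integer> e : rows.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        JoinSideState left = new JoinSideState();
        List<Object> row = Arrays.asList("k1", 42);
        left.add("k1", row);
        left.add("k1", row); // duplicate row: stored once, count incremented
        left.retract("k1", row);
        System.out.println(left.rowsFor("k1"));
    }
}
```

With a plain list per join key, the same retraction would need a scan to find the row and every duplicate would be stored in full, which matches the reasons given in the thread for preferring a MapState (faster retraction, smaller state).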