Re: Flink SQL State

2023-04-26 Thread Giannis Polyzos
This is really helpful, thanks. On Thu, Apr 27, 2023 at 5:46 AM Yanfei Lei wrote: > Hi Giannis, > > Except for the “default” Column Family (CF), all other CFs represent the state > in the RocksDB state backend; the name of a CF is the name of a > StateDescriptor. > > - deduplicate-state is a value state,

Re: how to configure window of join operator in batch mode

2023-04-26 Thread Shammon FY
Hi Jiadong, From the context you described, I think ProcessingTimeWindow may not be a good solution. If I understand correctly, you'd like to use the same SQL for streaming and batch jobs in your platform. How about creating partitioned sink tables for streaming jobs instead of using a window? Then the

Re: flink batch execution mode

2023-04-26 Thread weijie guo
Hi Lu, At present, Flink's runtime execution layer is still based on a row format, and vectorization relies mostly on automatic compiler optimizations of the JVM. Best regards, Weijie On Thu, Apr 27, 2023 at 10:52, Shammon FY wrote: > Hi Lu, > > Currently, Flink does not have official benchmark results

Re: flink batch execution mode

2023-04-26 Thread Lu Niu
Thanks, Shammon! Do you also have insights on the first two questions? Thanks! 1. Does Flink batch mode use a columnar in-memory format? 2. Does Flink batch mode use vectorization techniques? Best Lu On Wed, Apr 26, 2023 at 7:51 PM Shammon FY wrote: > Hi Lu, > > Currently, Flink does not

Re: Python Datastream: CountTumblingWindowAssigner never purges?

2023-04-26 Thread Dian Fu
Filed ticket https://issues.apache.org/jira/browse/FLINK-31949 to track this issue. On Thu, Apr 27, 2023 at 11:14 AM Dian Fu wrote: > Hi Urs, > > I guess you are right. This seems like a bug which should be addressed. > > Regards, > Dian > > On Mon, Apr 24, 2023 at 5:07 AM Urs Schönenberger < >

Re: Python Datastream: CountTumblingWindowAssigner never purges?

2023-04-26 Thread Dian Fu
Hi Urs, I guess you are right. This seems like a bug which should be addressed. Regards, Dian On Mon, Apr 24, 2023 at 5:07 AM Urs Schönenberger < urs.schoenenber...@tngtech.com> wrote: > Hi all, > > In FLINK-26444, a couple of convenience window assigners were added to > the Python Datastream

Re: flink batch execution mode

2023-04-26 Thread Shammon FY
Hi Lu, Currently, Flink does not have official benchmark results compared to Spark and Presto. You can run the TPC-DS benchmark yourself, for example with the benchmark project [1], to compare Flink with other engines; Flink supports all TPC-DS queries. Of course, other companies have made similar

Re: Flink SQL State

2023-04-26 Thread Yanfei Lei
Hi Giannis, Except for the “default” Column Family (CF), all other CFs represent the state in the RocksDB state backend; the name of a CF is the name of a StateDescriptor. - deduplicate-state is a value state; you can find it in DeduplicateFunctionBase.java and MiniBatchDeduplicateFunctionBase.java, they are

flink batch execution mode

2023-04-26 Thread Lu Niu
Hi Flink users, I am trying to understand the internals of Flink batch mode. Some questions: 1. Does Flink batch mode use a columnar in-memory format? 2. Does Flink batch mode use vectorization techniques? 3. Are any performance benchmarks available comparing it with batch engines like Spark or Presto?

Re: Can I setup standby taskmanagers while using reactive mode?

2023-04-26 Thread Gyula Fóra
I think the behaviour is going to get a little weird because this would actually defeat the purpose of the standby TM. (MAX - some offset) will decrease once you lose a TM, so in this case we would scale down to have a spare again (which we never actually use). Gyula On Wed, Apr 26, 2023 at 4:02 PM

Re: Can I setup standby taskmanagers while using reactive mode?

2023-04-26 Thread Chesnay Schepler
Reactive mode doesn't support standby taskmanagers. As you said, it always uses all available resources in the cluster. I can see it being useful, though, to scale not always to MAX but to (MAX - some_offset). I'd suggest filing a ticket. On 26/04/2023 00:17, Wei Hou via user wrote: Hi Flink

Re: how to configure window of join operator in batch mode

2023-04-26 Thread Jiadong Lu
Hi Shammon, Thank you for your quick response. To give you some context, I am working on a project that involves joining two streams and performing some left/inner join operations based on certain keys. As for using batch mode, my intention is to have a unified approach that works for both

Re: how to configure window of join operator in batch mode

2023-04-26 Thread Shammon FY
Hi Jiadong, Using a processing-time window in batch jobs seems a little strange to me. I prefer to partition the data by day, and then have the batch job read data from different partitions instead of using a window. Best, Shammon FY On Wed, Apr 26, 2023 at 12:03 PM Jiadong Lu

Re: State bootstrapping for Flink SQL / Table API jobs

2023-04-26 Thread Flavio Pompermaier
This feature would be an awesome addition! I'm looking forward to it. On Mon, Apr 24, 2023 at 3:59 PM Илья Соин wrote: > Thank you, Shammon FY > > -- > *Sincerely,* > *Ilya Soin* > > On 24 Apr 2023, at 15:19, Shammon FY wrote: > > > Thanks Илья, there's already a FLIP [1] and discussion