Re: Re: [VOTE] Apache Flink ML Release 2.0.0, release candidate #2

2021-12-29 Thread Yun Gao
Ah indeed, many thanks Zhipeng for the check! I'll then cancel this candidate and initiate the next one. Best, Yun -- Original Mail -- Sender: Zhipeng Zhang Send Date: Thu Dec 30 12:02:46 2021 Recipients: dev CC: Yun Gao Subject: Re: [VOTE] Apache Flink ML

[jira] [Created] (FLINK-25484) TableRollingPolicy do not support inactivityInterval config which is supported in datastream api

2021-12-29 Thread LiChang (Jira)
LiChang created FLINK-25484: Summary: TableRollingPolicy do not support inactivityInterval config which is supported in datastream api Key: FLINK-25484 URL: https://issues.apache.org/jira/browse/FLINK-25484

[jira] [Created] (FLINK-25483) When FlinkSQL writes ES, it will not write and update the null value field

2021-12-29 Thread 陈磊 (Jira)
陈磊 created FLINK-25483: Summary: When FlinkSQL writes ES, it will not write and update the null value field Key: FLINK-25483 URL: https://issues.apache.org/jira/browse/FLINK-25483 Project: Flink

Re: [VOTE] Apache Flink ML Release 2.0.0, release candidate #2

2021-12-29 Thread Zhipeng Zhang
Hi Yun, Thanks for the release! I found that the NOTICE and license of `flink-ml-uber` are wrong since `flink-ml-uber` does not use `com.github.fommil.netlib:core:1.1.2` anymore. Rather, we are using `dev.ludovic.netlib:blas:2.2.0` in flink-ml-core. I have created a PR to remove the NOTICE and

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread Lincoln Lee
Hi Jing, Thanks for your explanation! 1. For the hint name, +1 for WenLong's proposal. I think the `SHUFFLE` keyword is important: in a classic distributed computing system, a hash-join usually means there's a shuffle stage (including shuffle hash-join and broadcast hash-join). Users only need to

Re: [DISCUSS] FLIP-207: Flink backward and forward compatibility

2021-12-29 Thread Thomas Weise
Hi Jing, AFAIK most of the pain is caused by lack of backward compatibility (binary). And to make sure I'm not adding to the confusion: It would be necessary to be able to run the iceberg connector built against Flink 1.12 with a Flink 1.13 distribution. That would solve most problems downstream

[jira] [Created] (FLINK-25482) Hive Lookup Join with decimal type failed

2021-12-29 Thread miaojianlong (Jira)
miaojianlong created FLINK-25482: Summary: Hive Lookup Join with decimal type failed Key: FLINK-25482 URL: https://issues.apache.org/jira/browse/FLINK-25482 Project: Flink Issue Type: Bug

Re: [DISCUSS] FLIP-207: Flink backward and forward compatibility

2021-12-29 Thread Jing Ge
Hi Piotrek, thanks for asking. To be honest, I hope it could be good enough if Flink could only provide backward compatibility, which is easier than providing forward compatibility described in the proposal. That is also one of the reasons why I started this discussion. If, after the discussion,

Re: [DISCUSS] FLIP-201: Persist local state in working directory

2021-12-29 Thread Till Rohrmann
I've created a draft PR for the desired changes [1]. It might be easier to take a look at than the branch. [1] https://github.com/apache/flink/pull/18237 Cheers, Till On Tue, Dec 28, 2021 at 3:22 PM Till Rohrmann wrote: > Hi everyone, > > I would like to start a discussion about using the

[jira] [Created] (FLINK-25481) SourceIndex comparison in SplitEnumeratorContextProxy

2021-12-29 Thread Yuhao Bi (Jira)
Yuhao Bi created FLINK-25481: Summary: SourceIndex comparison in SplitEnumeratorContextProxy Key: FLINK-25481 URL: https://issues.apache.org/jira/browse/FLINK-25481 Project: Flink Issue Type:

[jira] [Created] (FLINK-25480) Create dashboard/monitoring to see resource usage per E2E test

2021-12-29 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-25480: Summary: Create dashboard/monitoring to see resource usage per E2E test Key: FLINK-25480 URL: https://issues.apache.org/jira/browse/FLINK-25480 Project: Flink

Re: [DISCUSS] FLIP-207: Flink backward and forward compatibility

2021-12-29 Thread Piotr Nowojski
Hi Jing, I haven't yet fully reviewed the FLIP document, but I wanted to clarify something. > Flink Forward Compatibility > Based on the previous clarification, Flink forward compatibility should mean that Flink jobs or ecosystems like external connectors/formats built with newer Flink version

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread Martijn Visser
Hi Jing, Thanks for explaining this in more detail and also to others participating. > I think using query hints in this case is more natural for users, WDYT? Yes, I agree. As long as we properly explain in our documentation that we support both Query Hints and Table Hints, what's the

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread Jing Zhang
Hi Jian gang, Thanks for the feedback. > When it comes to hive, how do you load partial data instead of the whole data? Any change related with hive? The question is the same as the one Yuan mentioned before. I would prefer to drive a separate FLIP on this topic for further discussion, because this

[jira] [Created] (FLINK-25479) Changlog materialization with incremental checkpoint cannot work well in local tests

2021-12-29 Thread Yun Tang (Jira)
Yun Tang created FLINK-25479: Summary: Changlog materialization with incremental checkpoint cannot work well in local tests Key: FLINK-25479 URL: https://issues.apache.org/jira/browse/FLINK-25479

[DISCUSS] FLIP-207: Flink backward and forward compatibility

2021-12-29 Thread Jing Ge
Hi everyone, with great interest I have read all discussions [1][2][3] w.r.t. the (API?) compatibility issues. The feedback coming from the Flink user's point of view is very valuable. Many thanks for it. In these discussions, there were many explanations that talked about backward and forward

[jira] [Created] (FLINK-25478) Changelog materialization with incremental checkpoint could cause checkpointed data lost

2021-12-29 Thread Yun Tang (Jira)
Yun Tang created FLINK-25478: Summary: Changelog materialization with incremental checkpoint could cause checkpointed data lost Key: FLINK-25478 URL: https://issues.apache.org/jira/browse/FLINK-25478

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread Jing Zhang
Hi Wenlong, Thanks for the feedback. I've checked similar syntax in other systems; they are all different from each other. There seems to be no consensus. As mentioned in FLIP-204, Oracle uses a query hint, the hint name is 'use_hash' [1]. Spark also uses a query hint, its name is

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread 刘建刚
Thank you for the proposal, Jing. I like the idea of partitioning data by some key to improve the cache hit rate. I have some questions: 1. When it comes to hive, how do you load partial data instead of the whole data? Any change related with hive? 2. How to define the cache configuration? For

Re: [DISCUSS] FLIP-205: Support cache in DataStream for Batch Processing

2021-12-29 Thread Xuannan Su
Hi David, Thanks for sharing your thoughts. You are right that most people tend to use the high-level API for interactive data exploration. Actually, there is FLIP-36 [1] covering the cache API in the Table/SQL API. As far as I know, it has been accepted but hasn’t been implemented. At the time when

Re: [DISCUSS] FLIP-205: Support cache in DataStream for Batch Processing

2021-12-29 Thread David Morávek
Hi Xuannan, thanks for drafting this FLIP. One immediate thought, from what I've seen for interactive data exploration with Spark, most people tend to use the higher level APIs, that allow for faster prototyping (Table API in Flink's case). Should the Table API also be covered by this FLIP?

[DISCUSS] FLIP-205: Support cache in DataStream for Batch Processing

2021-12-29 Thread Xuannan Su
Hi devs, I’d like to start a discussion about adding support for caching the intermediate result in the DataStream API for batch processing. As the DataStream API now supports batch execution mode, we see users using the DataStream API to run batch jobs. Interactive programming is an important use case

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread wenlong.lwl
Hi, Jing, thanks for driving the discussion. Have you investigated the syntax of join hints? Why did you choose USE_HASH from Oracle instead of the Spark style SHUFFLE_HASH? They are quite different. People in the big data world may be more familiar with Spark/Hive, if we need to

[jira] [Created] (FLINK-25477) The directory structure of the State Backends document is not standardized

2021-12-29 Thread Hangxiang Yu (Jira)
Hangxiang Yu created FLINK-25477: Summary: The directory structure of the State Backends document is not standardized Key: FLINK-25477 URL: https://issues.apache.org/jira/browse/FLINK-25477 Project:

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2021-12-29 Thread zst...@163.com
Hi Jing, Thanks for your detailed reply. 1) In the last suggestion, hashing by primary key is not used for raising the cache hit rate, but for handling skew of the left source. Now that you have the 'skew' hint and other discussion about it, I'm looking forward to it. 2) I mean to support user defined

[jira] [Created] (FLINK-25476) CharType lost in the creation of MaxAggFunction & MinAggFunction

2021-12-29 Thread zoucao (Jira)
zoucao created FLINK-25476: Summary: CharType lost in the creation of MaxAggFunction & MinAggFunction Key: FLINK-25476 URL: https://issues.apache.org/jira/browse/FLINK-25476 Project: Flink Issue

[jira] [Created] (FLINK-25475) When windowAgg and groupAgg are included at the same time, there is no assigner generated but MiniBatch optimization is still used.

2021-12-29 Thread ChangjiGuo (Jira)
ChangjiGuo created FLINK-25475: Summary: When windowAgg and groupAgg are included at the same time, there is no assigner generated but MiniBatch optimization is still used. Key: FLINK-25475 URL: