[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126695#comment-17126695 ]

Jark Wu commented on FLINK-16497:

Btw, I will only apply the changes to the new JDBC connector, so the behavior of the old JDBC connector is not affected.

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / JDBC, Table SQL / Ecosystem
> Reporter: Jark Wu
> Assignee: Jark Wu
> Priority: Critical
> Fix For: 1.11.0
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means if flush interval is not set, the buffered output rows may not be
> flushed to database for a long time. That is a surprising behavior because no
> results are output by default.
> So I propose to have a default flush '1s' interval for JDBC sink or default 1
> row for flush size.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126694#comment-17126694 ]

Jark Wu commented on FLINK-16497:

Thank you all for the discussion. I will create a pull request for this soon.
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126642#comment-17126642 ]

sunjincheng commented on FLINK-16497:

Thank you for the active discussion! My original intention was to think about the most natural way to deal with stream computing. For now, reducing the flush size to 100 and setting the flush interval to 1s sounds good to me as an overall compromise.
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126471#comment-17126471 ]

Jark Wu commented on FLINK-16497:

It seems that more people prefer a trade-off between the out-of-box experience and the production experience. I'm also fine with Danny's proposal. This default value will also work well in production. What do you think, [~sunjincheng121]?
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126446#comment-17126446 ]

lincoln lee commented on FLINK-16497:

+1 to the default flush size (100) and flush interval (1s). I think this default configuration strikes a good balance between low latency and high throughput for most cases.
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126395#comment-17126395 ]

Benchao Li commented on FLINK-16497:

+1 to Danny's proposal.
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126378#comment-17126378 ]

Jingsong Lee commented on FLINK-16497:

+1 to the default flush size (100) and flush interval (1s). I think we should find a trade-off between the initial experience and production. I think 1s is good enough for the out-of-box user experience, and these default values can also work for production. IIUC, Elasticsearch also has good default flush values for both the out-of-box user experience and production.
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126373#comment-17126373 ]

Danny Chen commented on FLINK-16497:

I think that, for a popular streaming engine, ensuring good throughput and performance should be a first-class concern. Most client tools have a default flush strategy (either buffer size or interval) [1][2], and we should follow that. I would suggest a default flush size (100) and flush interval (1s); this performs well in production, and in local tests 1s is also an acceptable latency.

[1] https://kafka.apache.org/22/documentation.html#producerconfigs
[2] https://github.com/searchbox-io/Jest
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123622#comment-17123622 ]

Leonard Xu commented on FLINK-16497:

Thanks [~sunjincheng121] and [~libenchao] for the detailed comments. I want to add that the two parameters are just an out-of-box configuration, and users may need to tune them for their specific scenarios.

I'd suggest setting the *interval to less than 1s*, because we have always claimed that the Flink framework can offer sub-second latency; most use cases send small amounts of data to the DB and use other sinks for large-scale data. So we can set *max-rows to less than 10/50* as a basic tuning, which can work together with the *interval* to reduce the pressure on the DB efficiently. WDYT?
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120969#comment-17120969 ]

Benchao Li commented on FLINK-16497:

Hi [~sunjincheng121], thanks for your detailed explanation; it's really insightful. Let me try to express my opinion in a bit more depth.

From my perspective, we always have these two settings whenever we deal with 'flush': 'max-rows' controls the upper bound of each batch, while 'flush-timeout' controls the lower bound of each batch. From this point of view, I think the case we are facing now belongs to the latter.

About the 1s latency in the testing case, I think it may be OK, because the JDBC sink is not the only component that adds to the user-facing latency; the message queue (like Kafka), the network, and the Flink runtime all add some latency. At least from my perspective, it's OK to have one more second in the sink before the user sees the data in the DB.

Another point I want to raise: if we set the default 'max-rows' to 1, it may force most production scenarios to override this config, which may not be friendly to existing production use cases.

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / JDBC, Table SQL / Ecosystem
> Reporter: Jark Wu
> Priority: Major
> Fix For: 1.11.0
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means if flush interval is not set, the buffered output rows may not be
> flushed to database for a long time. That is a surprising behavior because no
> results are output by default.
> So I propose to have a default flush '1s' interval for JDBC sink or default 1
> row for flush size.
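The interplay of the two flush triggers discussed above (a row-count threshold and a time interval) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Flink JDBC sink's actual implementation: the class `BufferedSink`, its methods, and the helper `run` are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BufferedSink:
    """Toy model of a buffering sink (hypothetical, not Flink code)."""
    max_rows: int                    # flush once this many rows are buffered
    interval_s: Optional[float]      # flush once this much time has passed; None = no timer
    write_batch: Callable[[List[str]], None]
    _buffer: List[str] = field(default_factory=list)
    _last_flush: float = 0.0

    def add(self, row: str, now: float) -> None:
        # Size trigger: caps how large a batch can grow.
        self._buffer.append(row)
        if len(self._buffer) >= self.max_rows:
            self.flush(now)

    def on_timer(self, now: float) -> None:
        # Time trigger: caps how long a buffered row can wait.
        if self.interval_s is None:
            return
        if self._buffer and now - self._last_flush >= self.interval_s:
            self.flush(now)

    def flush(self, now: float) -> None:
        if self._buffer:
            self.write_batch(list(self._buffer))
            self._buffer.clear()
        self._last_flush = now


def run(max_rows: int, interval_s: Optional[float]) -> List[List[str]]:
    """Send three rows, fire one timer tick at t=1.5s, return the written batches."""
    batches: List[List[str]] = []
    sink = BufferedSink(max_rows, interval_s, batches.append)
    for i in range(3):
        sink.add(f"row-{i}", now=0.1 * i)
    sink.on_timer(now=1.5)
    return batches
```

In this model, `run(5000, None)` writes nothing, which mirrors the surprising behavior described in the issue; `run(5000, 1.0)` writes the three rows as one batch on the timer tick; and `run(1, None)` writes three single-row batches.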
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120927#comment-17120927 ]

sunjincheng commented on FLINK-16497:

Hi [~libenchao], thanks for your reply!

I think it is difficult for us to set an optimal default value from the perspective of performance, since that depends on the specific business and storage. So at the design level, for stream computing scenarios, real-time insertion (1 row) is the better semantic expression; I am thinking about it in terms of semantics and real-time behavior. Even though 1s is very short, beginners will still feel that the calculation is real-time but the write to storage is not, and that it is a mini-batch instead (1s may contain multiple records). So, for now, I still prefer 1 row as the default value. :)

What do you think?
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120708#comment-17120708 ]

Jark Wu commented on FLINK-16497:

[~lzljs3620320], what do you think about this?
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120696#comment-17120696 ]

Benchao Li commented on FLINK-16497:

[~sunjincheng121] Thanks for addressing this issue, I like the idea of improving the out-of-box user experience. However, I'm a little hesitant to change to 1 row by default; I prefer to change the default flush interval.

The reason is that if we change the flush size to 1 row, throughput will be very low for larger datasets. If we change the default flush interval instead, to something like 1s or 2s, performance will be good for both very small and larger datasets.
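The throughput concern above can be made concrete with a little arithmetic: when only the size trigger fires, the number of flushes (and so, roughly, the number of database round trips) is the row count divided by the flush size, rounded up. The sketch below is illustrative only; `flush_count` is a hypothetical helper, and the one-million-row workload is an assumed figure, not a measurement from the thread.

```python
def flush_count(total_rows: int, max_rows: int) -> int:
    """Number of size-triggered flushes needed to write total_rows (ceiling division)."""
    return (total_rows + max_rows - 1) // max_rows


if __name__ == "__main__":
    total = 1_000_000  # assumed workload size, for illustration only
    for max_rows in (1, 100, 5000):
        print(f"max-rows={max_rows}: {flush_count(total, max_rows)} flushes")
```

Under these assumptions, a flush size of 1 needs 1,000,000 flushes, a flush size of 100 needs 10,000, and a flush size of 5000 needs 200. How much each flush actually costs depends on the database and driver, which this sketch does not model.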
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120549#comment-17120549 ]

Leonard Xu commented on FLINK-16497:

+1 for this. I'd like to open a PR for master and 1.11.0, because it's not a feature, just an adjustment of a default config value, if you agree that we can fix it in 1.11.0, [~jark] [~sunjincheng121].
[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120532#comment-17120532 ]

sunjincheng commented on FLINK-16497:

+1 for this, as I mentioned in FLINK-18041. I prefer 1 row as the default, as it's pretty friendly for user testing.