[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-05 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126695#comment-17126695
 ] 

Jark Wu commented on FLINK-16497:
-

Btw, I will only apply the changes to the new JDBC connector, so it doesn't 
impact the old JDBC behavior. 

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: Jark Wu
>Priority: Critical
> Fix For: 1.11.0
>
>
> Currently, JDBC sink provides 2 flush options:
> {code}
> 'connector.write.flush.max-rows' = '5000', -- default is 5000
> 'connector.write.flush.interval' = '2s', -- no default value
> {code}
> That means that if the flush interval is not set, buffered output rows may
> not be flushed to the database for a long time. That is surprising behavior,
> because no results are output by default.
> So I propose a default flush interval of '1s' for the JDBC sink, or a default
> flush size of 1 row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-05 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126694#comment-17126694
 ] 

Jark Wu commented on FLINK-16497:
-

Thank you all for the discussion. I will create a pull request for this soon.



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-05 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126642#comment-17126642
 ] 

sunjincheng commented on FLINK-16497:
-

Thank you for the active discussion! My original thinking was about what the
most natural way to deal with stream computing is. For now, reducing the flush
size to 100 and setting the flush interval to 1s sounds good to me as an
overall compromise.



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-05 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126471#comment-17126471
 ] 

Jark Wu commented on FLINK-16497:
-

It seems that more people prefer a trade-off between the out-of-box
experience and the production experience.
I'm also fine with Danny's proposal. This default value will also work well in 
production. What do you think [~sunjincheng121]? 



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-05 Thread lincoln lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126446#comment-17126446
 ] 

lincoln lee commented on FLINK-16497:
-

+1 to the default flush size (100) and flush interval (1s).

I think this default configuration can be a good balance between low latency
and high throughput in most cases.



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-04 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126395#comment-17126395
 ] 

Benchao Li commented on FLINK-16497:


+1 to Danny's proposal.



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-04 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126378#comment-17126378
 ] 

Jingsong Lee commented on FLINK-16497:
--

+1 to the default flush size (100) and flush interval (1s).

I think we should find a trade-off between the initial experience and production.

I think 1s is good enough for the out-of-box user experience, and these default
values can also work in production.

IIUC, Elasticsearch also has good default flush values for both the out-of-box
user experience and production.



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-04 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126373#comment-17126373
 ] 

Danny Chen commented on FLINK-16497:


I think that, for a popular streaming engine, ensuring good throughput and
performance should be a first-class concern. Most client tools have a default
flush strategy (either buffer size or interval) [1][2]. We should follow that
as well.

I would suggest a default flush size of 100 and a flush interval of 1s. It
performs well in production, and 1s is also an acceptable latency for local
tests.

[1] https://kafka.apache.org/22/documentation.html#producerconfigs
[2] https://github.com/searchbox-io/Jest
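
As a rough, back-of-the-envelope illustration of how a size trigger and an
interval trigger combine (a Python sketch, not Flink code; the function name
and the steady-input-rate assumption are hypothetical): a buffered row waits
roughly until whichever trigger fires first.

```python
def max_buffer_wait_s(rows_per_s: float, max_rows: int, interval_s: float) -> float:
    """Approximate upper bound on how long a row sits in the buffer.

    Assumes a steady input rate (rows_per_s > 0) and ignores the time the
    database write itself takes.
    """
    time_to_fill = max_rows / rows_per_s   # when the size trigger would fire
    return min(time_to_fill, interval_s)   # whichever trigger fires first


# Slow input (10 rows/s): the batch of 100 would take 10s to fill,
# so the 1s interval bounds the wait.
slow = max_buffer_wait_s(rows_per_s=10, max_rows=100, interval_s=1.0)

# Fast input (1000 rows/s): the batch fills in about 0.1s,
# so the size trigger bounds the wait.
fast = max_buffer_wait_s(rows_per_s=1000, max_rows=100, interval_s=1.0)
```

Under these assumptions the interval only matters at low input rates, which is
exactly the local-testing case discussed in this issue.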



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-02 Thread Leonard Xu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123622#comment-17123622
 ] 

Leonard Xu commented on FLINK-16497:


Thanks [~sunjincheng121] and [~libenchao] for the detailed comments.

I want to add that the two parameters are just an out-of-box configuration,
and users may need to tune them for their specific scenarios.

I'd like to suggest setting *interval to less than 1s*, because we have always
claimed that the Flink framework can offer sub-second latency. Most use cases
send smaller amounts of data to the DB and use other sinks for large-scale
data. So we can set *max-rows to less than 10/50* as basic tuning, which works
together with *interval* to reduce the pressure on the DB efficiently.

WDYT?



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-01 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120969#comment-17120969
 ] 

Benchao Li commented on FLINK-16497:


Hi [~sunjincheng121], thanks for your detailed explanation, it's really
insightful.
Let me try to express my opinion in a little more depth. From my perspective,
we always have these two settings whenever we deal with 'flush': 'max-rows'
controls the upper bound of each batch, while 'flush-timeout' controls the
lower bound of each batch. From this point of view, I think the case we are
facing now belongs to the latter.
About the 1s latency in the testing case, I think it may be OK, because the
JDBC sink is not the only contributor to user-facing latency: the message
queue (like Kafka), the network, and the Flink runtime all add some latency.
At least from my perspective, it's OK to have one more second in the sink
before the user sees the data in the DB.
Another point I want to raise is that if we set the default 'max-rows' to 1,
it may force most production scenarios to override this config, which may not
be that friendly to existing production use cases.
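
The interplay of the two settings can be sketched with a toy buffer (Python,
for illustration only; this is not the Flink JDBC sink implementation, and the
class, method, and parameter names are hypothetical). Time is passed in
explicitly instead of using a real timer so that the behavior is easy to
follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BufferedSink:
    """Toy buffer with a size trigger and an optional time trigger."""
    write_batch: Callable[[List[dict]], None]  # stand-in for the batched DB write
    max_rows: int = 100                        # size trigger
    interval_s: Optional[float] = 1.0          # time trigger; None disables it
    _buffer: List[dict] = field(default_factory=list)
    _last_flush: float = 0.0

    def add(self, row: dict, now: float) -> None:
        """Buffer a row and flush once the buffer reaches max_rows."""
        self._buffer.append(row)
        if len(self._buffer) >= self.max_rows:
            self.flush(now)

    def on_timer(self, now: float) -> None:
        """Called periodically; flushes if the interval has elapsed."""
        if (
            self.interval_s is not None
            and self._buffer
            and now - self._last_flush >= self.interval_s
        ):
            self.flush(now)

    def flush(self, now: float) -> None:
        """Write out all buffered rows, if any, and reset the timer."""
        if self._buffer:
            self.write_batch(list(self._buffer))
            self._buffer.clear()
        self._last_flush = now


# Without an interval, a few rows stay buffered no matter how long we wait.
written_no_interval: List[List[dict]] = []
sink = BufferedSink(written_no_interval.append, max_rows=5000, interval_s=None)
for i in range(3):
    sink.add({"id": i}, now=0.0)
sink.on_timer(now=60.0)  # nothing is written

# With a 1s interval, the same rows are written once the interval elapses.
written_with_interval: List[List[dict]] = []
sink = BufferedSink(written_with_interval.append, max_rows=100, interval_s=1.0)
for i in range(3):
    sink.add({"id": i}, now=0.0)
sink.on_timer(now=1.0)  # one batch of 3 rows is written
```

In this sketch the size trigger caps how large a batch can grow, and the time
trigger caps how long a partially filled batch can wait.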

> Improve default flush strategy for JDBC sink to make it work out-of-box
> ---
>
> Key: FLINK-16497
> URL: https://issues.apache.org/jira/browse/FLINK-16497
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC, Table SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Major
> Fix For: 1.11.0
>


[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-06-01 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120927#comment-17120927
 ] 

sunjincheng commented on FLINK-16497:
-

Hi [~libenchao], thanks for your reply!

I think it is difficult for us to set an optimal default value from the
perspective of performance, since that depends on the specific business and
storage. So at the design level, for stream computing scenarios, real-time
insertion (1 row) is the better semantic expression, and I am thinking about
this in terms of semantics and real-time behavior. Even though 1s is very
short, beginners will still feel that the calculation is real-time but the
write to storage is not: it is a mini-batch (1s may contain multiple records).
So, for now, I still prefer 1 row as the default value. :)

What do you think?



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-05-31 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120708#comment-17120708
 ] 

Jark Wu commented on FLINK-16497:
-

[~lzljs3620320], what do you think about this?



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-05-31 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120696#comment-17120696
 ] 

Benchao Li commented on FLINK-16497:


[~sunjincheng121] Thanks for addressing this issue. I like the idea of
improving the out-of-box user experience. However, I'm a little hesitant to
change to 1 row by default. I would prefer to change the default flush
interval.

The reason is that if we change the flush size to 1 row, throughput will be
very low for larger datasets. If we change the default flush interval instead,
to something like 1s or 2s, performance will be good for both very small and
larger datasets.

 



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-05-31 Thread Leonard Xu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120549#comment-17120549
 ] 

Leonard Xu commented on FLINK-16497:


+1 for this.

I'd also like to open a PR for master and 1.11.0, because it's not a feature,
just an adjustment of a default config value, if you agree that we can fix it
in 1.11.0 [~jark] [~sunjincheng121].



[jira] [Commented] (FLINK-16497) Improve default flush strategy for JDBC sink to make it work out-of-box

2020-05-31 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120532#comment-17120532
 ] 

sunjincheng commented on FLINK-16497:
-

+1 for this, as I mentioned in FLINK-18041. I prefer 1 row as the default, as
it's pretty friendly for user testing.
