[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jark Wu updated FLINK-16495:
----------------------------
    Description: 
Currently, the Elasticsearch sink provides 3 flush options:

{code:java}
'connector.bulk-flush.max-actions' = '42'
'connector.bulk-flush.max-size' = '42 mb'
'connector.bulk-flush.interval' = '6'
{code}

All of them are optional and have no default value on the Flink side [1], but the flush-actions and flush-size options default to {{1000}} and {{5mb}} in the Elasticsearch client [2]. This leads to surprising behavior where no results are output by default (see the user report [3]), because the sink has to wait for 1000 records while the test job produces far fewer. It can also be a problem in production: in a low-throughput job, some data may take a very long time to become visible in Elasticsearch.

So we propose defaults of a '1s' flush interval, '1000' rows, and a '2mb' buffer size for the ES sink. This only applies to the new ES sink options:

{code}
'sink.bulk-flush.max-actions' = '1000'
'sink.bulk-flush.max-size' = '2mb'
'sink.bulk-flush.interval' = '1s'
{code}

[1]: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L357-L356
[2]: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk-processor.html
[3]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Should-I-use-a-Sink-or-Connector-Or-Both-td33352.html

was: (the previous revision of this description, identical except that it proposed '5mb' — the ES client default — as the default {{sink.bulk-flush.max-size}}, with inline comments noting the source of each default.)

> Improve default flush strategy for Elasticsearch sink to make it work
> out-of-box
>
>                 Key: FLINK-16495
>                 URL: https://issues.apache.org/jira/browse/FLINK-16495
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / ElasticSearch, Table SQL / Ecosystem
>            Reporter: Jark Wu
>            Assignee: Jark Wu
>            Priority: Critical
>              Labels: usability
>             Fix For: 1.11.0
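The stall the description complains about follows directly from the trigger semantics. Below is a minimal Python sketch (the class and names are hypothetical, not the actual Flink or Elasticsearch connector code) of a bulk buffer with the three triggers — max actions, max size, and interval — showing why a count/size-only policy (the ES client defaults) leaves a small test dataset stuck in the buffer, while an interval trigger drains it even at low throughput:

```python
import time

class BulkBuffer:
    """Toy model of a bulk-flush policy, NOT the real connector.

    Buffered records are flushed when any configured trigger fires:
    a max record count, a max byte size, or a time interval.
    """

    def __init__(self, max_actions=None, max_size_bytes=None, interval_s=None):
        self.max_actions = max_actions
        self.max_size_bytes = max_size_bytes
        self.interval_s = interval_s
        self.buffer = []
        self.buffered_bytes = 0
        self.last_flush = time.monotonic()
        self.flushed = []  # records that have been "written to Elasticsearch"

    def add(self, record):
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        if self._should_flush():
            self.flush()

    def _should_flush(self):
        if self.max_actions is not None and len(self.buffer) >= self.max_actions:
            return True
        if self.max_size_bytes is not None and self.buffered_bytes >= self.max_size_bytes:
            return True
        if self.interval_s is not None and time.monotonic() - self.last_flush >= self.interval_s:
            return True
        return False

    def flush(self):
        self.flushed.extend(self.buffer)
        self.buffer = []
        self.buffered_bytes = 0
        self.last_flush = time.monotonic()

# Count/size triggers only (the ES client defaults): a 10-record test
# dataset never reaches the 1000-record or 5mb threshold, so nothing
# becomes visible in the "index".
count_only = BulkBuffer(max_actions=1000, max_size_bytes=5 * 1024 * 1024)
for i in range(10):
    count_only.add(f"record-{i}")
print(len(count_only.flushed))  # prints 0 -- all records stuck in the buffer

# Adding an interval trigger bounds visibility latency independently of
# throughput (here interval 0 so every add flushes immediately).
timed = BulkBuffer(max_actions=1000, max_size_bytes=5 * 1024 * 1024, interval_s=0.0)
for i in range(10):
    timed.add(f"record-{i}")
print(len(timed.flushed))  # prints 10
```

In the real connector the interval trigger fires on a background timer rather than being checked on each `add`; the point of the sketch is only that a time-based trigger guarantees a flush regardless of record count or size, which is why the proposal defaults {{sink.bulk-flush.interval}} to '1s'.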
[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jark Wu updated FLINK-16495:
----------------------------
    Description: (the motivation above, proposing defaults of a '1s' flush interval, '1000' rows, and a '5mb' buffer size for the new ES sink options:)

{code}
'sink.bulk-flush.max-actions' = '1000' -- default value of the ES client
'sink.bulk-flush.max-size' = '5mb' -- default value of the ES client
'sink.bulk-flush.interval' = '1s' -- same as the JDBC sink
{code}

was: (an earlier revision that mistakenly referred to the HBase client and to "JDBC sink options", proposing:)

{code}
'sink.buffer-flush.max-actions' = 'none'
'sink.buffer-flush.max-size' = '2mb'
'sink.buffer-flush.interval' = '1s'
{code}
[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jark Wu updated FLINK-16495:
----------------------------
    Description: (the revision above that mistakenly referred to the HBase client and to "JDBC sink options", with the {{sink.buffer-flush.*}} option names.)

was: (the motivation above, proposing Flink-side default values for the existing 3 options:)

{code:java}
'connector.bulk-flush.max-actions' = '1000' -- same as the ES client default value
'connector.bulk-flush.max-size' = '5mb' -- same as the ES client default value
'connector.bulk-flush.interval' = '5s' -- avoid producing no output
{code}
[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jark Wu updated FLINK-16495:
----------------------------
    Fix Version/s:     (was: 1.12.0)
                   1.11.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jark Wu updated FLINK-16495:
----------------------------
       Priority: Critical  (was: Major)
[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen updated FLINK-16495:
-------------------------------
    Fix Version/s:     (was: 1.11.0)
                   1.12.0
[jira] [Updated] (FLINK-16495) Improve default flush strategy for Elasticsearch sink to make it work out-of-box
[ https://issues.apache.org/jira/browse/FLINK-16495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jark Wu updated FLINK-16495:
----------------------------
         Labels: usability  (was: )