Jark,

Thank you! I had actually mistyped the JIRA issue; autoCommit needs to be set 
to false for streaming to work. The default on the driver is true when the 
option isn’t specified. I’ve updated the issue accordingly.

Setting this to false automatically on the read path would fix my issue. 
However, I’m only certain that this is proper for Postgres. I’m not sure if 
this should be done for other drivers, although my gut would say it should be 
fine if it’s only done for reading. My patch as it is will set the builder to 
not specify whether to set autoCommit if the option is not specified, which 
means it would then be left at the default of true. That would conflict with 
the 1.11 patch you suggested. Let me know if you think I should make the 
default true in the SQL API.

https://github.com/apache/flink/pull/13570

Regards,
Dylan

From: Jark Wu <imj...@gmail.com>
Date: Thursday, October 8, 2020 at 10:15 PM
To: Dylan Forciea <dy...@oseberg.io>
Cc: Till Rohrmann <trohrm...@apache.org>, dev <d...@flink.apache.org>, Shengkai 
Fang <fskm...@gmail.com>, "user@flink.apache.org" <user@flink.apache.org>, 
Leonard Xu <xbjt...@gmail.com>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Hi Dylan,

Sorry for the late reply. We've just come back from a long holiday.

Thanks for reporting this problem. First, I think this is a bug that 
`autoCommit` is false by default (JdbcRowDataInputFormat.Builder).
We can fix the default to true in 1.11 series, and I think this can solve your 
problem in a short term?
Besides, we should expose the connector options to set auto commit and this can 
be another issue to be implemented in master.
I'm glad to review the code.

What do you think?

Regarding to the failed JMXReporterFactoryTest, I think this is a known issue, 
see FLINK-19539 [1]

Best,
Jark

[1]: https://issues.apache.org/jira/browse/FLINK-19539

On Fri, 9 Oct 2020 at 01:29, Dylan Forciea 
<dy...@oseberg.io<mailto:dy...@oseberg.io>> wrote:
I’ve updated the unit tests and documentation, and I was running the azure test 
pipeline as described in the instructions. However, it appears that what seems 
to be an unrelated test for the JMX code failed. Is this a matter of me not 
setting things up correctly? I wanted to ensure everything looked good before I 
submitted the PR.

[ERROR] Failures:
[ERROR]   JMXReporterFactoryTest.testPortRangeArgument:46
Expected: (a value equal to or greater than <9000> and a value less than or 
equal to <9010>)
     but: a value less than or equal to <9010> <9040> was greater than <9010>
[ERROR]   JMXReporterFactoryTest.testWithoutArgument:60
[INFO]
[ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0

Thanks,
Dylan Forciea

From: Till Rohrmann <trohrm...@apache.org<mailto:trohrm...@apache.org>>
Date: Thursday, October 8, 2020 at 2:29 AM
To: Dylan Forciea <dy...@oseberg.io<mailto:dy...@oseberg.io>>
Cc: dev <d...@flink.apache.org<mailto:d...@flink.apache.org>>, Shengkai Fang 
<fskm...@gmail.com<mailto:fskm...@gmail.com>>, 
"user@flink.apache.org<mailto:user@flink.apache.org>" 
<user@flink.apache.org<mailto:user@flink.apache.org>>, 
"j...@apache.org<mailto:j...@apache.org>" 
<j...@apache.org<mailto:j...@apache.org>>, Leonard Xu 
<xbjt...@gmail.com<mailto:xbjt...@gmail.com>>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

This sounds good. Maybe there are others in the community who can help with the 
review before the Jark and Leonard are back.

Cheers,
Till

On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea 
<dy...@oseberg.io<mailto:dy...@oseberg.io>> wrote:
Actually…. It looks like what I did covers both cases. I’ll see about getting 
some unit tests and documentation updated.

Dylan

From: Dylan Forciea <dy...@oseberg.io<mailto:dy...@oseberg.io>>
Date: Wednesday, October 7, 2020 at 11:47 AM
To: Till Rohrmann <trohrm...@apache.org<mailto:trohrm...@apache.org>>, dev 
<d...@flink.apache.org<mailto:d...@flink.apache.org>>
Cc: Shengkai Fang <fskm...@gmail.com<mailto:fskm...@gmail.com>>, 
"user@flink.apache.org<mailto:user@flink.apache.org>" 
<user@flink.apache.org<mailto:user@flink.apache.org>>, 
"j...@apache.org<mailto:j...@apache.org>" 
<j...@apache.org<mailto:j...@apache.org>>, Leonard Xu 
<xbjt...@gmail.com<mailto:xbjt...@gmail.com>>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Ok, I have created FLINK-19522 describing the issue. I have the code I made so 
far checked in at 
https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this 
only fixes the SQL API. It sounds like there may be another change needed for 
the Table API… I’ll look into that and see if I can figure it out on my own 
while they’re out. I will also need to add some unit tests and update some 
documentation to get this ready for a PR.

Thanks,
Dylan

From: Till Rohrmann <trohrm...@apache.org<mailto:trohrm...@apache.org>>
Date: Wednesday, October 7, 2020 at 10:55 AM
To: dev <d...@flink.apache.org<mailto:d...@flink.apache.org>>
Cc: Shengkai Fang <fskm...@gmail.com<mailto:fskm...@gmail.com>>, 
"user@flink.apache.org<mailto:user@flink.apache.org>" 
<user@flink.apache.org<mailto:user@flink.apache.org>>, 
"j...@apache.org<mailto:j...@apache.org>" 
<j...@apache.org<mailto:j...@apache.org>>, Leonard Xu 
<xbjt...@gmail.com<mailto:xbjt...@gmail.com>>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Hi Dylan,

thanks for reaching out to the Flink community and excuse our late response. I 
am not an expert for the Table API and its JDBC connector but what you describe 
sounds like a missing feature. Also given that FLINK-12198 enabled this feature 
for the JDBCInputFormat indicates that we might simply need to make it 
configurable from the JdbcTableSource. I am pulling in Jark and Leonard who 
worked on the JdbcTableSource and might help you to get this feature into 
Flink. Their response could take a week because they are currently on vacation 
if I am not mistaken.

What you could already do is to open an issue linking FLINK-12198 and 
describing the problem and your solution proposal.

[1] https://issues.apache.org/jira/browse/FLINK-12198

Cheers,
Till

On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea 
<dy...@oseberg.io<mailto:dy...@oseberg.io>> wrote:
I appreciate it! Let me know if you want me to submit a PR against the issue 
after it is created. It wasn’t a huge amount of code, so it’s probably not a 
big deal if you wanted to redo it.

Thanks,
Dylan

From: Shengkai Fang <fskm...@gmail.com<mailto:fskm...@gmail.com>>
Date: Wednesday, October 7, 2020 at 9:06 AM
To: Dylan Forciea <dy...@oseberg.io<mailto:dy...@oseberg.io>>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Sorry for late response. +1 to support it. I will open a jira about it later.

Dylan Forciea 
<dy...@oseberg.io<mailto:dy...@oseberg.io><mailto:dy...@oseberg.io<mailto:dy...@oseberg.io>>>于2020年10月7日
 周三下午9:53写道:













I hadn’t heard a response on this, so I’m going to expand this to the dev email 
list.



If this is indeed an issue and not my misunderstanding, I have most of a patch 
already coded up. Please let me know, and I can create a JIRA issue and send 
out a PR.



Regards,

Dylan Forciea

Oseberg




From: Dylan Forciea 
<dy...@oseberg.io<mailto:dy...@oseberg.io><mailto:dy...@oseberg.io<mailto:dy...@oseberg.io>>>


Date: Thursday, October 1, 2020 at 5:14 PM


To: 
"user@flink.apache.org<mailto:user@flink.apache.org><mailto:user@flink.apache.org<mailto:user@flink.apache.org>>"
 
<user@flink.apache.org<mailto:user@flink.apache.org><mailto:user@flink.apache.org<mailto:user@flink.apache.org>>>


Subject: autoCommit for postgres jdbc streaming in Table/SQL API






Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran 
across an issue with streaming postgres data via the Table/SQL API.



I see that the API has the scan.fetch-size option, but not scan.auto-commit per



https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html
 . I had attempted to load a large table in, but it completely slurped it into 
memory before starting the streaming. I modified the flink source code to add a 
scan.auto-commit

option, and I was then able to immediately start streaming and cut my memory 
usage way down.



I see in this thread that there was a similar issue resolved for 
JDBCInputFormat in this thread:



http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html
 , but I don’t see a way to utilize that in the Table/SQL API.



Am I missing something on how to pull this off?



Regards,

Dylan Forciea

Oseberg



Reply via email to