Re: Query about autinference of numPartitions for `JdbcIO#readWithPartitions`

2024-05-31 Thread XQ Hu via user
). More details are here: https://issues.apache.org/jira/browse/BEAM-12456 On Fri, May 31, 2024 at 7:40 AM Vardhan Thigle via user < user@beam.apache.org> wrote: > Hi Beam Experts,I have a small query about `JdbcIO#readWithPartitions` > > > ContextJdbcIO#readWithPartitions seems

Query about autinference of numPartitions for `JdbcIO#readWithPartitions`

2024-05-31 Thread Vardhan Thigle via user
Hi Beam Experts,I have a small query about `JdbcIO#readWithPartitions` ContextJdbcIO#readWithPartitions seems to always default to 200 partitions (DEFAULT_NUM_PARTITIONS). This is set by default when the object is constructed here <https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/

Re: Query about `JdbcIO`

2024-02-25 Thread Vardhan Thigle via user
a/org/apache/beam/sdk/io/jdbc/JdbcUtilTest.java#L59 > > However, you could use the code from the test to create yours. > > On Thu, Feb 22, 2024 at 11:20 AM Vardhan Thigle via user < > user@beam.apache.org> wrote: > >> Hi, >> I had a small query about

Re: Query about `JdbcIO`

2024-02-24 Thread XQ Hu via user
Vardhan Thigle via user < user@beam.apache.org> wrote: > Hi, > I had a small query about `JdbcIO`. > As per the documentation > <https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.html> > `readWithPartitions` is supported f

Query about `JdbcIO`

2024-02-22 Thread Vardhan Thigle via user
Hi, I had a small query about `JdbcIO`. As per the documentation <https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.html> `readWithPartitions` is supported for Long, DateTime <https://static.javadoc.io/joda-time/joda-time/2.10.10/org/joda/time/Date

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-25 Thread Juan Cuzmar
; > PubsubIO.readMessagesWithAttributes().fromSubscription(String.format("projects/%s/subscriptions/%s", > > > > > > options.getProjectId(), subscription))) > > > > > > .apply("Transform", ParDo.of(new MyTransformer())); &g

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-22 Thread Juan Cuzmar
bscription(String.format("projects/%s/subscriptions/%s", >>>>> > options.getProjectId(), subscription))) >>>>> > .apply("Transform", ParDo.of(new MyTransformer())); >>>>> > >>>>> > PCollection insert = resul

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-22 Thread Reuven Lax via user
;> > > JdbcIO.write() >> > > .withDataSourceProviderFn(DataSourceProvider.of(authDatabaseConfig)) >> > > .withStatement("INSERT INTO person (first_name, last_name) VALUES (?, >> 'doe')") >> > > .withPreparedStatementSetter((el

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-22 Thread Juan Cuzmar
irst_name, last_name) VALUES (?, >>> > 'doe')") >>> > .withPreparedStatementSetter((element, preparedStatement) -> { >>> > log.info("Preparing statement to insert"); >>> > preparedStatement.setString(1, element); >>> > }

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-22 Thread Reuven Lax via user
> .withPreparedStatementSetter((element, preparedStatement) -> { > > > log.info("Preparing statement to insert"); > > > preparedStatement.setString(1, element); > > > }) > > > .withResults() > > > ); > > > result.apply(Wait.on(insert))

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-22 Thread Juan Cuzmar
new SomeTransform()) > > .apply("PubsubMessaging", ParDo.of(new NextTransformer())); > > https://github.com/j1cs/app-beam/blob/main/src/main/java/me/jics/AppBeamCommand.java#L63 > > > > --- Original Message --- > > On Saturday, April 22nd, 2023 at 2:08

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-22 Thread Reuven Lax via user
b.com/j1cs/app-beam/blob/main/src/main/java/me/jics/AppBeamCommand.java#L63 > > --- Original Message --- > On Saturday, April 22nd, 2023 at 2:08 AM, Reuven Lax via user < > user@beam.apache.org> wrote: > > > > I believe you have to call withResults() on the Jdbc

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-22 Thread Juan Cuzmar
Messaging", ParDo.of(new NextTransformer())); https://github.com/j1cs/app-beam/blob/main/src/main/java/me/jics/AppBeamCommand.java#L63 --- Original Message --- On Saturday, April 22nd, 2023 at 2:08 AM, Reuven Lax via user wrote: > I believe you have to call withResults() on the Jdb

Re: Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-22 Thread Reuven Lax via user
I believe you have to call withResults() on the JdbcIO transform in order for this to work. On Fri, Apr 21, 2023 at 10:35 PM Juan Cuzmar wrote: > I hope you all are doing well. I am facing an issue with an Apache Beam > pipeline that gets stuck indefinitely when using the Wait.on tra

Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

2023-04-21 Thread Juan Cuzmar
I hope you all are doing well. I am facing an issue with an Apache Beam pipeline that gets stuck indefinitely when using the Wait.on transform alongside JdbcIO. Here's a simplified version of my code, focusing on the relevant parts: PCollection result = p. apply("P

Re: Recommended way to set coder for JdbcIO with Apache Beam

2023-03-27 Thread Juan Cuzmar
lexey > > > On 23 Mar 2023, at 19:08, Juan Cuzmar jcuz...@protonmail.com wrote: > > > > unfortunately didn't work. > > I added the @DefaultCoder as you told me and i removed the .withCoder from > > my JdbcIO pipe and showed the same error: >

Re: Recommended way to set coder for JdbcIO with Apache Beam

2023-03-27 Thread Alexey Romanenko
> On 23 Mar 2023, at 19:08, Juan Cuzmar wrote: > > unfortunately didn't work. > I added the @DefaultCoder as you told me and i removed the .withCoder from my > JdbcIO pipe and showed the same error: > > java.lang.IllegalStateException: Unable to infer a coder for JdbcIO.r

Re: Recommended way to set coder for JdbcIO with Apache Beam

2023-03-23 Thread Juan Cuzmar
unfortunately didn't work. I added the @DefaultCoder as you told me and i removed the .withCoder from my JdbcIO pipe and showed the same error: java.lang.IllegalStateException: Unable to infer a coder for JdbcIO.readAll() transform. Provide a coder via withCoder, or ensure that one can

Re: Recommended way to set coder for JdbcIO with Apache Beam

2023-03-23 Thread Alexey Romanenko
you share a class declaration of your InboundData class? Is it just a >> POJO? >> >> — >> Alexey >> >>> On 23 Mar 2023, at 08:16, Juan Cuzmar jcuz...@protonmail.com wrote: >>> >>> Hello all, >>> >>> I hope this message find

Re: Recommended way to set coder for JdbcIO with Apache Beam

2023-03-23 Thread Juan Cuzmar
; POJO? > > — > Alexey > > > On 23 Mar 2023, at 08:16, Juan Cuzmar jcuz...@protonmail.com wrote: > > > > Hello all, > > > > I hope this message finds you well. I am currently working with Apache > > Beam's JdbcIO and need some guidance regarding set

Re: Recommended way to set coder for JdbcIO with Apache Beam

2023-03-23 Thread Alexey Romanenko
Could you share a class declaration of your InboundData class? Is it just a POJO? — Alexey > On 23 Mar 2023, at 08:16, Juan Cuzmar wrote: > > Hello all, > > I hope this message finds you well. I am currently working with Apache Beam's > JdbcIO and need some guidance

Recommended way to set coder for JdbcIO with Apache Beam

2023-03-23 Thread Juan Cuzmar
Hello all, I hope this message finds you well. I am currently working with Apache Beam's JdbcIO and need some guidance regarding setting a coder for the input data without resorting to the deprecated withCoder method. I've been trying to resolve this issue and would appreciate any insights

Detecting stragglers while using jdbcio

2023-01-03 Thread Varun Rauthan
Hi, We have been using JDBCIO for data ingestion from oracle; it works just fine most of the days, but some days(usually the days when the DB is busy because of high user activity) couple of our jobs which usually take ~15-20 mins get in stuck state within dataflow with stages showing indication

Re: [Question] Handling failed records when using JdbcIO

2022-09-19 Thread Alexey Romanenko
Hi, I don’t think it’s possible “out-of-the-box” now but it could be a useful add-on for JdbcIO connector (dead-letter pcollection) since, iirc, it was already asked several times by users. For the moment, it’s only possible to play with RetryStrategy/RetryConfiguration in case of failures

[Question] Handling failed records when using JdbcIO

2022-09-18 Thread Yomal de Silva
Hi all, I am trying to figure out the right approach to handling failed records when persisting to a database through the JdbcIO sink. We have created a DoFn to do this task through a PreparedStatement and catch any exceptions then send it through side output for further processing if required

Re: JdbcIO

2022-04-22 Thread Alexey Romanenko
ric Berryman wrote: > > Does an unbounded JdbcIO exist, or would I need to wrap the existing one in a > spilttable DoFn? Or maybe there is an easier way to do it? > > Thank you again, > Eric > > > > On Wed, Apr 20, 2022, 21:59 Ahmet Altay <mailto:al...@goog

Re: JdbcIO

2022-04-22 Thread Austin Bennett
year's beam summit: https://www.youtube.com/watch?v=hu5FacAeQ-8 https://www.youtube.com/watch?v=U_RshngpxLc On Fri, Apr 22, 2022 at 4:41 AM Eric Berryman wrote: > Does an unbounded JdbcIO exist, or would I need to wrap the existing one > in a spilttable DoFn? Or maybe there is an easier way

Re: JdbcIO

2022-04-22 Thread Eric Berryman
Does an unbounded JdbcIO exist, or would I need to wrap the existing one in a spilttable DoFn? Or maybe there is an easier way to do it? Thank you again, Eric On Wed, Apr 20, 2022, 21:59 Ahmet Altay wrote: > /cc @Pablo Estrada @John Casey > > > On Wed, Apr 20, 2022 at 6:29 PM E

JdbcIO

2022-04-20 Thread Eric Berryman
, and or code, which displays reading from the JdbcIO API with checkpoints. I would like to avoid the initial load on restarts, upgrades, etc. :) Thank you for your time! Eric

Re: JdbcIO parallel read on spark

2021-05-27 Thread Thomas Fredriksen(External)
Quick update. After some testing, we have noticed that the splittable JdbcIO-poc works well when the number of splits does not exceed the number of spark tasks. In cases where the number of splits do exceed the task count, the pipeline freezes after each worker has processed a single split each

Re: JdbcIO parallel read on spark

2021-05-25 Thread Thomas Fredriksen(External)
plittable DoFn's, we decided to give it a try. We essentially copied the source of JdbcIO into our project and changed `ReadFn` into `SplittableJdbcIO` which acts as (more or less) a drop-in replacement (see source below). The DAG here is seemingly simpler with a single stage containing all steps

Re: JdbcIO parallel read on spark

2021-05-25 Thread Alexey Romanenko
Hi, Did you check a Spark DAG if it doesn’t fork branches after "Genereate queries” transform? — Alexey > On 24 May 2021, at 20:32, Thomas Fredriksen(External) > wrote: > > Hi there, > > We are struggling to get the JdbcIO-connector to read a large table on spark.

JdbcIO parallel read on spark

2021-05-24 Thread Thomas Fredriksen(External)
Hi there, We are struggling to get the JdbcIO-connector to read a large table on spark. In short - we wish to read a large table (several billion rows), transform then write the transformed data to a new table. We are aware that `JdbcIO.read()` does not parallelize. In order to solve this, we

Re: JdbcIO SQL best practice

2021-04-15 Thread Alexey Romanenko
wrote: > > This seems very promising, > > Will the write from PCollectino handle upserts? > > On Wed, Mar 24, 2021 at 6:56 PM Alexey Romanenko <mailto:aromanenko@gmail.com>> wrote: > Thanks for details. > > If I’m not mistaken, JdbcIO already supports b

Re: JdbcIO SQL best practice

2021-04-14 Thread Thomas Fredriksen(External)
This seems very promising, Will the write from PCollectino handle upserts? On Wed, Mar 24, 2021 at 6:56 PM Alexey Romanenko wrote: > Thanks for details. > > If I’m not mistaken, JdbcIO already supports both your suggestions for > read and write (at lest, in some way) [1][2]. >

Re: JdbcIO SQL best practice

2021-03-24 Thread Alexey Romanenko
Thanks for details. If I’m not mistaken, JdbcIO already supports both your suggestions for read and write (at lest, in some way) [1][2]. Some examples from tests: - write from PCollection [3], - read to PCollection [4], - write from PCollection with JavaBeanSchema [5] Is it something

Re: JdbcIO SQL best practice

2021-03-23 Thread Brian Hulette
consider either: - Moving this logic into JdbcIO and re-using it in JdbcSchemaIOProvider, or - Adding a user-friendly interface to SchemaIOProvider implementations in the Java SDK Brian [1] https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc

Re: JdbcIO SQL best practice

2021-03-23 Thread Thomas Fredriksen(External)
u a question - do you have an example of what like this could be these > practises or how it can be simplified? > > > PS: Not sure that it can help but JdbcIO allows to set a query with > “ValueProvider” option which can be helpful to parametrise your transform > with values that

Re: JdbcIO SQL best practice

2021-03-19 Thread Alexey Romanenko
Hmm, interesting question. Since we don’t have any answers yet may I ask you a question - do you have an example of what like this could be these practises or how it can be simplified? PS: Not sure that it can help but JdbcIO allows to set a query with “ValueProvider” option which can

JdbcIO SQL best practice

2021-03-17 Thread Thomas Fredriksen(External)
Hello everyone, I was wondering what is considered best-practice when writing SQL statements for the JdbcIO connector? Hand-writing the statements and subsequent preparedStatementSetter causes a lot of bloat and is not very manageable. Thank you/ Best Regards Thomas Li Fredriksen

Re: JDBCIO reader + BigQuery writer extremly slow due to bundle size = 1

2020-01-07 Thread Konstantinos P.
> Searching on the Internet I found a very similar question on SO but >>> without any answer: >>> https://stackoverflow.com/questions/49276833/slow-bigquery-load-job-when-data-comes-from-apache-beam-jdbcio >>> The other user has made an experiment by replacing JdbcIO with

Re: JDBCIO reader + BigQuery writer extremly slow due to bundle size = 1

2020-01-02 Thread Eugene Kirpichov
? > > >> >> Searching on the Internet I found a very similar question on SO but >> without any answer: >> https://stackoverflow.com/questions/49276833/slow-bigquery-load-job-when-data-comes-from-apache-beam-jdbcio >> The other user has made an experiment by re

Re: JDBCIO reader + BigQuery writer extremly slow due to bundle size = 1

2020-01-02 Thread Chamikara Jayalath
overflow.com/questions/49276833/slow-bigquery-load-job-when-data-comes-from-apache-beam-jdbcio > The other user has made an experiment by replacing JdbcIO with TextIO and > (s)he couldn't replicate the behavior so it seems that the problem here is > the combination of JdbcIO + BigQueryIO

JDBCIO reader + BigQuery writer extremly slow due to bundle size = 1

2020-01-02 Thread Konstantinos P.
the pipeline unscalable. Searching on the Internet I found a very similar question on SO but without any answer: https://stackoverflow.com/questions/49276833/slow-bigquery-load-job-when-data-comes-from-apache-beam-jdbcio The other user has made an experiment by replacing JdbcIO with TextIO and (s)he

Re: JDBCIO Connection Pooling

2019-04-01 Thread Jean-Baptiste Onofré
with my > aggressive pool connections. After some debugging I noticed that the > datasource was still being wrapped in a pooling datasource..even through it > already is a pooled datasource. I was wondering what strangeness this caused, > so locally I hacked JdbcIO to ju

JDBCIO Connection Pooling

2019-03-28 Thread hgu2hw+2g0aed6fdoszs
connections. After some debugging I noticed that the datasource was still being wrapped in a pooling datasource..even through it already is a pooled datasource. I was wondering what strangeness this caused, so locally I hacked JdbcIO to just return my c3p0 datasource and do nothing else

Re: How to call Oracle stored proc in JdbcIO

2019-02-05 Thread Rui Wang
I see. I wanted to get some feedback from community to see if calling proc before running sql makes sense in JdbcIO. If that doesn't make sense then I can close this JIRA. -Rui On Tue, Feb 5, 2019 at 3:00 PM Juan Carlos Garcia wrote: > I believe this is not a missing feature, as the quest

Re: How to call Oracle stored proc in JdbcIO

2019-02-05 Thread Juan Carlos Garcia
with something like this: {call procedure_name(?, ?, ?)} But then question is what do you expect from it? BTW JdbcIO is just a very simple ParDo which you can create your own when dealing with anything special from oracle. Best regards JC Am Di., 5. Feb. 2019, 23:03 hat Rui Wang geschrieben

Re: How to call Oracle stored proc in JdbcIO

2019-02-05 Thread Rui Wang
Assuming this is a missing feature. Created https://jira.apache.org/jira/browse/BEAM-6525 to track it. -Rui On Fri, Jan 25, 2019 at 10:35 AM Rui Wang wrote: > Hi Community, > > There is a stackoverflow question [1] asking how to call Oracle stored > proc in Beam via JdbcIO. I kn

How to call Oracle stored proc in JdbcIO

2019-01-25 Thread Rui Wang
Hi Community, There is a stackoverflow question [1] asking how to call Oracle stored proc in Beam via JdbcIO. I know very less on JdbcIO and Oracle, so just help ask here to say if anyone know: does JdbcIO support call stored proc? If there is no such support, I will create a JIRA

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Eugene Kirpichov
t. So I will not generate uuid for same message twice. > In case of jdbcio it is possible to get uuid for same mesage twice( in case > of multiple io it might be a problem). > Deduplication of writes based on record id is a BigQuery feature, I don't think there's a way to accomplish it in

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Jean-Baptiste Onofré
Hi Aleksandr, I don't get your point about the connection: on your worker, who gonna create the threads ? Basically, we will have a connection per thread right now. For the checkpoint, JdbcIO is basically a DoFn, it doesn't deal with a CheckpointMark (if it's what you are talking about

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
with BQ service the exception will be thrown, but my job will be restored from latest checkpoint. So I will not generate uuid for same message twice. In case of jdbcio it is possible to get uuid for same mesage twice( in case of multiple io it might be a problem). Aleksandr. 14. märts 2018 10:37

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Eugene Kirpichov
.com> wrote: > So lets say I have 10 prepared statements, and hundreds threads, for > example 300. Dataflow will create 3000 connections to sql and in case of > autoscaling another node will create again 3000 connections? > > Another problem here, that jdbcio dont use any checkp

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
So lets say I have 10 prepared statements, and hundreds threads, for example 300. Dataflow will create 3000 connections to sql and in case of autoscaling another node will create again 3000 connections? Another problem here, that jdbcio dont use any checkpoints (and bq for example is doing

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Eugene Kirpichov
Aleksandr <aleksandr...@gmail.com> wrote: > Hello, > How many times will the setup per node be called? Is it possible to limit > pardo intances in google dataflow? > > Aleksandr. > > > > 14. märts 2018 9:22 PM kirjutas kuupäeval "Eugene Kirpichov" < > kirpi

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
Hello, How many times will the setup per node be called? Is it possible to limit pardo intances in google dataflow? Aleksandr. 14. märts 2018 9:22 PM kirjutas kuupäeval "Eugene Kirpichov" < kirpic...@google.com>: "Jdbcio will create for each prepared sta

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Eugene Kirpichov
rmannibucau.metawerx.net/> | Old Blog > <http://rmannibucau.wordpress.com> | Github > <https://github.com/rmannibucau> | LinkedIn > <https://www.linkedin.com/in/rmannibucau> | Book > <https://www.packtpub.com/application-development/java-ee-8-high-performance>

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Eugene Kirpichov
"Jdbcio will create for each prepared statement new connection" - this is not the case: the connection is created in @Setup and deleted in @Teardown. https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L503 https://github.

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
Hello, we had similar problem. Current jdbcio will cause alot of connection errors. Typically you have more than one prepared statement. Jdbcio will create for each prepared statement new connection(and close only in teardown) So it is possible that connection will get timeot or in case in case

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Jean-Baptiste Onofré
Agree especially using the current JdbcIO impl that creates connection in the @Setup. Or it means that @Teardown is never called ? Regards JB Le 14 mars 2018 à 11:40, à 11:40, Eugene Kirpichov <kirpic...@google.com> a écrit: >Hi Derek - could you explain where does the "30

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Eugene Kirpichov
Hi Derek - could you explain where does the "3000 connections" number come from, i.e. how did you measure it? It's weird that 5-6 workers would use 3000 connections. On Wed, Mar 14, 2018 at 3:50 AM Derek Chan wrote: > Hi, > > We are new to Beam and need some help. > > We are

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Jean-Baptiste Onofré
Hi Derek, I think you could be interested by: https://github.com/apache/beam/pull/4461 related to BEAM-3500. I introduced an internal poolable datasource. I hope it could help. Regards JB On 14/03/2018 11:49, Derek Chan wrote: Hi, We are new to Beam and need some help. We are working on

Re: Reducing database connection with JdbcIO

2018-03-14 Thread Aleksandr
Hello, We did own jdbcio with thread pool per jwm (using lazy initialization in @Setup). In processElement we are getting/freeing connection. Best Regards, Aleksandr Gortujev. 14. märts 2018 12:49 PM kirjutas kuupäeval "Derek Chan" <derek...@gmail.com >: Hi, We are new to Be

Reducing database connection with JdbcIO

2018-03-14 Thread Derek Chan
Hi, We are new to Beam and need some help. We are working on a flow to ingest events and writes the aggregated counts to a database. The input rate is rather low (~2000 message per sec), but the processing is relatively heavy, that we need to scale out to 5~6 nodes. The output (via JDBC) is