Thanks for the details. If I'm not mistaken, JdbcIO already supports both of your suggestions for read and write (at least, in some form) [1][2].

Some examples from tests:
- write from PCollection<Row> [3],
- read to PCollection<Row> [4],
- write from PCollection<POJO> with JavaBeanSchema [5]

Is this what you are looking for?
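To illustrate, a rough sketch along the lines of the linked tests (untested; the driver, connection settings, table name and query below are just placeholders):

```
JdbcIO.DataSourceConfiguration config =
    JdbcIO.DataSourceConfiguration.create(
            "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb")
        .withUsername("user")
        .withPassword("password");

// Write: if the input PCollection has a schema (Rows with a schema set via
// setRowSchema(), or a bean annotated with @DefaultSchema(JavaBeanSchema.class)),
// the INSERT statement and the PreparedStatementSetter can be generated from the
// target table, so no hand-written SQL is needed.
PCollection<Row> rows = ...;
rows.apply("Write to database", JdbcIO.<Row>write()
    .withDataSourceConfiguration(config)
    .withTable("my_table"));

// Read: readRows() returns a PCollection<Row> whose schema is inferred from the
// ResultSet metadata.
PCollection<Row> result = pipeline.apply("Read from database", JdbcIO.readRows()
    .withDataSourceConfiguration(config)
    .withQuery("SELECT id, name FROM my_table"));
```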
[1] https://issues.apache.org/jira/browse/BEAM-6674
[2] https://github.com/apache/beam/pull/8725
[3] https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L469
[4] https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L524
[5] https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L469

> On 23 Mar 2021, at 08:03, Thomas Fredriksen(External) <[email protected]> wrote:
>
> That is a very good question.
>
> Personally, I would prefer that read and write were simplified. I guess there will always be a need for writing complex queries, but the vast majority of pipelines will only need to read or write data to or from a table. As such, having read/write functions that take an input class (a bean or POJO, for example) and simply generate the required write statement would be sufficient. Upserts should also be a part of this.
>
> For example:
>
> ```
> PCollection<MyBean> collection = ...;
> collection.apply("Write to database", JdbcIO.writeTable(MyBean.class)
>     .withDataSourceConfiguration(mySourceConfiguration)
>     .withTableName(myTableName)
>     .withUpsertOption(UpsertOption.create()
>         .withConflictTarget(keyColumn)
>         .withDoUpdate()));
> ```
>
> This would of course assume that the columns of `myTableName` match the members of `MyBean`.
>
> There are of course technical challenges with this:
> * How to handle situations where the column names do not match the input type
> * How to detect columns from the input type
>
> As an alternative, schemas may be an option:
>
> ```
> PCollection<Row> collection = ...;
> collection.apply("Write to database", JdbcIO.writeRows()
>     .withSchema(mySchema)
>     .withDataSourceConfiguration(mySourceConfiguration)
>     .withTableName(myTableName)
>     .withUpsertOption(UpsertOption.create()
>         .withConflictTarget(keyColumn)
>         .withDoUpdate()));
> ```
>
> This would allow for greater flexibility, but we lose the type-strong nature of the first suggestion.
>
> I hope this helps.
>
> Best Regards
> Thomas Li Fredriksen
>
> On Fri, Mar 19, 2021 at 7:17 PM Alexey Romanenko <[email protected]> wrote:
> Hmm, interesting question. Since we don't have any answers yet, may I ask you a question - do you have an example of what these practices could look like or how they could be simplified?
>
> PS: Not sure that it can help, but JdbcIO allows setting a query with a "ValueProvider" option, which can be helpful to parametrise your transform with values that are only available during pipeline execution and can be used for pipeline templates [1].
>
> [1] https://cloud.google.com/dataflow/docs/guides/templates/creating-templates
>
> > On 17 Mar 2021, at 14:06, Thomas Fredriksen(External) <[email protected]> wrote:
> >
> > Hello everyone,
> >
> > I was wondering what is considered best practice when writing SQL statements for the JdbcIO connector?
> >
> > Hand-writing the statements and the subsequent PreparedStatementSetter causes a lot of bloat and is not very manageable.
> >
> > Thank you.
> >
> > Best Regards
> > Thomas Li Fredriksen
