FYI, the schemas option has already been pursued a bit in JdbcSchemaIOProvider [1], which naively generates SELECT and INSERT statements for reads and writes. In practice, this code is only usable from SQL and from multi-language pipelines (e.g. it's accessible from the Python SDK [2]). We could consider either:

- moving this logic into JdbcIO and re-using it in JdbcSchemaIOProvider (see the sketch below), or
- adding a user-friendly interface to SchemaIOProvider implementations in the Java SDK.
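To give a rough idea of what that shared logic would be doing, here is a sketch of the kind of naive INSERT generation involved - this is not the actual JdbcSchemaIOProvider code, and `generateInsertStatement` is just a made-up helper name:

```
// Sketch only - not the actual JdbcSchemaIOProvider implementation.
// Builds a naive INSERT statement with one placeholder per Schema field.
import org.apache.beam.sdk.schemas.Schema;
import java.util.stream.Collectors;

static String generateInsertStatement(Schema schema, String table) {
  String columns = schema.getFields().stream()
      .map(Schema.Field::getName)
      .collect(Collectors.joining(", "));
  String placeholders = schema.getFields().stream()
      .map(field -> "?")
      .collect(Collectors.joining(", "));
  return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
}
```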
Brian

[1] https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java
[2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/jdbc.py

On Tue, Mar 23, 2021 at 12:03 AM Thomas Fredriksen(External) <[email protected]> wrote:

> That is a very good question.
>
> Personally, I would prefer that read and write were simplified. I guess
> there will always be a need for writing complex queries, but the vast
> majority of pipelines will only need to read or write data to or from a
> table. As such, having read/write functions that take an input class
> (a bean or POJO, for example) and simply generate the required write
> statement would be sufficient. Upserts should also be a part of this.
>
> For example:
>
> ```
> PCollection<MyBean> collection = ...;
> collection.apply("Write to database", JdbcIO.writeTable(MyBean.class)
>     .withDataSourceConfiguration(mySourceConfiguration)
>     .withTableName(myTableName)
>     .withUpsertOption(UpsertOption.create()
>         .withConflictTarget(keyColumn)
>         .withDoUpdate()));
> ```
>
> This would of course assume that the columns of `myTableName` match
> the members of `MyBean`.
>
> There are of course technical challenges with this:
>
> * How to handle situations where the column names do not match the
>   input type.
> * How to detect columns from the input type.
>
> As an alternative, schemas may be an option:
>
> ```
> PCollection<Row> collection = ...;
> collection.apply("Write to database", JdbcIO.writeRows()
>     .withSchema(mySchema)
>     .withDataSourceConfiguration(mySourceConfiguration)
>     .withTableName(myTableName)
>     .withUpsertOption(UpsertOption.create()
>         .withConflictTarget(keyColumn)
>         .withDoUpdate()));
> ```
>
> This would allow for greater flexibility, but we would lose the type-safe
> nature of the first suggestion.
>
> I hope this helps.
>
> Best Regards
> Thomas Li Fredriksen
>
> On Fri, Mar 19, 2021 at 7:17 PM Alexey Romanenko <[email protected]> wrote:
>
>> Hmm, interesting question. Since we don't have any answers yet, may I ask
>> you a question in return - do you have an example of what these practices
>> could look like, or how they could be simplified?
>>
>> PS: Not sure that it helps, but JdbcIO allows setting a query with a
>> "ValueProvider" option, which can be useful for parametrising your
>> transform with values that are only available during pipeline execution,
>> and can be used for pipeline templates [1].
>>
>> [1] https://cloud.google.com/dataflow/docs/guides/templates/creating-templates
>>
>> > On 17 Mar 2021, at 14:06, Thomas Fredriksen(External) <[email protected]> wrote:
>> >
>> > Hello everyone,
>> >
>> > I was wondering what is considered best practice when writing SQL
>> > statements for the JdbcIO connector?
>> >
>> > Hand-writing the statements and the subsequent PreparedStatementSetter
>> > causes a lot of bloat and is not very manageable.
>> >
>> > Thank you.
>> >
>> > Best Regards
>> > Thomas Li Fredriksen
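For reference, the hand-written pattern the original question refers to looks roughly like this with the existing JdbcIO API. This is only a minimal sketch: `MyBean`, its getters, the table and the connection details are made-up placeholders, not anything from the thread.

```
PCollection<MyBean> beans = ...;
beans.apply("Write to database",
    JdbcIO.<MyBean>write()
        .withDataSourceConfiguration(
            JdbcIO.DataSourceConfiguration.create(
                "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb"))
        // The statement and setter are written by hand today - one setter
        // line per column, which is the boilerplate being discussed.
        .withStatement("INSERT INTO my_table (id, name) VALUES (?, ?)")
        .withPreparedStatementSetter((element, statement) -> {
          statement.setLong(1, element.getId());
          statement.setString(2, element.getName());
        }));
```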
