That is a very good question.

Personally, I would prefer that the read and write APIs were simplified. There
will always be a need for writing complex queries, but the vast majority of
pipelines only need to read data from or write data to a single table. For
those cases, it would be sufficient to have read/write functions that take an
input class (a bean or POJO, for example) and generate the required statement
automatically. Upserts should also be a part of this.

For example:

```
PCollection<MyBean> collection = ...;
collection.apply("Write to database", JdbcIO.writeTable(MyBean.class)
        .withDataSourceConfiguration(mySourceConfiguration)
        .withTableName(myTableName)
        .withUpsertOption(UpsertOption.create()
                .withConflictTarget(keyColumn)
                .withDoUpdate()));
```
This would of course assume that the columns of `myTableName` match the
members of `MyBean`.
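
For reads, a symmetric convenience could look like this (again only a sketch;
`readTable` is hypothetical):

```
PCollection<MyBean> rows =
    pipeline.apply("Read from database", JdbcIO.readTable(MyBean.class)
            .withDataSourceConfiguration(mySourceConfiguration)
            .withTableName(myTableName));
```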

There are of course technical challenges with this:
* How to handle situations where the column names do not match the fields of
the input type (see the sketch below).
* How to detect the columns from the input type in the first place.
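
To illustrate the first point, one possible approach is a field-level
annotation for overriding column names, falling back to the field name
otherwise. This is only a sketch; `@ColumnName` does not exist in JdbcIO, and
column detection could be done with plain reflection:

```
import java.io.Serializable;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation - not part of JdbcIO - for overriding the
// column name that a field maps to.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ColumnName {
    String value();
}

class MyBean implements Serializable {
    @ColumnName("user_id") // explicit mapping where names differ
    private long userId;

    private String name; // no annotation: assume the column is also "name"

    // The transform could discover the columns via reflection, e.g.
    // MyBean.class.getDeclaredFields(), honoring the annotation when present.

    // Getters and setters omitted for brevity.
}
```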

As an alternative, schemas may be an option:

```
PCollection<Row> collection = ...;
collection.apply("Write to database", JdbcIO.writeRows()
        .withSchema(mySchema)
        .withDataSourceConfiguration(mySourceConfiguration)
        .withTableName(myTableName)
        .withUpsertOption(UpsertOption.create()
                .withConflictTarget(keyColumn)
                .withDoUpdate()));
```
This would allow for greater flexibility, but we would lose the type-safe
nature of the first suggestion.
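
For what it is worth, the upsert options in both sketches map naturally onto
PostgreSQL-style `INSERT ... ON CONFLICT` statements. Assuming that dialect
(other databases would need e.g. `MERGE`), the generated statement might look
something like this:

```
// Illustrative only: the statement a writeTable/writeRows transform with
// UpsertOption(conflict target = "id", DO UPDATE) might generate for a
// two-column table, assuming PostgreSQL's ON CONFLICT syntax.
String generatedSql =
    "INSERT INTO my_table (id, name) VALUES (?, ?) "
        + "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name";
```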

I hope this helps.

Best Regards
Thomas Li Fredriksen

On Fri, Mar 19, 2021 at 7:17 PM Alexey Romanenko <[email protected]>
wrote:

> Hmm, interesting question. Since we don’t have any answers yet, may I ask
> you a question - do you have an example of what these practices could look
> like or how they could be simplified?
>
>
> PS: Not sure that it can help, but JdbcIO allows setting a query with a
> “ValueProvider” option, which can be helpful to parametrise your transform
> with values that are only available during pipeline execution and can be
> used for pipeline templates [1].
>
> [1]
> https://cloud.google.com/dataflow/docs/guides/templates/creating-templates
>
> > On 17 Mar 2021, at 14:06, Thomas Fredriksen(External) <
> [email protected]> wrote:
> >
> > Hello everyone,
> >
> > I was wondering what is considered best practice when writing SQL
> statements for the JdbcIO connector?
> >
> > Hand-writing the statements and the subsequent PreparedStatementSetter
> causes a lot of bloat and is not very manageable.
> >
> > Thank you.
> >
> > Best Regards
> > Thomas Li Fredriksen
>
>
