FYI, the schemas option has already been pursued a bit in JdbcSchemaIOProvider [1], which naively generates SELECT and INSERT statements for reads and writes. In practice, this code is only usable from SQL and from multi-language pipelines (e.g. it's accessible from the Python SDK [2]). We could consider either:

- moving this logic into JdbcIO and re-using it in JdbcSchemaIOProvider (see the sketch below), or
- adding a user-friendly interface to SchemaIOProvider implementations in the Java SDK.
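To give a rough idea of what that shared logic would be doing, here is a sketch of the kind of naive INSERT generation involved - this is not the actual JdbcSchemaIOProvider code, and `generateInsertStatement` is just a made-up helper name:

```
// Sketch only - not the actual JdbcSchemaIOProvider implementation.
// Builds a naive INSERT statement with one placeholder per Schema field.
import org.apache.beam.sdk.schemas.Schema;
import java.util.stream.Collectors;

static String generateInsertStatement(Schema schema, String table) {
  String columns = schema.getFields().stream()
      .map(Schema.Field::getName)
      .collect(Collectors.joining(", "));
  String placeholders = schema.getFields().stream()
      .map(field -> "?")
      .collect(Collectors.joining(", "));
  return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
}
```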
Brian

[1] https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java
[2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/jdbc.py

On Tue, Mar 23, 2021 at 12:03 AM Thomas Fredriksen(External) <[email protected]> wrote:

> That is a very good question.
>
> Personally, I would prefer that read and write were simplified. I guess
> there will always be a need for writing complex queries, but the vast
> majority of pipelines will only need to read or write data to or from a
> table. As such, having read/write functions that take an input class
> (a bean or POJO, for example) and simply generate the required write
> statement would be sufficient. Upserts should also be a part of this.
>
> For example:
>
> ```
> PCollection<MyBean> collection = ...;
> collection.apply("Write to database", JdbcIO.writeTable(MyBean.class)
>     .withDataSourceConfiguration(mySourceConfiguration)
>     .withTableName(myTableName)
>     .withUpsertOption(UpsertOption.create()
>         .withConflictTarget(keyColumn)
>         .withDoUpdate()));
> ```
>
> This would of course assume that the columns of `myTableName` match
> the members of `MyBean`.
>
> There are of course technical challenges with this:
>
> * How to handle situations where the column names do not match the
>   input type.
> * How to detect columns from the input type.
>
> As an alternative, schemas may be an option:
>
> ```
> PCollection<Row> collection = ...;
> collection.apply("Write to database", JdbcIO.writeRows()
>     .withSchema(mySchema)
>     .withDataSourceConfiguration(mySourceConfiguration)
>     .withTableName(myTableName)
>     .withUpsertOption(UpsertOption.create()
>         .withConflictTarget(keyColumn)
>         .withDoUpdate()));
> ```
>
> This would allow for greater flexibility, but we would lose the type-safe
> nature of the first suggestion.
>
> I hope this helps.
>
> Best Regards
> Thomas Li Fredriksen
>
> On Fri, Mar 19, 2021 at 7:17 PM Alexey Romanenko <[email protected]> wrote:
>
>> Hmm, interesting question. Since we don't have any answers yet, may I ask
>> you a question in return - do you have an example of what these practices
>> could look like, or how they could be simplified?
>>
>> PS: Not sure that it helps, but JdbcIO allows setting a query with a
>> "ValueProvider" option, which can be useful for parametrising your
>> transform with values that are only available during pipeline execution,
>> and can be used for pipeline templates [1].
>>
>> [1] https://cloud.google.com/dataflow/docs/guides/templates/creating-templates
>>
>> > On 17 Mar 2021, at 14:06, Thomas Fredriksen(External) <[email protected]> wrote:
>> >
>> > Hello everyone,
>> >
>> > I was wondering what is considered best practice when writing SQL
>> > statements for the JdbcIO connector?
>> >
>> > Hand-writing the statements and the subsequent PreparedStatementSetter
>> > causes a lot of bloat and is not very manageable.
>> >
>> > Thank you.
>> >
>> > Best Regards
>> > Thomas Li Fredriksen
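For reference, the hand-written pattern the original question refers to looks roughly like this with the existing JdbcIO API. This is only a minimal sketch: `MyBean`, its getters, the table and the connection details are made-up placeholders, not anything from the thread.

```
PCollection<MyBean> beans = ...;
beans.apply("Write to database",
    JdbcIO.<MyBean>write()
        .withDataSourceConfiguration(
            JdbcIO.DataSourceConfiguration.create(
                "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb"))
        // The statement and setter are written by hand today - one setter
        // line per column, which is the boilerplate being discussed.
        .withStatement("INSERT INTO my_table (id, name) VALUES (?, ?)")
        .withPreparedStatementSetter((element, statement) -> {
          statement.setLong(1, element.getId());
          statement.setString(2, element.getName());
        }));
```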
