Thanks for the details. If I'm not mistaken, JdbcIO already supports both of your suggestions for read and write (at least, in some form) [1][2].

Some examples from tests:
- write from PCollection<Row> [3],
- read to PCollection<Row> [4],
- write from PCollection<POJO> with JavaBeanSchema [5]

Is this what you are looking for?
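To illustrate, a rough sketch along the lines of the linked tests (untested; the driver, connection settings, table name and query below are just placeholders):

```
JdbcIO.DataSourceConfiguration config =
    JdbcIO.DataSourceConfiguration.create(
            "org.postgresql.Driver", "jdbc:postgresql://localhost:5432/mydb")
        .withUsername("user")
        .withPassword("password");

// Write: if the input PCollection has a schema (Rows with a schema set via
// setRowSchema(), or a bean annotated with @DefaultSchema(JavaBeanSchema.class)),
// the INSERT statement and the PreparedStatementSetter can be generated from the
// target table, so no hand-written SQL is needed.
PCollection<Row> rows = ...;
rows.apply("Write to database", JdbcIO.<Row>write()
    .withDataSourceConfiguration(config)
    .withTable("my_table"));

// Read: readRows() returns a PCollection<Row> whose schema is inferred from the
// ResultSet metadata.
PCollection<Row> result = pipeline.apply("Read from database", JdbcIO.readRows()
    .withDataSourceConfiguration(config)
    .withQuery("SELECT id, name FROM my_table"));
```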
[1] https://issues.apache.org/jira/browse/BEAM-6674
[2] https://github.com/apache/beam/pull/8725
[3] https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L469
[4] https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L524
[5] https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L469

> On 23 Mar 2021, at 08:03, Thomas Fredriksen(External) <[email protected]> wrote:
>
> That is a very good question.
>
> Personally, I would prefer that read and write were simplified. I guess there will always be a need for writing complex queries, but the vast majority of pipelines will only need to read or write data to or from a table. As such, having read/write functions that take an input class (a bean or POJO, for example) and simply generate the required write statement would be sufficient. Upserts should also be a part of this.
>
> For example:
>
> ```
> PCollection<MyBean> collection = ...;
> collection.apply("Write to database", JdbcIO.writeTable(MyBean.class)
>     .withDataSourceConfiguration(mySourceConfiguration)
>     .withTableName(myTableName)
>     .withUpsertOption(UpsertOption.create()
>         .withConflictTarget(keyColumn)
>         .withDoUpdate()));
> ```
>
> This would of course assume that the columns of `myTableName` match the members of `MyBean`.
>
> There are of course technical challenges with this:
> * How to handle situations where the column names do not match the input type
> * How to detect columns from the input type
>
> As an alternative, schemas may be an option:
>
> ```
> PCollection<Row> collection = ...;
> collection.apply("Write to database", JdbcIO.writeRows()
>     .withSchema(mySchema)
>     .withDataSourceConfiguration(mySourceConfiguration)
>     .withTableName(myTableName)
>     .withUpsertOption(UpsertOption.create()
>         .withConflictTarget(keyColumn)
>         .withDoUpdate()));
> ```
>
> This would allow for greater flexibility, but we lose the type-strong nature of the first suggestion.
>
> I hope this helps.
>
> Best Regards
> Thomas Li Fredriksen
>
> On Fri, Mar 19, 2021 at 7:17 PM Alexey Romanenko <[email protected]> wrote:
> Hmm, interesting question. Since we don't have any answers yet, may I ask you a question - do you have an example of what these practices could look like or how they could be simplified?
>
> PS: Not sure that it can help, but JdbcIO allows setting a query with a "ValueProvider" option, which can be helpful to parametrise your transform with values that are only available during pipeline execution and can be used for pipeline templates [1].
>
> [1] https://cloud.google.com/dataflow/docs/guides/templates/creating-templates
>
> > On 17 Mar 2021, at 14:06, Thomas Fredriksen(External) <[email protected]> wrote:
> >
> > Hello everyone,
> >
> > I was wondering what is considered best practice when writing SQL statements for the JdbcIO connector?
> >
> > Hand-writing the statements and the subsequent PreparedStatementSetter causes a lot of bloat and is not very manageable.
> >
> > Thank you.
> >
> > Best Regards
> > Thomas Li Fredriksen
