I don’t think so, because this statement [1] is what gets used in that case. 
 
[1] 
https://github.com/apache/beam/blob/97af0775cc19a4997a4b60c6a75d003f8e86cf1f/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcUtil.java#L56
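
For what it’s worth, a rough sketch of how an upsert can still be expressed today through the statement-based API (this assumes PostgreSQL’s ON CONFLICT syntax; the table, columns and element type below are made up for illustration):

```
// Sketch only: upsert via a hand-written statement, assuming PostgreSQL.
// "my_table", "id", "value" and dataSourceConfiguration are illustrative names.
PCollection<KV<String, String>> input = ...;
input.apply("Upsert to database", JdbcIO.<KV<String, String>>write()
    .withDataSourceConfiguration(dataSourceConfiguration)
    .withStatement(
        "INSERT INTO my_table (id, value) VALUES (?, ?) "
            + "ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value")
    .withPreparedStatementSetter((element, statement) -> {
      statement.setString(1, element.getKey());
      statement.setString(2, element.getValue());
    }));
```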

> On 14 Apr 2021, at 14:44, Thomas Fredriksen(External) 
> <thomas.fredrik...@cognite.com> wrote:
> 
> This seems very promising,
> 
> Will the write from PCollection<Row> handle upserts?
> 
> On Wed, Mar 24, 2021 at 6:56 PM Alexey Romanenko <aromanenko....@gmail.com> wrote:
> Thanks for details.
> 
> If I’m not mistaken, JdbcIO already supports both your suggestions for read 
> and write (at least, in some way) [1][2]. 
> 
> Some examples from tests:
> - write from PCollection<Row> [3], 
> - read to PCollection<Row> [4], 
> - write from PCollection<POJO> with JavaBeanSchema [5] 
> 
> Is it something that you are looking for?
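> 
> For illustration, roughly what the row-based read [4] and write [3] look like in user code (a sketch from memory, so please double-check the method names against the linked tests):
> 
> ```
> // Sketch: schema-based read and write; names are only illustrative.
> PCollection<Row> rows = pipeline.apply("Read rows", JdbcIO.readRows()
>     .withDataSourceConfiguration(dataSourceConfiguration)
>     .withQuery("SELECT id, name FROM my_table"));
> 
> rows.apply("Write rows", JdbcIO.<Row>write()
>     .withDataSourceConfiguration(dataSourceConfiguration)
>     .withTable("my_other_table"));
> ```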
> 
> [1] https://issues.apache.org/jira/browse/BEAM-6674
> [2] https://github.com/apache/beam/pull/8725
> [3] https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L469
> [4] https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L524
> [5] https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L469
> 
> 
>> On 23 Mar 2021, at 08:03, Thomas Fredriksen(External) <thomas.fredrik...@cognite.com> wrote:
>> 
>> That is a very good question.
>> 
>> Personally, I would prefer that read and write were simplified. I guess 
>> there will always be a need for writing complex queries, but the vast 
>> majority of pipelines will only need to read or write data to or from a 
>> table. As such, read/write functions that take an input class (a bean or 
>> POJO, for example) and simply generate the required write statement would 
>> be sufficient. Upserts should also be part of this.
>> 
>> For example:
>> 
>> ```
>> PCollection<MyBean> collection = ...;
>> collection.apply("Write to database", JdbcIO.writeTable(MyBean.class)
>>         .withDataSourceConfiguration(mySourceConfiguration)
>>         .withTableName(myTableName)
>>         .withUpsertOption(UpsertOption.create()
>>                 .withConflictTarget(keyColumn)
>>                 .withDoUpdate()));
>> ```
>> This would of course assume that the columns of `myTableName` would match 
>> the members of `MyBean`.
>> 
>> There are of course technical challenges with this:
>> * How to handle situations where the column names do not match the input-type
>> * How to detect columns from the input-type.
>> 
>> As an alternative, schemas may be an option:
>> 
>> ```
>> PCollection<Row> collection = ...;
>> collection.apply("Write to database", JdbcIO.writeRows()
>>         .withSchema(mySchema)
>>         .withDataSourceConfiguration(mySourceConfiguration)
>>         .withTableName(myTableName)
>>         .withUpsertOption(UpsertOption.create()
>>                 .withConflictTarget(keyColumn)
>>                 .withDoUpdate()));
>> ```
>> This would allow for greater flexibility, but we would lose the 
>> strongly-typed nature of the first suggestion.
>> 
>> I hope this helps.
>> 
>> Best Regards
>> Thomas Li Fredriksen
>> 
>> On Fri, Mar 19, 2021 at 7:17 PM Alexey Romanenko <aromanenko....@gmail.com> wrote:
>> Hmm, interesting question. Since we don’t have any answers yet, may I ask 
>> you a question: do you have an example of what these practices could look 
>> like, or how they could be simplified?
>> 
>> 
>> PS: Not sure that it can help, but JdbcIO allows you to set the query with a 
>> “ValueProvider” option, which can be helpful to parametrise your transform 
>> with values that are only available during pipeline execution and can be 
>> used for pipeline templates [1].
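>> 
>> A rough sketch of what that could look like (the options interface and all names here are only illustrative):
>> 
>> ```
>> // Sketch: parametrising the query with a ValueProvider for use in templates.
>> public interface MyJdbcOptions extends PipelineOptions {
>>   @Description("Query to run against the database")
>>   ValueProvider<String> getQuery();
>>   void setQuery(ValueProvider<String> value);
>> }
>> 
>> // Inside main(String[] args):
>> MyJdbcOptions options = PipelineOptionsFactory.fromArgs(args).as(MyJdbcOptions.class);
>> Pipeline pipeline = Pipeline.create(options);
>> 
>> pipeline.apply("Read with templated query", JdbcIO.<KV<Integer, String>>read()
>>     .withDataSourceConfiguration(dataSourceConfiguration)
>>     .withQuery(options.getQuery())  // ValueProvider<String> overload
>>     .withRowMapper(resultSet -> KV.of(resultSet.getInt(1), resultSet.getString(2)))
>>     .withCoder(KvCoder.of(VarIntCoder.of(), StringUtf8Coder.of())));
>> ```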
>> 
>> [1] https://cloud.google.com/dataflow/docs/guides/templates/creating-templates
>> 
>> > On 17 Mar 2021, at 14:06, Thomas Fredriksen(External) <thomas.fredrik...@cognite.com> wrote:
>> > 
>> > Hello everyone,
>> > 
>> > I was wondering what is considered best practice when writing SQL 
>> > statements for the JdbcIO connector?
>> > 
>> > Hand-writing the statements and the corresponding PreparedStatementSetter 
>> > causes a lot of bloat and is not very manageable.
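>> > 
>> > For context, a rough sketch of the pattern I mean (the table, columns and bean are made up):
>> > 
>> > ```
>> > // Sketch: hand-written statement plus PreparedStatementSetter, repeated per table.
>> > PCollection<MyBean> beans = ...;
>> > beans.apply("Write to database", JdbcIO.<MyBean>write()
>> >     .withDataSourceConfiguration(dataSourceConfiguration)
>> >     .withStatement("INSERT INTO my_table (id, name, value) VALUES (?, ?, ?)")
>> >     .withPreparedStatementSetter((bean, statement) -> {
>> >       statement.setLong(1, bean.getId());
>> >       statement.setString(2, bean.getName());
>> >       statement.setDouble(3, bean.getValue());
>> >     }));
>> > ```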
>> > 
>> > Thank you.
>> > 
>> > Best Regards
>> > Thomas Li Fredriksen
>> 
> 
