Re: [Question] Beam Java Dataflow v1 Runner Oracle JDK

2023-04-17 Thread hardip singh
Hi Bruno, Yep, it is indeed based on the OpenJDK source code. However, it looks to be provided by Oracle (and hence falls under the Oracle licence): % docker run -it --entrypoint '/bin/bash' gcr.io/cloud-dataflow/v1beta3/beam-java-streaming:2.46.0 WARNING: The requested image's platform
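
A quick way to confirm which vendor's JDK a container image actually ships is to ask the JVM itself via its system properties. This is a minimal sketch (the class name VendorProbe is arbitrary); on Java 11+ it can be run inside the container directly with "java VendorProbe.java":

    // VendorProbe.java - prints the properties identifying the running JDK.
    public class VendorProbe {
        public static void main(String[] args) {
            // "java.vendor" distinguishes an Oracle build from a Temurin or
            // other OpenJDK build of the same source code.
            System.out.println("vendor:  " + System.getProperty("java.vendor"));
            System.out.println("version: " + System.getProperty("java.version"));
            System.out.println("home:    " + System.getProperty("java.home"));
        }
    }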

Re: [Question] Beam Java Dataflow v1 Runner Oracle JDK

2023-04-17 Thread Bruno Volpato via user
Hello Hardip, If you are using Beam 2.46.0, it should be using OpenJDK already (not Oracle's JRE as before). No need for the sources; you can check the images directly from your terminal if you have Docker installed: $ docker run -it --entrypoint '/bin/bash'

[Question] Beam Java Dataflow v1 Runner Oracle JDK

2023-04-17 Thread hardip singh
Hi, I was hoping someone could shed some light on, and potentially offer a solution to, a problem I face with the v1 runner. Due to Oracle Java SE licensing changes for older Java versions, I am looking to move to the Eclipse (Temurin) OpenJDK runtime, which I can see has been updated in the

Re: Losing records when using BigQuery IO Connector

2023-04-17 Thread Binh Nguyen Van
Hi, I tested with streaming insert and file load, and they both worked as expected. But it looks like the Storage Write API is the new way to go, so I want to test it too. I am using Apache Beam v2.46.0 and running it with Google Dataflow. Thanks -Binh On Mon, Apr 17, 2023 at 9:53 AM Reuven Lax via user
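
For reference, selecting the Storage Write API on BigQueryIO looks roughly like the sketch below (Beam 2.46.0 Java SDK). The table spec, triggering frequency, and stream count are placeholder assumptions; the last two apply to exactly-once writes on unbounded input:

    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
    import org.joda.time.Duration;

    // rows is an unbounded PCollection<TableRow> built earlier in the pipeline.
    rows.apply("WriteToBQ",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")   // placeholder table spec
            .withMethod(Method.STORAGE_WRITE_API)
            // Exactly-once writes on unbounded input need a triggering
            // frequency and a fixed number of write streams.
            .withTriggeringFrequency(Duration.standardSeconds(10))
            .withNumStorageWriteApiStreams(3)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND)
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED));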

Re: Losing records when using BigQuery IO Connector

2023-04-17 Thread Reuven Lax via user
What version of Beam are you using? There are no known data-loss bugs in the connector; however, if there has been a regression, we would like to address it with high priority. On Mon, Apr 17, 2023 at 12:47 AM Binh Nguyen Van wrote: > Hi, > > I have a job that uses BigQuery IO Connector to write

Re: Losing records when using BigQuery IO Connector

2023-04-17 Thread XQ Hu via user
Does FILE_LOADS ( https://beam.apache.org/releases/javadoc/2.46.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html#FILE_LOADS) work for your case? As for STORAGE_WRITE_API, it has been actively improved. If the latest SDK still has this issue, I highly recommend you create a Google
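
For comparison, routing the same write through batch load jobs looks roughly like this sketch; on unbounded input FILE_LOADS also requires a triggering frequency, and the shard count here is an arbitrary placeholder:

    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
    import org.joda.time.Duration;

    // rows is the same PCollection<TableRow> as in the previous sketch.
    rows.apply("WriteToBQViaLoads",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")    // placeholder table spec
            .withMethod(Method.FILE_LOADS)
            // Periodically flush buffered files into a BigQuery load job.
            .withTriggeringFrequency(Duration.standardMinutes(5))
            .withNumFileShards(10));                 // arbitrary shard count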

Losing records when using BigQuery IO Connector

2023-04-17 Thread Binh Nguyen Van
Hi, I have a job that uses the BigQuery IO Connector to write to a BigQuery table. When I test it with a small number of records (100), it works as expected, but when I tested it with a larger number of records (1), I don’t see all of the records written to the output table, only part of them. It

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-17 Thread Reuven Lax via user
Looking at FlinkPipelineOptions, there is a parallelism option you can set. I believe this sets the default parallelism for all Flink operators. On Sun, Apr 16, 2023 at 7:20 PM Jeff Zhang wrote: > Thanks Holden, this would work for Spark, but Flink doesn't have such a > mechanism, so I am
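
Concretely, that option can be supplied as --parallelism on the command line or set in code; a minimal sketch follows (the value 8 and the use of FlinkRunner are assumptions for illustration):

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.runners.flink.FlinkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    options.setParallelism(8);  // default parallelism applied to every Flink operator

    Pipeline pipeline = Pipeline.create(options);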