[java] Trouble with gradle and using ParquetIO

2023-04-20 Thread Evan Galpin
Hi all, I'm trying to make use of ParquetIO. Based on what's documented in maven central, I'm including the artifact in "compileOnly" mode (or in maven parlance, 'provided' scope). I can successfully compile my pipeline, but when I run it I (intuitively?) am met with a ClassNotFound exception

Re: Can I batch data when i use JDBC write operation?

2023-04-20 Thread Juan Romero
Hi. Can someone help me with this? On Wed, Apr 19, 2023 at 15:08, Juan Romero wrote: > Hi community. > > On this occasion I have a question about how to read a stream from Kafka > and write batches of data with the JDBC connector. The idea is to override > a specific row if the current

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-20 Thread Jan Lukavský
Hi Ning, I might have missed that in the discussion, but we are talking about batch execution, am I right? In streaming, all operators (PTransforms) of a Pipeline are run in the same slots, thus the downsides are limited. You can enforce streaming mode using the --streaming command-line argument. But

Re: Beam shell sql with zeta

2023-04-20 Thread Wiśniowski Piotr
Thank you for the reply and the hint. 1. Yes, I did try with Calcite `ROW` too - `java.lang.NoSuchFieldException: head (state=,code=0)` but on the transformation side `SELECT * FROM etl_raw LIMIT 1`. Maybe I need to directly refer to a field that I need instead of using `*`? Do you know from top

Re: Beam shell sql with zeta

2023-04-20 Thread Andrew Pilloud via user
Setting plannerName doesn't actually do anything on the SQL shell at query parse time; it will still use the Calcite parser. Have you tried Calcite SQL? Support for structs is somewhat limited. I know there are bugs around nested structs and structs with single values. Andrew On Thu, Apr 20, 2023 at

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-20 Thread Ning Kang via user
Hi Jan, The approach works when your pipeline doesn't have too many operators. And the operator that needs the highest parallelism can only use at most #total_task_slots / #operators resources available in the cluster. Another downside is wasted resources for other smaller operators that cannot

Beam shell sql with zeta

2023-04-20 Thread Wiśniowski Piotr
Hi, I have a question regarding usage of Zeta with SQL extensions in SQL shell. I try to: ``` SET runner = DirectRunner; SET tempLocation = `/tmp/test/`; SET streaming=`True`; SET plannerName = `org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner`; CREATE EXTERNAL TABLE

Re: Q: Apache Beam IOElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-20 Thread Evan Galpin
For more info on splittable DoFn, there is a good resource on the Beam blog [1]. Alexey has also shown a great alternative! [1] https://beam.apache.org/blog/splittable-do-fn/ On Thu, Apr 20, 2023 at 9:08 AM Alexey Romanenko wrote: > Some Java IO-connectors implement a class something like

Re: Q: Apache Beam IOElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-20 Thread Alexey Romanenko
Some Java IO-connectors implement a class something like “class ReadAll extends PTransform, PCollection>” where “Read” is supposed to be configured dynamically. As a simple example, take a look at “SolrIO” [1]. So, to support what you are looking for, the “ReadAll” pattern should be implemented

Re: [EXTERNAL] Re: Q: Apache Beam IOElasticsearchIO.read() method (Java), which expects a PBegin input and a means to handle a collection of queries

2023-04-20 Thread Murphy, Sean P. via user
I’m not able to find any implementation of “SplitableDoFn”. All references I can find are to “Splitable DoFn”, so could you point me to the right version of the Apache Beam SDK that would have this? Thanks, ~Sean From: Evan Galpin Date: Wednesday, April 19, 2023 at 4:46 PM To:

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-20 Thread Jan Lukavský
Hi, this topic was discussed many years ago and the conclusion there was that setting the parallelism of individual operators via FlinkPipelineOptions (or ResourceHints) would be possible, but would be somewhat cumbersome. Although I understand that it "feels" weird to have high parallelism for