Hi all,
I'm trying to make use of ParquetIO. Based on what's documented in Maven
Central, I'm including the artifact in "compileOnly" mode (or in Maven
parlance, 'provided' scope). I can successfully compile my pipeline, but
when I run it I (intuitively?) am met with a ClassNotFound exception
Hi. Can someone help me with this?
On Wed, Apr 19, 2023 at 15:08, Juan Romero () wrote:
> Hi community.
>
> On this occasion I have a question about how to read a stream from Kafka
> and write batches of data with the JDBC connector. The idea is to override
> a specific row if the current
Hi Ning,
I might have missed that in the discussion, but we are talking about batch
execution, am I right? In streaming, all operators (PTransforms) of a
Pipeline are run in the same slots, thus the downsides are limited. You
can enforce streaming mode using --streaming command-line argument. But
Thank you for the reply and the hint.
1. Yes, I did try with Calcite `ROW` too - `java.lang.NoSuchFieldException:
head (state=,code=0)` but on the transformation side `SELECT * FROM
etl_raw LIMIT 1`. Maybe I need to directly refer to a field that I need
instead of using `*`? Do you know from top
`SET plannerName` doesn't actually do anything in the SQL shell at query
parse time; it will still use the Calcite parser. Have you tried Calcite
SQL?
Support for structs is somewhat limited. I know there are bugs around nested
structs and structs with single values.
Andrew
On Thu, Apr 20, 2023 at
Hi Jan,
The approach works when your pipeline doesn't have too many operators. And
the operator that needs the highest parallelism can only use at most
#total_task_slots / #operators resources available in the cluster.
Another downside is wasted resources for other smaller operators that cannot
Hi,
I have a question regarding usage of Zeta with SQL extensions in SQL
shell. I try to:
```
SET runner = DirectRunner;
SET tempLocation = `/tmp/test/`;
SET streaming=`True`;
SET plannerName =
`org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner`;
CREATE EXTERNAL TABLE
For more info on splittable DoFn, there is a good resource on the Beam
blog[1]. Alexey has also shown a great alternative!
[1] https://beam.apache.org/blog/splittable-do-fn/
On Thu, Apr 20, 2023 at 9:08 AM Alexey Romanenko
wrote:
> Some Java IO-connectors implement a class something like
Some Java IO-connectors implement a class something like "class ReadAll extends
PTransform, PCollection>” where “Read” is
supposed to be configured dynamically. As a simple example, take a look at
“SolrIO” [1]
So, to support what you are looking for, “ReadAll”-pattern should be
implemented
Hi,
I have a question regarding usage of Zeta with SQL extensions in SQL
shell. I try to:
```
SET runner = DirectRunner;
SET tempLocation = `/tmp/test/`;
SET streaming=`True`;
SET plannerName =
`org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner`;
CREATE EXTERNAL TABLE
I’m not able to find any implementation of “SplitableDoFn”. All references I
can find are to “Splitable DoFn”, so could you point me to the right version of
the Apache Beam SDK that would have this? Thanks, ~Sean
From: Evan Galpin
Date: Wednesday, April 19, 2023 at 4:46 PM
To:
Hi,
this topic was discussed many years ago and the conclusion there was
that setting the parallelism of individual operators via
FlinkPipelineOptions (or ResourceHints) would be possible, but would be
somewhat cumbersome. Although I understand that it "feels" weird to have
high parallelism for