this in generic form).
Best
Wiśniowski Piotr
On 1.09.2023 15:56, Byron Ellis via user wrote:
Depends on why you're using a fan-out approach in the first place. You
might actually be better off doing all the work at the same time.
On Fri, Sep 1, 2023 at 6:43 AM Ruben Vargas
wrote:
Ohh I
like to take a look at them?
Best
Wiśniowski Piotr
On 28.09.2023 19:31, Balogh, György wrote:
Sorry I was not specific enough. I ment using the SqlTransform
registerUdf and registerUdaf. I use a lot of SQL in my pipeline and I
would prefer using SQL UDFs in many cases over writing beam
Hi,
Any typical patterns for deployment of stateful streaming pipelines from
Beam? Targeting Dataflow and Python SDK with significant usage of
stateful processing with long windows (typically days long).
From our current practice of maintaining pipelines we did identify 3
typical scenarios:
ons of pyarrow?
2. Is there any planned effort on updating this to `14.0.1`? Is it
possible to push the update to `2.52.0` beam release? I know the beam
release is almost there.
Best
Wiśniowski Piotr
.
If You remember that It is possible to read or write to partitioned
Parquet files as just `PTransform` that's great! I probably must have
made some minor mistake in my trials. But eager to learn what was the
mistake.
Best regards
Wiśniowski Piotr
I did find solution like this one:
https
.
Best
Wiśniowski Piotr
On 17.02.2023 02:00, Lydian wrote:
I want to make a simple Beam pipeline which will store the events from
kafka to S3 in parquet format every minute.
Here's a simplified version of my pipeline:
|def add_timestamp(event: Any) -> Any: from datetime import datetime
f
Hi,
I have a question regarding usage of Zeta with SQL extensions in SQL
shell. I try to:
```
SET runner = DirectRunner;
SET tempLocation = `/tmp/test/`;
SET streaming=`True`;
SET plannerName =
`org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner`;
CREATE EXTERNAL TABLE
Hi,
I have a question regarding usage of Zeta with SQL extensions in SQL
shell. I try to:
```
SET runner = DirectRunner;
SET tempLocation = `/tmp/test/`;
SET streaming=`True`;
SET plannerName =
`org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner`;
CREATE EXTERNAL TABLE
SQL?
Support for struts is somewhat limited. I know there are bugs around
nested structs and structs with single values.
Andrew
On Thu, Apr 20, 2023 at 9:26 AM Wiśniowski Piotr
wrote:
Hi,
I have a question regarding usage of Zeta with SQL extensions in SQL
shell. I
that maven suggest this.
Best
Wiśniowski Piotr
On 21.04.2023 08:30, Moritz Mack wrote:
Hi Evan,
Not sure why maven suggests using “compileOnly”.
That’s certainly wrong, make sure to use “implementation” in your case.
Cheers, Moritz
On 21.04.23, 01:52, "Evan Galpin" wrote:
Hi all
://stackoverflow.com/questions/1109061/insert-on-duplicate-update-in-postgresql/8702291#8702291
4. some mix of p.2 and p.3
Hopefully this helps You in breaking the problem.
Best regards
Wiśniowski Piotr
On 21.04.2023 01:34, Juan Romero wrote:
Hi. Can someone help me with this?
El mié, 19 abr 2023
ogle.com/dataflow/docs/reference/sql/streaming-extensions)
that is present also in Calcite
(https://calcite.apache.org/docs/reference.html#tumble) but Beam SQL
does not parse it. Any idea what is the roadmap for this functionalities?
Really appreciate quick reply.
Best regards
Wiśniowski Piotr
depending on my team
decision - but would like to get like idea how hard this task is.
Will create tickets for the rest of issues when I will have some spare time.
Best regards
Wiśniowski Piotr
On 22.05.2023 18:28, Alexey Romanenko wrote:
Hi Piotr,
Thanks for details! I cross-post this to dev
.
Kenn
On Fri, May 26, 2023 at 11:06 AM Wiśniowski Piotr
wrote:
Hi Alexey,
Thank You for reference to that discussion I do actually have
pretty similar thoughts on what Beam SQL needs.
Update from my side:
Actually did find a workaround for issue with windowing function
ode to tell beam when to emit elements
for the aggregated keys.
Hope this helps. If not please provide a bit more details what exactly
You are trying to do so that I can try to help.
Also You could try to get more familiar with the abstraction with the
help of https://tour.beam.apache.org/.
pe present in the schema
- 1. `CROSS JOIN` between two unrelated tables is not supported - even
if one is a static number table
- 2. ROW construction not supported. It is not possible to nest data
Below queries tat I use to testing this scenarios.
Thank You for looking at this topics!
Best
Wiśnio
it seems that pipeline state on
DataFlow depends somehow on the order in which steps are added to
pipeline since some recent versions (as I do recall this was working
correctly ~2.50?). Anyone knows if this is intended? If yes would like
to know some explanation.
Best regards
Wiśniowski
Hi Valentyn,
Thank You for information and details. All make sense! I think we can
wait for 2.53.0 release and meantime apply hotfix.
Best
Wiśniowski Piotr
On 10.11.2023 20:27, Valentyn Tymofieiev via user wrote:
From https://pypi.org/project/pyarrow-hotfix/ :
pyarrow_hotfix must
GROUP BY
TUMBLE(cte.ts, INTERVAL '1'MINUTE)
;
```
Maybe it would be useful for you. Note that I am not up to date with
recent versions of Beam SQL, but I will need to catch up (the syntax for
defining window on table is cool).
Best
Wiśniowski Piotr
On 4.03.2024 05:27, Jaehyeon Kim wrote
for every `PCollection`. What else should I look into, which could
result in such behavior?
Best
Wiśniowski Piotr
Hi Xinmin,
After quick look at the connection in java it seems that the csv format
was an explicit decision. As this is entirely transparent to user I am
just curious why would You like to use parquet instead? What do you want
to achieve?
Regards,
Wiśniowski Piotr
On 5.03.2024 05:47
me few days to grasp how
to handle this with multiple input streams, but I do plan to create a
post with findings.
I think I had similar days long investigation few months ago. Let me
know if this is helpful.
Best
Wiśniowski Piotr
On 9.03.2024 03:29, Puertos tavares, Jose J (Canada) via
processors, shutdown logic can overaggressively hold the lock, blocking
acceptance of new work` in Beam release docs as known issue. What is the
status of this? Can this potentially be related?
Really appreciate any help, clues or hints how to debug this issue.
Best regards
Wiśniowski Piotr
confirmation.
BTW great job on the investigation of bug that you mentioned.
Impressive. Seems like a nasty one.
Best,
Wiśniowski Piotr
On 24.04.2024 00:31, Valentyn Tymofieiev via user wrote:
You might be running into https://github.com/apache/beam/issues/30867.
Among the error messages you
and drain old pipelines.
Note all above is just hypothesis, but hopefully it might be helpful.
Best
Wiśniowski Piotr
On 16.04.2024 05:15, Juan Romero wrote:
The deployment in the job is made by terraform. I verified and seems
that terraform do it incorrectly under the hood because it stop the
cu
of your runner) - shuffling would be required when creating
artificial random-looking keys.
Note that above is Python, but I do bet there is Java counterpart (or at
least easy to implement).
Best
Wiśniowski Piotr
On 15.04.2024 19:14, Reuven Lax via user wrote:
There are various strategies
Hi,
Getting back with update. After updating `grpcio` the issues are gone.
Thank you for the solution and investigation. Feels like I own you a
beer :)
Best
Wiśniowski Piotr
On 24.04.2024 22:11, Valentyn Tymofieiev wrote:
On Wed, Apr 24, 2024 at 12:40 PM Wiśniowski Piotr
wrote
27 matches
Mail list logo