I've tested the same window, trigger, and allowed_lateness settings; nothing seems to work. I think the main issue is in the WriteImpl transform, which I linked earlier:
https://github.com/apache/beam/blob/v2.41.0/sdks/python/apache_beam/io/iobase.py#L1033
According to the doc:
> Currently only batch workflows support custom sinks.
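For what it's worth, that quoted limitation lines up with how the linked WriteImpl behaves: results are written to a temporary location per bundle and only promoted to their final destination in a finalize step that runs once the (bounded) input is exhausted. An unbounded input is never exhausted, so the finalize step never runs. A stdlib-only toy of that lifecycle (not Beam code; every name here is invented for the sketch):

```python
# Toy illustration (NOT Beam code) of why a batch-style sink stalls on an
# unbounded source: temp results are only promoted to final output in a
# finalize step that runs after the input ends -- and streaming input
# never ends.

def run_sink(events):
    temp = []
    for event in events:    # with an unbounded source, this loop never ends
        temp.append(event)  # "write a temp file" per element/bundle
    # "finalize": promote temp results to the final destination. In
    # streaming mode this point is never reached, so no output appears.
    final = list(temp)
    return final

print(run_sink([1, 2, 3]))  # bounded input: finalize runs -> [1, 2, 3]
```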
Hi Alexey,
I am just learning Beam and doing a POC that requires fetching streaming data from Pub/Sub and partitioning it on GCS as Parquet files with a fixed window. The thing is, I have an additional requirement to use ONLY SQL.
I did not manage to do it. My solutions either worked indefinitely or
Piotr,
> On 17 Feb 2023, at 09:48, Wiśniowski Piotr wrote:
> Does this mean that Parquet IO does not support partitioning, and we need to
> do some workarounds? Like explicitly mapping each window to a separate
> Parquet file?
>
Could you elaborate a bit more on this? IIRC, we used to
For me this use case worked with the following window definition, which was a bit of trial and error, and I can't claim I have a 100% understanding of the windowing logic.
Here's my Java code for Kinesis -> Parquet files, which worked:
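(The Java snippet itself isn't preserved in this excerpt. Since much of the confusion in this thread is about how elements land in windows: regardless of SDK, FixedWindows(size) assigns an element with event timestamp t to the half-open interval [t - t mod size, t - t mod size + size). A stdlib-only sketch of that assignment, purely illustrative and not the original code:)

```python
# Stdlib sketch of FixedWindows-style assignment (not Beam code).

def fixed_window(timestamp: float, size: int = 60) -> tuple:
    """Return the [start, end) fixed window a timestamp falls into."""
    start = timestamp - (timestamp % size)
    return (start, start + size)

# Two events 10s apart share a 1-minute window; 70s apart they do not.
print(fixed_window(120.5))  # -> (120.0, 180.0)
print(fixed_window(125.0))  # -> (120.0, 180.0)
print(fixed_window(190.0))  # -> (180.0, 240.0)
```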
Hi,
Sounds like exactly the problem I had a few emails before:
https://lists.apache.org/thread/q929lbwp8ylchbn8ngypfqlbvrwpfzph
Does this mean that Parquet IO does not support partitioning, and we
need to do some workarounds? Like explicitly mapping each window to a
separate Parquet file?
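(The "explicitly mapping each window to a separate Parquet file" workaround can be prototyped without Beam at all: key each event by its window start and derive one output path per key. Everything below — the path pattern, the window size, the field names — is hypothetical and only shows the shape of the grouping:)

```python
from collections import defaultdict

WINDOW = 60  # hypothetical window size in seconds

def group_by_window(events):
    """events: iterable of (timestamp, payload) pairs.
    Returns {output_path: [payloads]}, one path per fixed window."""
    buckets = defaultdict(list)
    for ts, payload in events:
        start = int(ts - ts % WINDOW)
        # Hypothetical naming scheme; a real path would target gs:// or s3://.
        buckets[f"window_start={start}/part-0.parquet"].append(payload)
    return dict(buckets)

out = group_by_window([(61, "a"), (65, "b"), (130, "c")])
print(out)  # -> two buckets: window_start=60 gets a and b, window_start=120 gets c
```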
I want to make a simple Beam pipeline which will store events from Kafka to S3 in Parquet format every minute.
Here's a simplified version of my pipeline:
def add_timestamp(event: Any) -> Any:
    from datetime import datetime
    from apache_beam import window
    # Attach an event timestamp so downstream windowing can use it.
    return window.TimestampedValue(event, datetime.utcnow().timestamp())