Hi Maxi,
I assume this will impact the exactly-once semantics that Beam provides, since
in the KafkaExactlyOnceSink the processElement method is also annotated
with @RequiresStableInput?
Thanks a lot!
Eleanore
On Tue, Apr 21, 2020 at 12:58 AM Maximilian Michels wrote:
> Hi Stephen,
>
> Thanks
We should always want to shut down sources on final watermark. All incoming
data should be dropped anyhow.
Kenn
On Tue, Apr 21, 2020 at 1:34 PM Luke Cwik wrote:
> +dev
>
> When would we not want --shutdownSourcesOnFinalWatermark=true?
>
> On Tue, Apr 21, 2020 at 1:22 PM Ismaël Mejía wrote:
>
Thanks for the recommendation, Rion.
I bought a copy yesterday.
On Wed, 22 Apr 2020, at 1:45 PM, Kenneth Knowles wrote:
> I believe Streaming Systems is the most Beam-oriented book available.
>
> Kenn
>
> On Mon, Apr 20, 2020 at 3:07 PM Rion Williams wrote:
>> Hi all,
>>
>> I posed this
I believe Streaming Systems is the most Beam-oriented book available.
Kenn
On Mon, Apr 20, 2020 at 3:07 PM Rion Williams wrote:
> Hi all,
>
> I posed this question over on the Apache Slack Community however didn't
> get much of a response so I thought I'd reach out here. I've been looking
>
On Tue, Apr 21, 2020 at 12:43 PM Piotr Filipiuk
wrote:
> Hi,
>
> I would like to know whether it is possible to run a streaming pipeline
> that reads from (or writes to) Kafka using DirectRunner? If so, what should
> the expansion_service point to:
>
Thank you. It worked with the argument '--shutdownSourcesOnFinalWatermark=true'
I will open a PR to update the documentation.
Regards,
Sruthi
On 2020/04/21 20:22:17, Ismaël Mejía wrote:
> You need to instruct the Flink runner to shut down the source,
> otherwise it will stay waiting.
> You can
+dev
When would we not want --shutdownSourcesOnFinalWatermark=true?
On Tue, Apr 21, 2020 at 1:22 PM Ismaël Mejía wrote:
> You need to instruct the Flink runner to shut down the source,
> otherwise it will stay waiting.
> You can do this by adding the extra
>
You need to instruct the Flink runner to shut down the source,
otherwise it will stay waiting.
You can do this by adding the extra
argument `--shutdownSourcesOnFinalWatermark=true`.
And if that works and you want to open a PR to update our
documentation that would be greatly appreciated.
Regards,
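Ismaël's suggestion boils down to passing the option as a regular pipeline argument. A minimal sketch in Python, assuming apache_beam with Flink runner support is installed (the option itself is defined by the Flink runner, and pipeline arguments are simply forwarded to it):

```python
from apache_beam.options.pipeline_options import PipelineOptions

# The Flink runner option is passed like any other pipeline argument;
# it tells the runner to shut down sources once the final watermark
# arrives, so an otherwise-finished streaming job can terminate.
options = PipelineOptions([
    "--runner=FlinkRunner",
    "--shutdownSourcesOnFinalWatermark=true",
])
```

In the nexmark gradle invocation discussed in this thread, the same flag would go inside the `-Pnexmark.args` string alongside `--runner=FlinkRunner`.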
Hello,
I am trying to run nexmark queries using flink runner streaming. Followed
the documentation and used the command
./gradlew :sdks:java:testing:nexmark:run \
-Pnexmark.runner=":runners:flink:1.10" \
-Pnexmark.args="
--runner=FlinkRunner
--suite=SMOKE
Hi,
I would like to know whether it is possible to run a streaming pipeline that
reads from (or writes to) Kafka using DirectRunner? If so, what should the
expansion_service point to:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py#L90?
Also, when using
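As a sketch of what such a cross-language pipeline can look like (assuming apache_beam is installed, a Kafka broker at localhost:9092, a topic named my-topic, and a Java expansion service already running on port 8097; all of these addresses and names are assumptions, not values from the thread):

```python
import apache_beam as beam
from apache_beam.io.external.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming read from Kafka via the cross-language Kafka transform;
# expansion_service points at a separately started Java expansion
# service that expands the external transform.
options = PipelineOptions(["--runner=DirectRunner", "--streaming"])
with beam.Pipeline(options=options) as p:
    (p
     | "ReadKafka" >> ReadFromKafka(
           consumer_config={"bootstrap.servers": "localhost:9092"},
           topics=["my-topic"],
           expansion_service="localhost:8097")
     | "Print" >> beam.Map(print))
```

Whether the DirectRunner fully supports this cross-language path is exactly the open question in the thread, so treat this only as the shape of the API.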
My apologies, I missed the link:
[1] https://github.com/gbif/pipelines
On Tue, Apr 21, 2020 at 5:58 PM Tim Robertson
wrote:
> Hi Jordan
>
> I don't know if we qualify as a large Beam project but at GBIF.org we
> bring together datasets from 1600+ institutions documenting 1.4B
> observations of
Hi Jordan
I don't know if we qualify as a large Beam project but at GBIF.org we bring
together datasets from 1600+ institutions documenting 1.4B observations of
species (museum data, citizen science, environmental reports, etc.).
As far as Beam goes though, we aren't using the most advanced
+dev
I don't have a ton of time to dig in to this, but I wanted to say that this
is very cool and just drop a couple pointers (which you may already know
about) like Explaining Outputs in Modern Data Analytics [1] which was
covered by The Morning Paper [2]. This just happens to be something I
Mozilla hosts the code for our data ingestion system publicly on GitHub. A
good chunk of that architecture consists of Beam pipelines running on
Dataflow.
See:
https://github.com/mozilla/gcp-ingestion/tree/master/ingestion-beam
and rendered usage documentation at:
Thank you. I should have noticed this flag.
Stay safe
Best,
Eila
On Mon, Apr 20, 2020 at 12:42 PM Luke Cwik wrote:
> It looks like anaconda assumes that the directory doesn't exist before
> installation. I would recommend using a new directory such as
> ["/opt/userowned/anaconda.sh",
Hi Stephen,
Thanks for reporting the issue! David, good catch!
I think we have to resort to only using a single state cell for
buffering on checkpoints, instead of using a new one for every
checkpoint. I was under the assumption that, if the state cell was
cleared, it would not be checkpointed
Hi Stephen,
nice catch and awesome report! ;) This definitely needs a proper fix. I've
created a new JIRA to track the issue and will try to resolve it soon as
this seems critical to me.
https://issues.apache.org/jira/browse/BEAM-9794
Thanks,
D.
On Mon, Apr 20, 2020 at 10:41 PM Stephen Patel