These errors are often seen at the end of a pipeline -- they indicate that,
due to the failure, the backend has been torn down and the attempts to
report the current status have failed. If you look in the "Stack Traces"
tab in the UI [1] or earlier in the Stackdriver logs, you should
(hopefully) be
I have a batch pipeline that runs well with small inputs but fails with a
larger dataset.
Looking at Stackdriver, I find a fair number of the following:
Request failed with code 400, will NOT retry:
Thanks.
On Thu, Aug 3, 2017 at 2:11 PM Lukasz Cwik wrote:
> But the windows can still be processed out of order.
>
> On Thu, Aug 3, 2017 at 2:10 PM, Lukasz Cwik wrote:
>
>> Yes.
>>
>> On Thu, Aug 3, 2017 at 2:09 PM, Eric Fang
>>
To my knowledge, autoscaling is dependent on how many messages are
backlogged within Pubsub and independent of the second subscription (the
second subscription is really to compute a better watermark).
On Thu, Aug 3, 2017 at 1:34 PM, wrote:
> Thanks Lukasz that's good to know!
But the windows can still be processed out of order.
On Thu, Aug 3, 2017 at 2:10 PM, Lukasz Cwik wrote:
> Yes.
>
> On Thu, Aug 3, 2017 at 2:09 PM, Eric Fang wrote:
>
>> Thanks Lukasz. If the stream is infinite, I am assuming you mean to
>> window the
Yes.
On Thu, Aug 3, 2017 at 2:09 PM, Eric Fang wrote:
> Thanks Lukasz. If the stream is infinite, I am assuming you mean to window
> the stream into panes and sort the bundle for each trigger into an order?
>
> Eric
>
> On Thu, Aug 3, 2017 at 12:49 PM Lukasz Cwik
Thanks Lukasz. If the stream is infinite, I am assuming you mean to window
the stream into panes and sort the bundle for each trigger into an order?
Eric
On Thu, Aug 3, 2017 at 12:49 PM Lukasz Cwik wrote:
> There is currently no strict ordering which is supported within Apache
>
Thanks Lukasz that's good to know! It sounds like we can halve our PubSub costs
then!
Just to clarify, if I were to remove withTimestampAttribute from a job, this
would cause the watermark to always be up to date (processing time) even if the
job starts to lag behind (build up of
There is currently no strict ordering which is supported within Apache Beam
(timestamp or not) and any ordering which may be occurring is just a side
effect and not guaranteed in any way.
Since the smallest unit of work is a bundle containing 1 element, the only
way to get ordering is to make one
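A rough sketch of that approach (the key and timestamp types and the one-minute
window are placeholder assumptions, not from this thread): window and group per
key, sort the grouped values locally, and do the order-dependent work inside the
DoFn, since ordering is not preserved once elements are re-emitted downstream.
// Assume "events" is a PCollection<KV<String, Long>> keyed by the grouping key,
// with the value carrying the event timestamp (placeholder types).
events
    .apply(Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardMinutes(1))))
    .apply(GroupByKey.<String, Long>create())
    .apply(ParDo.of(new DoFn<KV<String, Iterable<Long>>, Void>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        // Buffer the whole group for this key and window, then sort it locally.
        List<Long> buffered = new ArrayList<>();
        for (Long t : c.element().getValue()) {
          buffered.add(t);
        }
        Collections.sort(buffered);
        for (Long t : buffered) {
          // Do the order-dependent processing here, while the order is known.
        }
      }
    }));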
Hi all,
We have a stream of data that's ordered by a timestamp and our use case
requires us to process the data in order with respect to the previous
element. For example, we have a stream of true/false ingested from PubSub
and we want to make sure that for each key, a true is always followed by a false.
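For that specific check, a sketch along these lines could work if the runner
supports Beam's per-key state API (class and state names are placeholders, not
from this thread); note it validates the order in which elements happen to
arrive, it does not make the runner deliver them in timestamp order.
class TrueThenFalseCheckFn extends DoFn<KV<String, Boolean>, String> {

  @StateId("lastSeen")
  private final StateSpec<ValueState<Boolean>> lastSeenSpec =
      StateSpecs.value(BooleanCoder.of());

  @ProcessElement
  public void processElement(ProcessContext c,
      @StateId("lastSeen") ValueState<Boolean> lastSeen) {
    Boolean previous = lastSeen.read();   // null for the first element of a key
    Boolean current = c.element().getValue();
    // The expectation from the use case: a true must be followed by a false,
    // so two consecutive trues for the same key are flagged.
    if (Boolean.TRUE.equals(previous) && Boolean.TRUE.equals(current)) {
      c.output("Two consecutive 'true' values for key " + c.element().getKey());
    }
    lastSeen.write(current);
  }
}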
Hi all,
We've been running a few streaming Beam jobs on Dataflow, where each job is
consuming from PubSub via PubSubIO. Each job does something like this:
PubsubIO.readMessagesWithAttributes()
.withIdAttribute("unique_id")
.withTimestampAttribute("timestamp");
My
Hi,
Thanks for trying it out.
I was running the job in a local single-node setup. I also spawned an HDInsight
cluster on the Azure platform just to test the WordCount program. It's the same
result there too, stuck at the Evaluating ParMultiDo step. It runs fine with mvn
compile exec, but when bundled
Hi,
Was anyone able to run a Beam application on Spark at all?
I tried all possible options and still no luck. No executors get assigned
for the job submitted by the command below, even though they are explicitly specified:
$ ~/spark/bin/spark-submit --class org.apache.beam.examples.WordCount --master
Hi,
I think this might not be a problem. The reason we have this
DataSink(DiscardingOutputFormat) at the "end" of Flink Batch pipelines is that
Flink Batch will not execute a chain of operations when it is not terminated
by a sink. In Beam, it's fine to just have a DoFn and no sink after
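To illustrate, a minimal sketch (placeholder input file and options, not the
original job): the pipeline's last step is a DoFn with nothing written
afterwards, which is valid in Beam; for Flink Batch the runner terminates the
chain with a discarding sink so the operators actually execute.
Pipeline p = Pipeline.create(options);
p.apply(TextIO.read().from("input.txt"))
 .apply(ParDo.of(new DoFn<String, Void>() {
   @ProcessElement
   public void processElement(ProcessContext c) {
     // Side effects only (logging, writes to an external system, and so on);
     // the pipeline itself emits nothing and has no sink transform.
   }
 }));
p.run().waitUntilFinish();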