Jan, I do believe that BEAM-2535 is related since the input time holds the
input watermark and will allow people to set timers which will fire in the
order that they want. This would allow users to say fire at X but I will
only create a new timer at X+Y which would allow the input watermark to
I'm confused as to why it is valid to advance the watermark to T3 in the
original scenario.
T1 and T2 should be treated as inputs to the function and hold the input
watermark hence T1 should fire and if it doesn't produce any new timers
before T2, then T2 should fire since the watermark will now
Makes sense. At least for accumulating mode, maintaining pane ordering
across stages would be very useful, but it is indeed difficult to do so.
Now I can see why trigger at sinks might be a better approach.
-Rui
On Thu, Jun 27, 2019 at 9:35 AM Reuven Lax wrote:
>
>
> On Thu, Jun 27, 2019 at
Thanks, added a few comments.
If I understood correctly, you basically assign elements with keys to
different buckets, which are written to unique files, and merge the files for
the same key while reading?
Some of my concerns are:
(1) Seems like you rely on an in-memory sorting of buckets. Will this
Something was eating the disk. Disconnected the worker so jobs could be
allocated to other nodes. Will look deeper.
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 485G 485G 96K 100% /
On Thu, Jun 27, 2019 at 10:54 AM Yifan Zou wrote:
> I'm on it.
>
> On Thu, Jun 27, 2019
Thanks. I responded to comments in the doc. More inline.
On Thu, Jun 27, 2019 at 2:44 PM Chamikara Jayalath
wrote:
> Thanks, added a few comments.
>
> If I understood correctly, you basically assign elements with keys to
> different buckets which are written to unique files and merge files for the
Hi Reza,
Cool, I have put together a PR [1], which is still not completely ready.
At least some tests are still missing - probably @ValidatesRunner tests, and
then fixes for the runners that don't pass them. It also lacks a few features
described in the design doc, but that could probably be fixed later
Ping again. Any chance someone takes a look to get this thing going? It's
just a design doc and basic metadata/IO impl. We're not talking about
actual source/sink code yet (already done but saved for future PRs).
On Fri, Jun 21, 2019 at 1:38 PM Ahmet Altay wrote:
> Thank you Claire, this looks
On Thu, Jun 27, 2019 at 3:32 AM Robert Bradshaw wrote:
> On Thu, Jun 27, 2019 at 1:52 AM Rui Wang wrote:
> >>
> >>
> >> AFAIK all streaming runners today practically do provide these panes
> in order;
> >
> > Does it refer to "the stage immediately after GBK itself processes fired
> panes in
Opened a bug here: https://issues.apache.org/jira/browse/BEAM-7648
Can someone investigate what's going on?
Cham has a point: we can change writes in a ‘backwards’ compatible way
if needed by providing a new Write transform. Of course, ideally we
would not need to do this, to ease maintainability, but it is a good
point against (2) and (3). (1) is
a specific case of (2) so probably
I believe that timers correspond to watermark holds, which hold up the
output watermark, not the input watermark.
On Thu, Jun 27, 2019 at 11:21 PM Lukasz Cwik wrote:
> I'm confused as to why it is valid to advance the watermark to T3 in the
> original scenario.
>
> T1 and T2 should be treated
The problem was because of the large quantity of stale docker images
generated by the Python portable tests and HDFS IT.
Dumping the docker disk usage gives me:
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
*Images 1039356
Thanks Ismael for the feedback on the doc. If there isn't any additional
feedback, I will start a process vote on the release procedure of vendored
artifacts on Tuesday.
On Tue, Jun 25, 2019 at 10:24 AM Lukasz Cwik wrote:
> Ismael mentioned[1] that there is confusion about how to release and
>
It would be possible to have a "timer watermark" between the input and
output watermarks, so that input watermark >= timer watermark >= output
watermark, but it turns out that doing so implies that we fire timers
only for a single instant (because until the timer is fired and processed,
the "timer
I think we thought about this approach but decided to get rid of the map
representation wherever we can while still supporting setting of the
options by name.
One of the less important downsides of keeping the map around is that we
will need to do `fromArgs` at least twice.
Another downside is
The watermark hold (which is how the timer holds up the watermark today,
as there is no timer watermark) is per key. Usually the input watermark
making a "hop" is not a problem; in fact, it's the normal state of affairs.
On Fri, Jun 28, 2019 at 1:08 AM Lukasz Cwik wrote:
> Thanks Reuven and
Hi Lukasz,
that was my initial thought, but it turns out that doing so might have
performance issues. And it is somewhat of a philosophical question
whether - when the watermark moves from one time to another - you assume
time to move "smoothly" (which suggests firing timers for single instant
At least the implementation in DirectRunner fires timers according to
the input watermark. Holding the timer up to the output watermark causes
deadlocks, because timers fired at time T might clear the watermark hold
for the same time.
On 6/27/19 11:55 PM, Reuven Lax wrote:
I believe that timers correspond
Earlier it was said that performance was poor if we moved to a model where
we prevented multiple timer firings. Since timer firings are per key, can
you provide details of what use case has multiple user timer firings per
key?
On Thu, Jun 27, 2019 at 4:34 PM Reuven Lax wrote:
> The watermark
Maybe a cron job on the Jenkins node that runs docker prune every day?
On Thu, Jun 27, 2019 at 6:58 PM Ankur Goenka wrote:
> This highlights the race condition caused by using a single docker registry
> on a machine.
> If 2 tests create "jenkins-docker-apache.bintray.io/beam/python" one
> after
Hi all,
moving the discussion to the dev list:
https://github.com/apache/beam/pull/8919. I think that Perfkit Benchmarker
should be removed from all our tests.
Problems that we face currently:
1. Changes to Gradle tasks/build configuration in the Beam codebase have
to be reflected in
Seems like a Google issue:
https://status.cloud.google.com/
chaim
On Thu, Jun 27, 2019 at 10:23 AM Tim Robertson
wrote:
>
> Hi Chaim,
>
> To help you we'd need a little more detail I think - what environment,
> runner, how you launch your jobs etc.
>
> My first impression is that it sounds
Hi guys,
FYI, while I'm working on the combine translation for the new spark runner PoC,
I saw something that does not seem right
in the current runner: https://issues.apache.org/jira/browse/BEAM-7647
Best,
Etienne
Hi Etienne,
I saw that too while working on solving [1]. It seems a little weird and
I was a little tempted to change it to something roughly equivalent to
Combine.perKey with a single key. But, actually the Combine.globally
should be rather small, right? There will be a single value for each
Since last night, all the jobs that I run are stuck in "not started". Any ideas why?
chaim
Hi Chaim,
To help you we'd need a little more detail I think - what environment,
runner, how you launch your jobs etc.
My first impression is that it sounds more like an environment related
thing rather than a Beam codebase issue. If it is a DataFlow environment I
expect you might need to