Re: DirectRunner timers are not strictly time ordered

2019-06-27 Thread Lukasz Cwik
Jan, I do believe that BEAM-2535 is related since the input time holds the input watermark and will allow people to set timers which will fire in the order that they want. This would allow users to say fire at X but I will only create a new timer at X+Y which would allow the input watermark to

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-27 Thread Lukasz Cwik
I'm confused as to why it is valid to advance the watermark to T3 in the original scenario. T1 and T2 should be treated as inputs to the function and hold the input watermark hence T1 should fire and if it doesn't produce any new timers before T2, then T2 should fire since the watermark will now
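The firing order described in this message can be illustrated with a toy model. This is only a sketch and not DirectRunner code: the function name `fire_timers_in_order`, the integer timestamps, and the `on_timer` callback are all assumptions made for illustration. Pending timers sit in a min-heap, and the earliest one fires first, so a timer set by T1 for a time before T2 still fires before T2, even if the watermark is asked to advance all the way to T3.

```python
import heapq


def fire_timers_in_order(timers, new_watermark, on_timer):
    """Toy model: fire every timer with timestamp <= new_watermark, earliest first.

    `on_timer(ts)` is a hypothetical user callback that returns the timestamps
    of any new timers it sets. New timers are pushed back onto the heap, so
    they are interleaved in timestamp order with the timers already pending.
    """
    heap = list(timers)
    heapq.heapify(heap)
    fired = []
    while heap and heap[0] <= new_watermark:
        ts = heapq.heappop(heap)
        fired.append(ts)
        for new_ts in on_timer(ts):
            if new_ts < ts:
                # Assumption of this sketch: a timer may not be set in the past.
                raise ValueError("timer set before the currently firing timer")
            heapq.heappush(heap, new_ts)
    # Timers beyond the new watermark stay pending.
    return fired, sorted(heap)


def looping_callback(ts):
    # Hypothetical behaviour: the timer at T1=1 sets a new timer at 3,
    # which lies before T2=5; no other timer sets anything.
    return [3] if ts == 1 else []


fired, pending = fire_timers_in_order([1, 5, 20], new_watermark=10,
                                      on_timer=looping_callback)
```

In this run the timers at 1, 3 and 5 fire in that order and the timer at 20 remains pending, which is the strictly time-ordered behaviour the thread asks for.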

Re: Accumulating mode implies that panes are processed in order?

2019-06-27 Thread Rui Wang
Makes sense. At least for accumulating mode, maintaining pane ordering across stages will be very useful, but it is indeed difficult to do so. Now I can see why trigger at sinks might be a better approach. -Rui On Thu, Jun 27, 2019 at 9:35 AM Reuven Lax wrote: > > > On Thu, Jun 27, 2019 at

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-06-27 Thread Chamikara Jayalath
Thanks, added a few comments. If I understood correctly, you basically assign elements with keys to different buckets, which are written to unique files, and merge files for the same key while reading? Some of my concerns are: (1) Seems like you rely on an in-memory sorting of buckets. Will this
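The summary in this message (hash keys into buckets, keep each bucket sorted, merge matching buckets at read time) can be sketched as follows. This is a minimal in-memory illustration, not the design from the proposal's document: the names `bucket_id`, `write_buckets`, `merge_join` and `smb_join`, the CRC32 hash, and the bucket count are all assumptions, and plain Python lists stand in for the per-bucket files.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Deterministic hash, so both datasets route a given key to the same bucket.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def write_buckets(records, num_buckets=NUM_BUCKETS):
    """Assign (key, value) records to buckets and sort each bucket by key.

    The sort here is in memory, which is the concern raised in point (1).
    """
    buckets = [[] for _ in range(num_buckets)]
    for key, value in records:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return [sorted(bucket, key=itemgetter(0)) for bucket in buckets]


def merge_join(left_bucket, right_bucket):
    """Inner-join two key-sorted buckets with a single streaming merge."""
    tagged = heapq.merge(
        ((k, 0, v) for k, v in left_bucket),
        ((k, 1, v) for k, v in right_bucket),
        key=lambda t: (t[0], t[1]),
    )
    for key, group in groupby(tagged, key=itemgetter(0)):
        lefts, rights = [], []
        for _, side, value in group:
            (lefts if side == 0 else rights).append(value)
        for left_value in lefts:
            for right_value in rights:
                yield key, left_value, right_value


def smb_join(left, right, num_buckets=NUM_BUCKETS):
    # Bucket i of the left side only ever needs bucket i of the right side.
    joined = []
    for lb, rb in zip(write_buckets(left, num_buckets),
                      write_buckets(right, num_buckets)):
        joined.extend(merge_join(lb, rb))
    return joined


left = [("a", 1), ("b", 2), ("c", 3)]
right = [("a", "x"), ("c", "y"), ("c", "z"), ("d", "w")]
result = sorted(smb_join(left, right))
```

Because both sides are bucketed and sorted ahead of time, the join itself needs no shuffle, only a pairwise merge of corresponding buckets.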

Re: apache-beam-jenkins-15 out of disk

2019-06-27 Thread Yifan Zou
Something was eating the disk. I disconnected the worker so jobs could be allocated to other nodes. Will look deeper.
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 485G 485G 96K 100% /
On Thu, Jun 27, 2019 at 10:54 AM Yifan Zou wrote: > I'm on it. > > On Thu, Jun 27, 2019

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-06-27 Thread Neville Li
Thanks. I responded to comments in the doc. More inline. On Thu, Jun 27, 2019 at 2:44 PM Chamikara Jayalath wrote: > Thanks added few comments. > > If I understood correctly, you basically assign elements with keys to > different buckets which are written to unique files and merge files for the

Re: Looping timer blog

2019-06-27 Thread Jan Lukavský
Hi Reza, cool, I have put together a PR [1], which is still not completely ready. At least some tests are still missing - probably @ValidatesRunner - and then runners that don't pass them will need fixing. It also lacks a few features described in the design doc, but that could probably be fixed later

Re: Discussion/Proposal: support Sort Merge Bucket joins in Beam

2019-06-27 Thread Neville Li
Ping again. Any chance someone takes a look to get this thing going? It's just a design doc and basic metadata/IO impl. We're not talking about actual source/sink code yet (already done but saved for future PRs). On Fri, Jun 21, 2019 at 1:38 PM Ahmet Altay wrote: > Thank you Claire, this looks

Re: Accumulating mode implies that panes are processed in order?

2019-06-27 Thread Reuven Lax
On Thu, Jun 27, 2019 at 3:32 AM Robert Bradshaw wrote: > On Thu, Jun 27, 2019 at 1:52 AM Rui Wang wrote: > >> > >> > >> AFAIK all streaming runners today practically do provide these panes > in order; > > > > Does it refer to "the stage immediately after GBK itself processes fired > panes in

apache-beam-jenkins-15 out of disk

2019-06-27 Thread Udi Meiri
Opened a bug here: https://issues.apache.org/jira/browse/BEAM-7648 Can someone investigate what's going on?

Re: Return types of Write transforms (aka best way to signal)

2019-06-27 Thread Ismaël Mejía
Cham has a point in that we can change writes in a ‘backwards’ compatible way if needed by providing a new Write transform. Of course, the ideal is that we do not need to do this, to ease maintainability, but it is a good point against (2) and (3). (1) is a specific case of (2) so probably

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-27 Thread Reuven Lax
I believe that timers correspond to watermark holds, which hold up the output watermark, not the input watermark. On Thu, Jun 27, 2019 at 11:21 PM Lukasz Cwik wrote: > I'm confused as to why it is valid to advance the watermark to T3 in the > original scenario. > > T1 and T2 should be treated

Re: apache-beam-jenkins-15 out of disk

2019-06-27 Thread Yifan Zou
The problem was caused by the large quantity of stale docker images generated by the Python portable tests and HDFS IT. Dumping the docker disk usage gives me: TYPE TOTAL ACTIVE SIZE RECLAIMABLE *Images 1039356

Re: [DISCUSS] Releasing Vendored Artifacts

2019-06-27 Thread Lukasz Cwik
Thanks Ismael for the feedback on the doc. If there isn't any additional feedback, I will start a process vote on the release procedure of vendored artifacts on Tuesday. On Tue, Jun 25, 2019 at 10:24 AM Lukasz Cwik wrote: > Ismael mentioned[1] that there is confusion about how to release and >

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-27 Thread Jan Lukavský
It would be possible to have a "timer watermark" between the input and output watermarks, so that input watermark >= timer watermark >= output watermark, but it turns out that doing so implies that we fire timers only for a single instant (because until the timer is fired and processed, the "timer

Re: Change of Behavior - JDBC Set Command

2019-06-27 Thread Anton Kedin
I think we thought about this approach but decided to get rid of the map representation wherever we can while still supporting setting of the options by name. One of the less important downsides of keeping the map around is that we will need to do `fromArgs` at least twice. Another downside is

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-27 Thread Reuven Lax
The watermark hold (which is how the timer holds up the watermark today, as there is no timer watermark) is per key. Usually the input watermark making a "hop" is not a problem; in fact, it's the normal state of affairs. On Fri, Jun 28, 2019 at 1:08 AM Lukasz Cwik wrote: > Thanks Reuven and

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-27 Thread Jan Lukavský
Hi Lukasz, that was my initial thought, but it turns out that doing so might have performance issues. And it is only somewhat of a philosophical question whether - when the watermark moves from one time to another - you assume time to move "smoothly" (which suggests firing timers for a single instant

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-27 Thread Jan Lukavský
At least the implementation in DirectRunner fires timers according to the input watermark. Holding the timer up to the output watermark causes deadlocks, because timers fired at time T might clear the watermark hold for the same time. On 6/27/19 11:55 PM, Reuven Lax wrote: I believe that timers correspond

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-27 Thread Lukasz Cwik
Earlier it was said that performance was poor if we moved to a model where we prevented multiple timer firings. Since timer firings are per key, can you provide details of what use case has multiple user timer firings per key? On Thu, Jun 27, 2019 at 4:34 PM Reuven Lax wrote: > The watermark

Re: apache-beam-jenkins-15 out of disk

2019-06-27 Thread Yichi Zhang
Maybe a cron job on the Jenkins node that does docker prune every day? On Thu, Jun 27, 2019 at 6:58 PM Ankur Goenka wrote: > This highlights the race condition caused by using single docker registry > on a machine. > If 2 tests create "jenkins-docker-apache.bintray.io/beam/python" one > after

Stop using Perfkit Benchmarker tool in all tests?

2019-06-27 Thread Łukasz Gajowy
Hi all, moving the discussion to the dev list: https://github.com/apache/beam/pull/8919. I think that Perfkit Benchmarker should be removed from all our tests. Problems that we face currently: 1. Changes to Gradle tasks/build configuration in the Beam codebase have to be reflected in

Re: jobs not started

2019-06-27 Thread Chaim Turkel
Seems like a Google issue: https://status.cloud.google.com/ chaim On Thu, Jun 27, 2019 at 10:23 AM Tim Robertson wrote: > > Hi Chaim, > > To help you we'd need a little more detail I think - what environment, > runner, how you launch your jobs etc. > > My first impression is that is sounds

[Current spark runner] Combine globally translation is risky and not very performant

2019-06-27 Thread Etienne Chauchot
Hi guys, FYI, while working on the combine translation for the new spark runner poc, I saw something that does not seem right in the current runner: https://issues.apache.org/jira/browse/BEAM-7647 Best, Etienne

Re: [Current spark runner] Combine globally translation is risky and not very performant

2019-06-27 Thread Jan Lukavský
Hi Etienne, I saw that too while working on solving [1]. It seems a little weird and I was a little tempted to change it to something roughly equivalent to Combine.perKey with a single key. But, actually the Combine.globally should be rather small, right? There will be a single value for each

jobs not started

2019-06-27 Thread Chaim Turkel
Since the night, all the jobs that I run are stuck in "not started". Any ideas why? chaim

Re: jobs not started

2019-06-27 Thread Tim Robertson
Hi Chaim, To help you we'd need a little more detail I think - what environment, runner, how you launch your jobs etc. My first impression is that it sounds more like an environment related thing rather than a Beam codebase issue. If it is a DataFlow environment I expect you might need to