Re: published containers overwrite locally built containers

2019-11-01 Thread Thomas Weise
More here: https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E On Fri, Nov 1, 2019 at 10:56 AM Chamikara Jayalath wrote: > I think it makes sense to override published docker images with locally > built versions when testing

Re: Python Beam pipelines on Flink on Kubernetes

2019-11-01 Thread Thomas Weise
That's a good idea. Probably best to add an example in: https://github.com/lyft/flinkk8soperator Do you want to add an issue? (It will have to wait for 2.18 release though.) On Fri, Nov 1, 2019 at 11:37 AM Chad Dombrova wrote: > Hi Thomas, > Do you have an example Dockerfile demonstrating

Re: published containers overwrite locally built containers

2019-11-01 Thread Kyle Weaver
For additional context, this was discussed weeks ago on this list: https://lists.apache.org/thread.html/932fe0bc838b92e80475b2bf862e6cec34fbd6ac0d4f3c9de5ac25e1@%3Cdev.beam.apache.org%3E On Fri, Nov 1, 2019 at 10:56 AM Chamikara Jayalath wrote: > I think it makes sense to override published

Re: published containers overwrite locally built containers

2019-11-01 Thread Heejong Lee
Since 'docker run' automatically pulls when the image doesn't exist locally, I think it's safe to remove explicit 'docker pull' before 'docker run'. Without 'docker pull', we won't update the local image with the remote image (for the same tag) but it shouldn't be a problem in prod that the unique
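
To illustrate the pull behavior described in this message, here is a minimal toy model in Python. It does not call Docker; the dictionaries standing in for the local and remote image stores, the function names, and the tag `example/sdk:dev` are all hypothetical and chosen only for illustration. It assumes the default behavior in which 'docker run' pulls only when the tag is missing locally.

```python
def docker_pull(local, remote, tag):
    """Toy model of 'docker pull': the remote image replaces the local tag."""
    local[tag] = remote[tag]


def docker_run(local, remote, tag):
    """Toy model of 'docker run': pull only if the tag is missing locally."""
    if tag not in local:
        docker_pull(local, remote, tag)
    return local[tag]


TAG = "example/sdk:dev"  # hypothetical tag, not taken from the thread
remote = {TAG: "published"}

# Case 1: a locally built image exists and no explicit pull is issued.
store = {TAG: "locally built"}
without_pull = docker_run(store, remote, TAG)

# Case 2: an explicit pull precedes the run and overwrites the local build.
store = {TAG: "locally built"}
docker_pull(store, remote, TAG)
with_pull = docker_run(store, remote, TAG)

# Case 3: nothing exists locally, so the run falls back to pulling.
missing_locally = docker_run({}, remote, TAG)

print(without_pull, with_pull, missing_locally, sep=" | ")
```

In this model, dropping the explicit pull keeps the locally built image for an existing tag, while a tag that is absent locally is still fetched on first run.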

Re: aggregating over triggered results

2019-11-01 Thread Robert Bradshaw
On Thu, Oct 31, 2019 at 8:48 PM Aaron Dixon wrote: > > First of all thank you for taking the time on this very clear and helpful > message. Much appreciated. > > >I suppose one could avoid doing any pre-aggregation, and emit all of > the events (with reified timestamp) in 60/30-day windows, then

Re: Python SDK timestamp precision

2019-11-01 Thread Robert Bradshaw
On Fri, Nov 1, 2019 at 2:17 AM Jan Lukavský wrote: > > > Yes, this is the "minus epsilon" idea, but assigning this as a bit on > the WindowedValue rather than on the Timestamp itself. This means that > pulling the timestamp out then re-assigning it would be lossy. (As a > basic example, imagine

Re: Python Beam pipelines on Flink on Kubernetes

2019-11-01 Thread Chad Dombrova
Hi Thomas, Do you have an example Dockerfile demonstrating best practices for building an image that contains both Flink and Beam SDK dependencies? That would be useful. -chad On Fri, Nov 1, 2019 at 10:18 AM Thomas Weise wrote: > For folks looking to run Beam on Flink on k8s, see update in

Re: Proposal: Dynamic timer support (BEAM-6857)

2019-11-01 Thread Reuven Lax
Hi Jan, Your proposal has merit, but I think using the TimerFamily specification is more consistent with the existing API. I think that a StateFamily can also have domains just like timers. Luke's suggestion for the proto changes sounds good. Reuven On Tue, Oct 29, 2019 at 2:43 AM Jan Lukavský

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-11-01 Thread Reuven Lax
FYI Dataflow is working on adding support for TestStream. The fact that these tests don't expose such problems on Dataflow is evidence that TestStream support is needed. Reuven On Fri, Nov 1, 2019 at 10:21 AM Kenneth Knowles wrote: > Indeed. Thanks for looking through all the runners' support

Re: published containers overwrite locally built containers

2019-11-01 Thread Chamikara Jayalath
I think it makes sense to override published docker images with locally built versions when testing HEAD. Thanks, Cham On Thu, Oct 31, 2019 at 6:31 PM Heejong Lee wrote: > Hi, happy Halloween! > > I'm looking into failing cross-language post-commit tests: >

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-11-01 Thread Kenneth Knowles
Indeed. Thanks for looking through all the runners' support for this. I have reproduced it and filed https://issues.apache.org/jira/browse/BEAM-8543. The Dataflow integration tests are slow and expensive so I don't want to add a new test suite right now. We have internal coverage of this, just at

Re: Python Beam pipelines on Flink on Kubernetes

2019-11-01 Thread Thomas Weise
For folks looking to run Beam on Flink on k8s, see update in [1]. I also updated [2]. TLDR: at this time the best option to run portable pipelines on k8s is to create container images that have both Flink and the SDK dependencies. I'm curious how much interest there is to use the official SDK

Re: Python SDK timestamp precision

2019-11-01 Thread Jan Lukavský
> Yes, this is the "minus epsilon" idea, but assigning this as a bit on the WindowedValue rather than on the Timestamp itself. This means that pulling the timestamp out then re-assigning it would be lossy. (As a basic example, imagine the batching DoFn that batches up elements (with their

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-11-01 Thread Jan Lukavský
Okay, that makes sense. I'm not sure how to fix this, though. Can I suppose that someone from the Dataflow team will take care of that? On 11/1/19 12:16 AM, Kenneth Knowles wrote: It is because Dataflow does not support TestStream, so one test is disabled, and because the other test has only

Re: RabbitMqIO issues and open PRs

2019-11-01 Thread Jean-Baptiste Onofré
Hi, I just provided feedback in the PRs. Let me know if you want to chat about some initial implementation (as I'm the original author of the IO, I remember some discussion in the past ;) ). Regards JB On 31/10/2019 21:38, Daniel Robert wrote: > I'm pretty new to the Beam ecosystem, so