Contributor Permission for Beam Jira tickets

2019-11-25 Thread David Song
Hi, This is David from DataPLS EngProd team (wintermelons@). I am working on integration tests with some Beam runners over Dataflow. Can someone add me as a contributor for Beam's Jira tracker? I have an open bug, and would like to assign myself to it. My Jira username is wintermelons, and

Re: real real-time beam

2019-11-25 Thread Kenneth Knowles
Hi Aaron, Another insightful observation. Whenever an aggregation (GBK / Combine per key) has a trigger firing, there is a per-key sequence number attached. It is included in metadata known as "PaneInfo" [1]. The value of PaneInfo.getIndex() is colloquially referred to as the "pane index". You

Re: cython test instability

2019-11-25 Thread Chad Dombrova
Actually, it looks like I'm getting the same error on multiple PRs: https://scans.gradle.com/s/ihfmrxr7evslw On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova wrote: > Hi all, > The cython tests started failing on one of my PRs which were succeeding > before. The error is one that I've never

cython test instability

2019-11-25 Thread Chad Dombrova
Hi all, The cython tests started failing on one of my PRs which were succeeding before. The error is one that I've never seen before (separated onto different lines to make it easier to read): Caused by: org.gradle.api.GradleException: Could not copy file

Re: [ANNOUNCE] New committer: Daniel Oliveira

2019-11-25 Thread Tanay Tummalapalli
Congratulations! On Mon, Nov 25, 2019 at 11:12 PM Mark Liu wrote: > Congratulations, Daniel! > > On Mon, Nov 25, 2019 at 9:31 AM Ahmet Altay wrote: > >> Congratulations, Daniel! >> >> On Sat, Nov 23, 2019 at 3:47 AM jincheng sun >> wrote: >> >>> >>> Congrats, Daniel! >>> Best, >>> Jincheng

Re: [Discuss] Beam Summit 2020 Dates & locations

2019-11-25 Thread Ahmet Altay
On Thu, Nov 21, 2019 at 2:49 PM Aizhamal Nurmamat kyzy wrote: > Maria put together this document with related industry conferences [1], > it would make sense to choose a time that doesn't conflict with other > events around projects close to Beam. > > How about for June 21-22 (around Spark

Re: Cleaning up Approximate Algorithms in Beam

2019-11-25 Thread Reza Rokni
Hi, So do we need a vote for the final list of actions? Or is this thread enough to go ahead and raise the PRs? Cheers Reza On Tue, 26 Nov 2019 at 06:01, Ahmet Altay wrote: > > > On Mon, Nov 18, 2019 at 10:57 AM Robert Bradshaw > wrote: > >> On Sun, Nov 17, 2019 at 5:16 PM Reza Rokni

Re: real real-time beam

2019-11-25 Thread Pablo Estrada
The blog posts on stateful and timely computation with Beam should help clarify a lot about how to use state and timers to do this: https://beam.apache.org/blog/2017/02/13/stateful-processing.html https://beam.apache.org/blog/2017/08/28/timely-processing.html You'll see there how there's an

Re: real real-time beam

2019-11-25 Thread Steve Niemitz
If you have a pipeline that looks like Input -> GroupByKey -> ParDo, while it is not guaranteed, in practice the sink will observe the trigger firings in order (per key), since it'll be fused to the output of the GBK operation (in all runners I know of). There have been a couple threads about

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-25 Thread Luke Cwik
Phase I sounds fine. Apache Beam follows semantic versioning and I believe removing the IOs will be a backwards incompatible change unless they were marked experimental which will be a problem for Phase 2. What is the feasibility of making the V1 transforms wrappers around V2? On Mon, Nov 25,

Re: real real-time beam

2019-11-25 Thread Aaron Dixon
@Jan @Pablo Thank you @Pablo In this case it's a single global windowed Combine/perKey, triggered per element. Keys are few (client accounts) so they can live forever. It looks like just by virtue of using a stateful ParDo I could get this final execution to be "serialized" per key. (Then I

Re: Full stream-stream join semantics

2019-11-25 Thread Kenneth Knowles
On Mon, Nov 25, 2019 at 1:56 PM Jan Lukavský wrote: > Hi Rui, > > > Hi Kenn, you think stateful DoFn based join can emit joined rows that > never to be retracted because in stateful DoFn case joined rows will be > controlled by timers and emit will be only once? If so I will agree with > it.

Re: real real-time beam

2019-11-25 Thread Jan Lukavský
One addition, to make the list of options exhaustive, there is probably one more option  c) create a ParDo keyed by primary key of your sink, cache the last write in there and compare it locally, without the need to query the database It would still need some timer to clear values after
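
Option c) above can be illustrated without any Beam dependency. The following is a minimal sketch, assuming each update carries a per-key sequence number (for example a pane index) and that the per-key cache lives in plain memory; in a real pipeline it would be per-key state. The names `LastWriteCache`, `process`, and `clear` are hypothetical and do not come from the thread, and when `clear` should be called is left open here.

```python
from typing import Callable, Dict


class LastWriteCache:
    """Remembers the newest sequence number forwarded to the sink for each key.

    Hypothetical helper: a plain dict stands in for per-key state.
    """

    def __init__(self, write: Callable[[str, str], None]) -> None:
        self._write = write                 # function that performs the actual sink write
        self._last_seq: Dict[str, int] = {}  # key -> newest sequence number written

    def process(self, key: str, seq: int, value: str) -> bool:
        """Forward the update only if it is newer than the last one written for this key."""
        if seq <= self._last_seq.get(key, -1):
            return False                    # stale or duplicate update: skip the write
        self._last_seq[key] = seq
        self._write(key, value)
        return True

    def clear(self, key: str) -> None:
        """Drop the cached entry for a key (the caller decides when this is safe)."""
        self._last_seq.pop(key, None)


if __name__ == "__main__":
    sink: Dict[str, str] = {}
    cache = LastWriteCache(lambda k, v: sink.__setitem__(k, v))
    cache.process("account-1", 2, "total=20")
    cache.process("account-1", 1, "total=10")  # arrives late, ignored locally
    print(sink)
```

The comparison happens entirely in the processing step, so the sink is never queried; the cost is that the cache has to be kept and eventually cleared.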

Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-25 Thread Mark Liu
[ ] Beaver [ ] Hedgehog [ ] Lemur [ ] Owl [ ] Salmon [ ] Trout [ ] Robot dinosaur [ ] Firefly [ ] Cuttlefish [X] Dumbo Octopus [ ] Angler fish On Mon, Nov 25, 2019 at 1:22 PM David Cavazos wrote: > Hi Kenneth, I tried adding back the email addresses, but they weren't > added on the existing

Re: Cleaning up Approximate Algorithms in Beam

2019-11-25 Thread Ahmet Altay
On Mon, Nov 18, 2019 at 10:57 AM Robert Bradshaw wrote: > On Sun, Nov 17, 2019 at 5:16 PM Reza Rokni wrote: > >> *Ahmet: FWIW, There is a python implementation only for this >> version: >> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/stats.py#L38 >>

Re: Full stream-stream join semantics

2019-11-25 Thread Jan Lukavský
Hi Rui, > Hi Kenn, you think stateful DoFn based join can emit joined rows that never to be retracted because in stateful DoFn case joined rows will be controlled by timers and emit will be only once? If so I will agree with it. Generally speaking, if only emit once is the factor of needing

Re: real real-time beam

2019-11-25 Thread Jan Lukavský
Hi Aaron, maybe someone else will give another option, but if I understand correctly what you want to solve, then you essentially have to do either:  a) use the compare & swap mechanism in the sink you described  b) use a buffer to buffer elements inside the outputting ParDo and only output
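
Option a) above, a compare & swap in the sink, might look roughly like the following. This is a sketch under stated assumptions, not something taken from the thread: `InMemoryStore` and `put_if_newer` are hypothetical names, the store is a plain dictionary standing in for a database or cache with conditional writes, and each update is assumed to carry a per-key sequence number.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Versioned:
    seq: int    # per-key sequence number attached to the update (assumption)
    value: str  # payload written to the store


class InMemoryStore:
    """Hypothetical stand-in for a store that supports conditional writes."""

    def __init__(self) -> None:
        self._rows: Dict[str, Versioned] = {}

    def put_if_newer(self, key: str, seq: int, value: str) -> bool:
        """Apply the write only if seq is greater than the stored sequence number."""
        current = self._rows.get(key)
        if current is not None and current.seq >= seq:
            return False  # an equal or newer value is already stored; keep it
        self._rows[key] = Versioned(seq, value)
        return True

    def get(self, key: str) -> Optional[str]:
        """Return the stored value for key, or None if nothing was written."""
        row = self._rows.get(key)
        return None if row is None else row.value


if __name__ == "__main__":
    store = InMemoryStore()
    # Updates for the same key arriving out of order.
    for seq, value in [(0, "total=5"), (2, "total=20"), (1, "total=10")]:
        store.put_if_newer("account-1", seq, value)
    print(store.get("account-1"))
```

With this shape the order in which the sink observes updates no longer matters for the final stored value, because older updates are rejected by the store itself.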

Re: real real-time beam

2019-11-25 Thread Pablo Estrada
If I understand correctly - your pipeline has some kind of windowing, and on every trigger downstream of the combiner, the pipeline updates a cache with a single, non-windowed value. Is that correct? What are your keys for this pipeline? You could work this out with, as you noted, a timer that

[DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-25 Thread Cam Mach
Hello Beam Devs, I have been working on the migration of Amazon Web Services IO connectors into the new AWS SDK for Java V2. The goal is to have an updated implementation aligned with the most recent AWS improvements. So far we have already migrated the connectors for AWS SNS, SQS and DynamoDB.

real real-time beam

2019-11-25 Thread Aaron Dixon
Suppose I trigger a Combine per-element (in a high-volume stream) and use a ParDo as a sink. I assume there is no guarantee about the order that my ParDo will see these triggers, especially as it processes in parallel, anyway. That said, my sink writes to a db or cache and I would not like the

Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-25 Thread David Cavazos
Hi Kenneth, I tried adding back the email addresses, but they weren't added on the existing responses, it would only add them on new ones. :( I've already made it not accept new responses. There are only 8 responses (2 mine, 1 my real vote and 1 empty test vote), so hopefully everyone who voted

Re: Full stream-stream join semantics

2019-11-25 Thread Rui Wang
On Mon, Nov 25, 2019 at 11:29 AM Jan Lukavský wrote: > > On 11/25/19 7:47 PM, Kenneth Knowles wrote: > > > > On Sun, Nov 24, 2019 at 12:57 AM Jan Lukavský wrote: > >> I can put down a design document, but before that I need to clarify some >> things for me. I'm struggling to put all of this

Re: Failed retrieving service account

2019-11-25 Thread Yifan Zou
Hi, I've looked into this issue and found that the default service account was removed during the weekend for some reason log viewer

Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-25 Thread Mikhail Gryzykhin
[ ] Beaver [X] Hedgehog [ ] Lemur [X] Owl [ ] Salmon [ ] Trout [X] Robot dinosaur [ ] Firefly [ ] Cuttlefish [ ] Dumbo Octopus [ ] Angler fish [X] Honey Badger

Re: Full stream-stream join semantics

2019-11-25 Thread Jan Lukavský
On 11/25/19 7:47 PM, Kenneth Knowles wrote: On Sun, Nov 24, 2019 at 12:57 AM Jan Lukavský wrote: I can put down a design document, but before that I need to clarify some things for me. I'm struggling to put all of this into a bigger picture. Sorry if

Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-25 Thread Daniel Oliveira
I'm also a bit late to the party. [ ] Beaver [ ] Hedgehog [X] Lemur [X] Owl [ ] Salmon [ ] Trout [ ] Robot dinosaur [X] Firefly [X] Cuttlefish [X] Dumbo Octopus [ ] Angler fish On Sun, Nov 24, 2019 at 8:37 AM Matthias Baetens wrote: > In case I'm not too late: > > [ ] Beaver > [ ] Hedgehog > [

Re: Failed retrieving service account

2019-11-25 Thread Tomo Suzuki
Thank you for looking into this. On Mon, Nov 25, 2019 at 12:59 PM Yifan Zou wrote: > Greetings, > > We're seeing some tests encountering permission issues such as *'Failed > to retrieve >

Re: Full stream-stream join semantics

2019-11-25 Thread Kenneth Knowles
On Sun, Nov 24, 2019 at 12:57 AM Jan Lukavský wrote: > I can put down a design document, but before that I need to clarify some > things for me. I'm struggling to put all of this into a bigger picture. > Sorry if the arguments are circulating, but I didn't notice any proposal of > how to solve

Re: [Portability] Turn off artifact staging?

2019-11-25 Thread Kyle Weaver
Ah didn't see your pull request yet Thomas. Will take a look later. On Mon, Nov 25, 2019 at 10:23 AM Thomas Weise wrote: > Thanks, I would prefer to solve this in a way where the user does not need > to configure anything extra though. > > > On Mon, Nov 25, 2019 at 10:21 AM Kyle Weaver wrote:

Re: [Portability] Turn off artifact staging?

2019-11-25 Thread Thomas Weise
Thanks, I would prefer to solve this in a way where the user does not need to configure anything extra though. On Mon, Nov 25, 2019 at 10:21 AM Kyle Weaver wrote: > When we added the class loader artifact stager, we introduced artifact > retrieval service type as a pipeline option. It would

Re: [Portability] Turn off artifact staging?

2019-11-25 Thread Kyle Weaver
When we added the class loader artifact stager, we introduced artifact retrieval service type as a pipeline option. It would make sense to put a "none" option there.

Re: [Portability] Turn off artifact staging?

2019-11-25 Thread Robert Bradshaw
boot.go could be updated to recognize NO_ARTIFACTS_STAGED_TOKEN as well. (Should this constant be put in a common location?) On Sat, Nov 23, 2019 at 9:16 AM Thomas Weise wrote: > > JIRA: https://issues.apache.org/jira/browse/BEAM-8815 > > > On Fri, Nov 22, 2019 at 5:31 PM Thomas Weise wrote: >>

Failed retrieving service account

2019-11-25 Thread Yifan Zou
Greetings, We're seeing some tests encountering permission issues such as *'Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token

Re: [ANNOUNCE] New committer: Daniel Oliveira

2019-11-25 Thread Mark Liu
Congratulations, Daniel! On Mon, Nov 25, 2019 at 9:31 AM Ahmet Altay wrote: > Congratulations, Daniel! > > On Sat, Nov 23, 2019 at 3:47 AM jincheng sun > wrote: > >> >> Congrats, Daniel! >> Best, >> Jincheng >> >> On Fri, Nov 22, 2019 at 5:47 PM Alexey Romanenko wrote: >> >>> Congratulations, Daniel! >>>

Re: [ANNOUNCE] New committer: Daniel Oliveira

2019-11-25 Thread Ahmet Altay
Congratulations, Daniel! On Sat, Nov 23, 2019 at 3:47 AM jincheng sun wrote: > > Congrats, Daniel! > Best, > Jincheng > > On Fri, Nov 22, 2019 at 5:47 PM Alexey Romanenko wrote: > >> Congratulations, Daniel! >> >> On 22 Nov 2019, at 09:18, Jan Lukavský wrote: >> >> Congrats Daniel! >> On 11/21/19 10:11 AM,

Re: Triggers still finish and drop all data

2019-11-25 Thread Kenneth Knowles
And another: https://stackoverflow.com/questions/55748746/issues-with-dynamic-destinations-in-dataflow On Thu, Nov 14, 2019 at 1:35 AM Kenneth Knowles wrote: > > > On Fri, Nov 8, 2019 at 9:44 AM Steve Niemitz wrote: > >> Yeah that looks like what I had in mind too. I think the most useful >>

Re: Beam Dependency Check Report (2019-11-25)

2019-11-25 Thread Pablo Estrada
+Yifan Zou : ) On Mon, Nov 25, 2019 at 5:35 AM Tomo Suzuki wrote: > Can anybody take action on this error? > > > The service account was not found. The instance must be restarted via > the Compute Engine API to restore service account access. > > Regards, > Tomo > > On Mon, Nov 25, 2019 at

Re: [ANNOUNCE] New committer: Brian Hulette

2019-11-25 Thread Ahmet Altay
Congratulations, Brian! On Tue, Nov 19, 2019 at 11:04 PM Tanay Tummalapalli wrote: > Congratulations! > > On Wed, Nov 20, 2019 at 6:15 AM Aizhamal Nurmamat kyzy < > aizha...@apache.org> wrote: > >> Congratulations, Brian! >> >> On Mon, Nov 18, 2019 at 10:29 AM Łukasz Gajowy >> wrote: >> >>>

Re: Beam Dependency Check Report (2019-11-25)

2019-11-25 Thread Tomo Suzuki
Can anybody take action on this error? > The service account was not found. The instance must be restarted via the Compute Engine API to restore service account access. Regards, Tomo On Mon, Nov 25, 2019 at 07:04 Apache Jenkins Server < jenk...@builds.apache.org> wrote: > ('Failed to retrieve

Beam Dependency Check Report (2019-11-25)

2019-11-25 Thread Apache Jenkins Server
('Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token from the Google Compute Enginemetadata service. Status: 404 Response:\nb\'"The service account was not found. The instance must be restarted