Re: Key encodings for state requests

2019-11-12 Thread jincheng sun
Thanks for sharing your thoughts, which help me understand the design of the FnAPI more deeply; it makes more sense to me now. Many thanks, Robert! Best, Jincheng Robert Bradshaw wrote on Tue, Nov 12, 2019 at 2:10 AM: > On Fri, Nov 8, 2019 at 10:04 PM jincheng sun > wrote: > > > > > Let us first

Re: 10,000 Pull Requests

2019-11-12 Thread jincheng sun
Congratulations to the Beam community! Very impressive numbers, very active community! Best, Jincheng Maximilian Michels wrote on Fri, Nov 8, 2019 at 1:39 AM: > Yes! Keep the committer pipeline filled ;) > > Reviewing PRs probably remains one of the toughest problems in active > open-source projects. > > On 07.11.19

On processing event streams

2019-11-12 Thread Jan Lukavský
Hi, this is a follow-up to multiple threads covering the topic of how to process event streams (in a unified way). Event streams can be characterized by a common property: the ordering of events matters. The processing (usually) looks something like: unordered stream -> buffer (per key) ->
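
A minimal sketch of the "unordered stream -> buffer (per key) -> ..." shape described above, written in plain Python rather than against the Beam API. Assumptions not taken from the thread: events are hypothetical (key, timestamp, value) tuples, the caller supplies a watermark, and buffered events are released in timestamp order once they fall below it. The class and method names are illustrative only.

```python
import heapq
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical event shape: (key, timestamp, value).
Event = Tuple[str, int, Any]


class PerKeyOrderingBuffer:
    """Illustrative helper: buffers events per key and releases them sorted by timestamp."""

    def __init__(self) -> None:
        # One min-heap of (timestamp, arrival_seq, value) per key;
        # arrival_seq keeps ordering stable for equal timestamps.
        self._buffers: Dict[str, List[Tuple[int, int, Any]]] = defaultdict(list)
        self._seq = 0

    def add(self, event: Event) -> None:
        key, ts, value = event
        heapq.heappush(self._buffers[key], (ts, self._seq, value))
        self._seq += 1

    def advance_watermark(self, watermark: int) -> List[Event]:
        """Emit buffered events with timestamp < watermark, in timestamp order per key."""
        out: List[Event] = []
        for key in sorted(self._buffers):
            heap = self._buffers[key]
            while heap and heap[0][0] < watermark:
                ts, _, value = heapq.heappop(heap)
                out.append((key, ts, value))
        return out


buf = PerKeyOrderingBuffer()
for e in [("a", 5, "x"), ("a", 1, "y"), ("b", 3, "z"), ("a", 9, "w")]:
    buf.add(e)
released = buf.advance_watermark(6)  # [('a', 1, 'y'), ('a', 5, 'x'), ('b', 3, 'z')]
```

Releasing only events strictly below the watermark assumes the watermark promises that no later event will carry an earlier timestamp; late data is not handled in this sketch.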

Re: Test Failure: GcpOptionsTest$CommonTests. testDefaultGcpTempLocationDoesNotExist

2019-11-12 Thread Kyle Weaver
Hi Tomo, thanks for reporting. This test passes on my machine and on Jenkins. I'm guessing this test is assuming something about the host's gcloud settings, and is overfitting as a result. Probably we should mock something so that the test doesn't actually need to call gcloud. I have created a

Re: contributor permission for Beam Jira tickets: suztomo

2019-11-12 Thread Pablo Estrada
Hi Tomo! I've added you as contributor. Welcome! Best -P. On Tue, Nov 12, 2019 at 11:51 AM Tomo Suzuki wrote: > Hi Beam Devs, > > This is Tomo from Google New York. I'd like to contribute to Beam Java > dependencies upgrade. Can someone add me as a contributor for Beam's JIRA > issue tracker? >

contributor permission for Beam Jira tickets: suztomo

2019-11-12 Thread Tomo Suzuki
Hi Beam Devs, This is Tomo from Google New York. I'd like to contribute to Beam Java dependencies upgrade. Can someone add me as a contributor for Beam's JIRA issue tracker? GitHub account: suztomo Apache JIRA username: suztomo -- Regards, Tomo

Re: [Discuss] Beam mascot

2019-11-12 Thread Aizhamal Nurmamat kyzy
52 and 37 for me. I don't know what 53 is, but I like it too. On Tue, Nov 12, 2019 at 9:19 AM Maximilian Michels wrote: > More logos :D > > (35) - (37), (51), (48), (53) go into the direction of cuttlefish. > > From the new ones I like (52) because of the eyes. (53) If we want to > move into

Re: [Discuss] Beam mascot

2019-11-12 Thread Robert Bradshaw
On Tue, Nov 12, 2019 at 1:29 PM Aizhamal Nurmamat kyzy wrote: > 52 and 37 for me. I don't know what 53 is, but I like it too. Same. What about 37 with the eyes from 52? > On Tue, Nov 12, 2019 at 9:19 AM Maximilian Michels wrote: >> >> More logos :D >> >> (35) - (37), (51), (48), (53) go into

Re: [Discuss] Beam Summit 2020 Dates & locations

2019-11-12 Thread Alexey Romanenko
On 8 Nov 2019, at 11:32, Maximilian Michels wrote: > > The dates sound good to me. I agree that the Bay Area has an advantage > because of its large tech community. On the other hand, it is a question of > how we run the event. For Berlin we managed to get about 200 attendees to > Berlin,

Re: Completeness of Beam Java Dependency Check Report

2019-11-12 Thread Yifan Zou
Thanks Tomo. I'll follow up in JIRA. On Tue, Nov 12, 2019 at 9:44 AM Tomo Suzuki wrote: > Yifan, > I created a ticket to track this finding: > https://issues.apache.org/jira/browse/BEAM-8621 . > > > On Mon, Nov 11, 2019 at 5:08 PM Tomo Suzuki wrote: > >> Kenn, >> >> Thank you for the analysis.

Re: Completeness of Beam Java Dependency Check Report

2019-11-12 Thread Tomo Suzuki
Yifan, I created a ticket to track this finding: https://issues.apache.org/jira/browse/BEAM-8621 . On Mon, Nov 11, 2019 at 5:08 PM Tomo Suzuki wrote: > Kenn, > > Thank you for the analysis. Although Guava was picked at random, it's > a great lesson for me to see how you analyzed other

Re: Behavior of TimestampCombiner?

2019-11-12 Thread Ruoyun Huang
Thanks for confirming. Since it is unexpected behavior, I shall look into jira if it is already on radar, if not, will create one. On Mon, Nov 11, 2019 at 6:11 PM Robert Bradshaw wrote: > The END_OF_WINDOW is indeed 9.99 (or, in Java, 9.999000), but the > results for LATEST and EARLIEST

Test Failure: GcpOptionsTest$CommonTests. testDefaultGcpTempLocationDoesNotExist

2019-11-12 Thread Tomo Suzuki
Hi Beam developers, I'm trying to build Apache Beam from source, but GcpOptionsTest fails (error below). Has anybody solved this problem already? I'm using master (c2e58c55). suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/java check ... FAILURE: Build failed with an exception. * What went

Re: Is there good way to make Python SDK docs draft accessible?

2019-11-12 Thread Yoshiki Obata
Sorry for the late reply. I've checked the release process and found that the following approach would be a good way to get the docs/scripts ready to be reviewed and merged. 1. Create a PR for apache/beam-site with the docs generated by the scripts (hereinafter called PR1). PR1 is intended only for reviewing the docs, so it must not

Re: Date/Time Ranges & Protobuf

2019-11-12 Thread Robert Bradshaw
I agree about it being a tagged union in the model (together with actual_time(...) - epsilon). It's not just a performance hack though, it's also (as discussed elsewhere) a question of being able to find an embedding into existing datetime libraries. The real question here is whether we should

[CANCELLED] [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

2019-11-12 Thread Jan Lukavský
I'm cancelling this due to lack of activity. I will start a follow-up thread to find a solution. On 11/9/19 11:45 AM, Jan Lukavský wrote: Hi, I'll try to summarize the mailing list threads to clarify why I think this addition is needed (and actually necessary): a) there are situations where

Re: [spark structured streaming runner] merge to master?

2019-11-12 Thread Kyle Weaver
+1 to "one uber jar to rule them all." As Ryan said, we snuck the portable runner into master at a much less usable state than the structured streaming runner, and it didn't seem to cause any issues (although in retrospect we probably should have tagged it as @Experimental, I wasn't aware that was

Re: Command for Beam worker on Spark cluster

2019-11-12 Thread Kyle Weaver
Not sure what's causing the error. We should be able to see output from the process if you set the logging level to DEBUG. > Some debugging to boot.go and running it manually shows it doesn't return from "artifact.Materialize" function. Running boot.go by itself won't work if there is no

Re: Completeness of Beam Java Dependency Check Report

2019-11-12 Thread Yifan Zou
The dependency management tool is back. See the latest report. On Tue, Nov 12, 2019 at 9:51 AM Yifan Zou wrote: > Thanks Tomo. I'll follow up in JIRA. > > On Tue,

Beam Dependency Check Report (2019-11-12)

2019-11-12 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK:
Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
mock | 2.0.0 | 3.0.5 | 2019-05-20

Re: Contributor permission for Beam Jira tickets

2019-11-12 Thread Kenneth Knowles
Done. Welcome! On Tue, Nov 12, 2019 at 3:40 PM amit kumar wrote: > Hi Beam Devs, > > I am Amit from Godaddy and I am looking to contribute to Beam. > Could you please add me as a contributor. My Id is - amitkumar27 > > Regards, > Amit > > On Wed, Nov 6, 2019 at 9:59 AM amit kumar wrote: > >>

[Portability] Turn off artifact staging?

2019-11-12 Thread Kyle Weaver
Hi Beamers, We can use artifact staging to make sure SDK workers have access to a pipeline's dependencies. However, artifact staging is not always necessary. For example, one can make sure that the environment contains all the dependencies ahead of time. However, regardless of whether or not

Re: Contributor permission for Beam Jira tickets

2019-11-12 Thread amit kumar
Hi Beam Devs, I am Amit from Godaddy and I am looking to contribute to Beam. Could you please add me as a contributor. My Id is - amitkumar27 Regards, Amit On Wed, Nov 6, 2019 at 9:59 AM amit kumar wrote: > Hi Beam Devs, > > I am Amit from Godaddy and I am looking to contribute to Beam. >

Re: Contributor permission for Beam Jira tickets

2019-11-12 Thread amit kumar
Thanks! On Tue, Nov 12, 2019 at 3:49 PM Kenneth Knowles wrote: > Done. Welcome! > > On Tue, Nov 12, 2019 at 3:40 PM amit kumar wrote: > >> Hi Beam Devs, >> >> I am Amit from Godaddy and I am looking to contribute to Beam. >> Could you please add me as a contributor. My Id is - amitkumar27 >>

Re: Behavior of TimestampCombiner?

2019-11-12 Thread Ruoyun Huang
Reported a tracking JIRA: https://issues.apache.org/jira/browse/BEAM-8645 On Tue, Nov 12, 2019 at 9:48 AM Ruoyun Huang wrote: > Thanks for confirming. > > Since it is unexpected behavior, I shall look into jira if it is already > on radar, if not, will create one. > > On Mon, Nov 11, 2019 at

Make environment_id a top level attribute of PTransform

2019-11-12 Thread Chamikara Jayalath
This was discussed in a JIRA [1], but I don't think it was mentioned on the dev list. Not having environment_id as a top-level attribute of PTransform [2] makes it difficult to track the Environment [3] that a given PTransform should be executed in. For example, in Dataflow, we have to fork code in

Re: contributor permission for Beam Jira tickets: suztomo

2019-11-12 Thread Tomo Suzuki
Thank you so much On Tue, Nov 12, 2019 at 4:28 PM Pablo Estrada wrote: > Hi Tomo! > I've added you as contributor. Welcome! > Best > -P. > > On Tue, Nov 12, 2019 at 11:51 AM Tomo Suzuki wrote: > >> Hi Beam Devs, >> >> This is Tomo from Google New York. I'd like to contribute to Beam Java >>

Re: Test Failure: GcpOptionsTest$CommonTests. testDefaultGcpTempLocationDoesNotExist

2019-11-12 Thread Tomo Suzuki
Kyle, Great. Thank you for the quick response. On Tue, Nov 12, 2019 at 4:26 PM Kyle Weaver wrote: > Hi Tomo, thanks for reporting. > > This test passes on my machine and on Jenkins. I'm guessing this test is > assuming something about the host's gcloud settings, and is overfitting as > a result.

Cleaning up Approximate Algorithms in Beam

2019-11-12 Thread Reza Rokni
Hi everyone, TL;DR: Discussion of Beam's various approximate distinct count algorithms. Today there are several options for approximate algorithms in Apache Beam 2.16, with HLLCount being the most recently added. I would like to canvass opinions here on the possibility of rationalizing these APIs

Re: [Portability] Turn off artifact staging?

2019-11-12 Thread Robert Bradshaw
Certainly there's a lot to be rethought in terms of artifact staging, especially when it comes to cross-language pipelines. I think it would make sense to have a special retrieval token for the "empty" manifest, which would mean a staging directory would never have to be set up if no artifacts

[discuss] Using a logger hierarchy in Python

2019-11-12 Thread Pablo Estrada
Hi all, as of today, the Python SDK uses the root logger wherever we log. This means that it's impossible to have different logging levels depending on the section of the code that we want to debug most. I have been doing some work on the FnApiRunner, and adding logging for it. I would like to
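
For reference, the standard-library mechanism behind a logger hierarchy: loggers with dotted names form a tree, and each one inherits its effective level from the nearest ancestor that has an explicit level. A minimal sketch; the logger names mirror Beam's package layout but are used here only for illustration and are not the change proposed in this thread.

```python
import logging

# A root handler for the whole process; code that still logs to the root logger keeps INFO.
logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s %(message)s")

# Dotted names form a hierarchy; in an SDK module the usual idiom would be
# _LOGGER = logging.getLogger(__name__).
beam_logger = logging.getLogger("apache_beam")
runner_logger = logging.getLogger("apache_beam.runners.portability.fn_api_runner")
io_logger = logging.getLogger("apache_beam.io")

# Quiet the whole SDK, then re-enable DEBUG for only the module being debugged.
beam_logger.setLevel(logging.WARNING)
runner_logger.setLevel(logging.DEBUG)

runner_logger.debug("printed: FnApiRunner debug output")
io_logger.info("not printed: apache_beam.io inherits WARNING from apache_beam")
```

Records that pass a logger's own level propagate to the root handler regardless of ancestor levels, which is why the FnApiRunner DEBUG line still appears while the rest of the SDK stays quiet.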

Re: [Discuss] Beam Summit 2020 Dates & locations

2019-11-12 Thread Udi Meiri
+1 for better organization. I would have gone to ApacheCon LV had I known there was going to be a Beam summit there. On Tue, Nov 12, 2019 at 9:31 AM Alexey Romanenko wrote: > On 8 Nov 2019, at 11:32, Maximilian Michels wrote: > > > > The dates sound good to me. I agree that the Bay Area has

Re: Behavior of TimestampCombiner?

2019-11-12 Thread Robert Bradshaw
I bet, as with the previous one, this is due to over-eager combiner lifting. On Tue, Nov 12, 2019 at 4:17 PM Ruoyun Huang wrote: > > Reported a tracking JIRA: https://issues.apache.org/jira/browse/BEAM-8645 > > On Tue, Nov 12, 2019 at 9:48 AM Ruoyun Huang wrote: >> >> Thanks for confirming. >>

Re: [Portability] Turn off artifact staging?

2019-11-12 Thread Robert Bradshaw
FWIW, there are also discussions of adding a preparation phase for sdk harness (docker) images, such that artifacts could be staged (and installed, compiled etc.) ahead of time and shipped as part of the sdk image rather than via a side channel (and on every worker). Anyone not using these images

Re: On processing event streams

2019-11-12 Thread Robert Bradshaw
One concern with (1) is that it may not be cheap to do for all runners. There also seems to be the implication that in batch, elements would be 100% in order, but in streaming, kind-of-in-order is OK, which would lead to pipelines being developed/tested against stronger guarantees than are generally