Re: Pipeline AttributeError on Python3

2019-11-06 Thread Rakesh Kumar
Thanks Valentyn. Aggregation_transform.py doesn't have any transform that extends beam.DoFn; we are using a plain Python function that we pass to beam.Map(). I am not sure how to get a dump of serialized_fn. Can you please let me know the process? I also heard that some people ran into
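
One way to see what a plain function handed to beam.Map() depends on at load time is to pickle it and list the global lookups the unpickler will have to perform. An AttributeError on the worker typically means one of those `module name` lookups failed there (for functions defined in `__main__`, Beam's `--save_main_session` option exists for that case). This is only a sketch under assumptions: Beam serializes functions with its own dill-based pickler rather than stdlib `pickle`, so the hypothetical helper `referenced_globals` approximates, but does not reproduce, what serialized_fn contains.

```python
import pickle
import pickletools


def referenced_globals(obj):
    """List the 'module name' pairs the unpickler must look up to rebuild obj.

    Uses stdlib pickle (protocol 2, no Python 2 name fixing) as a stand-in for
    Beam's own pickler, so every by-reference lookup appears as a GLOBAL opcode.
    """
    data = pickle.dumps(obj, protocol=2, fix_imports=False)
    refs = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            refs.append(arg)  # e.g. "json dumps"
    return refs
```

For a module-level function `my_fn` (hypothetical) defined in a script that is run directly, `referenced_globals(my_fn)` would return `['__main__ my_fn']`, which is exactly the lookup that fails on a worker whose `__main__` does not define `my_fn`. `pickletools.dis(...)` on the same bytes prints the full opcode listing.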

Re: 10,000 Pull Requests

2019-11-06 Thread Pablo Estrada
iiipe : ) On Thu, Nov 7, 2019 at 12:59 AM Kenneth Knowles wrote: > Awesome! > > Number of days from PR #1 and PR #1000: 211 > Number of days from PR #9000 and PR #1: 71 > > Kenn > > On Wed, Nov 6, 2019 at 6:28 AM Łukasz Gajowy wrote: > >> Yay! Nice! :) >> >> On Wed, Nov 6, 2019 at 14:38

Re: Cython unit test suites running without Cythonized sources

2019-11-06 Thread Chad Dombrova
Another potential solution would be to _not_ use the sdist task to build the tarball and let tox do it. Tox should install cython on supported platforms before running sdist itself (which it does by default unless you explicitly provide it with a tarball, which we are doing). This has the added

Cython unit test suites running without Cythonized sources

2019-11-06 Thread Udi Meiri
I opened this bug today after commenting on Chad's type hints PR. https://issues.apache.org/jira/browse/BEAM-8572?filter=-1 I am 95% sure that our Precommit tests are using tarballs that are built without Cython (including the
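
A quick way to test the suspicion inside a test environment is to check whether the modules that are supposed to be Cythonized were actually loaded from compiled extensions rather than from `.py` sources. A minimal standard-library sketch; `check_cythonized` is a hypothetical helper, and which Beam modules to check (for example `apache_beam.coders.coder_impl`) is an assumption.

```python
import importlib
import importlib.machinery


def is_compiled_extension(module):
    """True if the module's file ends in a compiled-extension suffix (.so/.pyd)."""
    path = getattr(module, "__file__", None) or ""
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


def check_cythonized(module_names):
    """Map each importable module name to whether it was loaded as an extension."""
    return {
        name: is_compiled_extension(importlib.import_module(name))
        for name in module_names
    }
```

Run inside the precommit's tox environment, `check_cythonized(["apache_beam.coders.coder_impl"])` reporting `False` would indicate that the installed tarball was built without Cython.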

Re: [Discuss] Beam Summit 2020 Dates & locations

2019-11-06 Thread Pablo Estrada
I've added some Geo information to the Beam GA report[1]. In my opinion, it makes sense to target cities with the largest existing Beam presence (along with, perhaps, places where people can travel easily). - London and Paris are the top cities in Europe for visits to the Beam site, so I feel

Getting contributor permission to JIRA

2019-11-06 Thread Wenjia Liu
Hi, This is Wendy from Google. I'm contributing to adding more tests for Beam Python. Could anyone add me as a contributor for JIRA? I'd like to assign this issue BEAM-8575 to myself. Thanks, Wendy

(Question) SQL integration tests for MongoDb

2019-11-06 Thread Kirill Kozlov
Hi everyone! I am trying to test the MongoDb SQL table, but I'm not quite sure how to pass pipeline options with the hostName, port, and databaseName used by Jenkins. It looks like the integration test for the MongoDbIO connector obtains those values from the

Re: Is there good way to make Python SDK docs draft accessible?

2019-11-06 Thread Valentyn Tymofieiev
Hi Yoshiki, Were you able to find the information you need to regenerate the documentation? Thanks, Valentyn On Tue, Oct 29, 2019 at 8:01 AM Yoshiki Obata wrote: > Thank you for advising, Udi and Ahmet. > I'll take a look at the release process. > > On Tue, Oct 29, 2019, 3:47 Ahmet Altay : > > > >

Re: published containers overwrite locally built containers

2019-11-06 Thread Valentyn Tymofieiev
On Wed, Nov 6, 2019 at 3:28 PM Heejong Lee wrote: > I think that implicitly (and forcefully) pulling the remote image is not good > even in case of a bug fix. The better approach would be releasing a > separate bug fix version. Implicitly pulling the updated version of the > same container looks

Re: Command for Beam worker on Spark cluster

2019-11-06 Thread Kyle Weaver
> Where can I extract these parameters from? These parameters should be passed automatically when the process is run (note the use of $* in the example script):
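
The pass-through that `$*` performs in a shell wrapper can be sketched in Python: the wrapper never constructs the worker id or endpoints itself, it forwards whatever arguments the runner started it with. The boot path is the build-output path mentioned elsewhere in this thread; `boot_command` and `run_boot` are hypothetical helpers.

```python
import subprocess
import sys

# Build-output path of the Python SDK boot binary (from this thread); on a
# cluster it must exist at the same location on every worker node.
BOOT = "sdks/python/container/build/target/launcher/linux_amd64/boot"


def boot_command(boot_path, runner_args):
    """Return the boot invocation with the runner's arguments forwarded unchanged."""
    return [boot_path, *runner_args]


def run_boot(boot_path=BOOT, runner_args=None):
    """Run boot with the arguments (worker id, endpoints, ...) the runner passed us."""
    args = sys.argv[1:] if runner_args is None else runner_args
    return subprocess.call(boot_command(boot_path, args))
```

A wrapper script built this way would end with `sys.exit(run_boot())`, so whatever the runner supplies reaches boot untouched.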

Re: Deprecate some or all of TestPipelineOptions?

2019-11-06 Thread Robert Bradshaw
+1, all of these are probably obsolete at this point and it would be nice to remove them. On Wed, Nov 6, 2019 at 3:00 PM Kenneth Knowles wrote: > > Good find. I think TestPipelineOptions is from very early days. It makes > sense to me that these are all obsolete. Some guesses, though I haven't dug

Re: Command for Beam worker on Spark cluster

2019-11-06 Thread Matthew K.
Thanks, but I still need to pass parameters to the boot executable, such as worker id, control endpoint, logging endpoint, etc. Where can I extract these parameters from? (In the apache_beam Python code, they can be extracted from the StartWorker request parameters.) Also, how the Spark executor can

Re: published containers overwrite locally built containers

2019-11-06 Thread Heejong Lee
I think that implicitly (and forcefully) pulling the remote image is not good, even in the case of a bug fix. The better approach would be to release a separate bug-fix version. Implicitly pulling the updated version of the same container looks weird to me, since it feels like releasing the jar artifact

Re: Command for Beam worker on Spark cluster

2019-11-06 Thread Kyle Weaver
In Docker mode, most everything's taken care of for you, but in process mode you have to do a lot of setup yourself. The command you're looking for is `sdks/python/container/build/target/launcher/linux_amd64/boot`. You will be required to have both that executable (which you can build from source
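
For the PROCESS environment, the command is passed through `--environment_config` as a JSON object with a `command` key. A minimal sketch that builds those two pipeline arguments; the install path `/opt/beam/boot` is hypothetical and must point to the boot executable on every Spark worker node.

```python
import json


def process_environment_args(boot_path):
    """Pipeline arguments that run the Python SDK harness as a local process."""
    config = {"command": boot_path}
    return [
        "--environment_type=PROCESS",
        "--environment_config=" + json.dumps(config),
    ]


# Hypothetical location where boot was installed on each worker node.
args = process_environment_args("/opt/beam/boot")
```

These would be appended to the other flags (runner, job endpoint, and so on) given to `PipelineOptions`.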

Re: Deprecate some or all of TestPipelineOptions?

2019-11-06 Thread Kenneth Knowles
Good find. I think TestPipelineOptions is from very early days. It makes sense to me that these are all obsolete. Some guesses, though I haven't dug through commit history to confirm: - TempRoot: a while ago TempLocation was optional, so I think this would provide a default for things like

Re: How to use a locally built worker image?

2019-11-06 Thread Kyle Weaver
> These are -SNAPSHOT or .dev and distinct from releases. So perhaps this is just a matter of correct version number management? In light of the concerns Valentyn & Ahmet raise, it seems safer to change tags instead of removing the pull. PR for Java: https://github.com/apache/beam/pull/10017 On

Command for Beam worker on Spark cluster

2019-11-06 Thread Matthew K.
Hi all, I am trying to run a *Python* Beam pipeline on a Spark cluster. Since the workers are running on separate nodes, I am using "PROCESS" for "environment_type" in the pipeline options, but I couldn't find any documentation on what "command" I should pass to "environment_config" to run on the worker,

Re: How to use a locally built worker image?

2019-11-06 Thread Ahmet Altay
On Wed, Nov 6, 2019 at 1:34 PM Valentyn Tymofieiev wrote: > On Wed, Nov 6, 2019 at 11:48 AM Kyle Weaver wrote: > >> The way the Python SDK currently does this is to use the version as the >> default tag, e.g. 2.16.0, while master uses 2.16.0.dev. This means there >> should never be any conflicts

Re: How to use a locally built worker image?

2019-11-06 Thread Valentyn Tymofieiev
On Wed, Nov 6, 2019 at 11:48 AM Kyle Weaver wrote: > The way the Python SDK currently does this is to use the version as the > default tag, e.g. 2.16.0, while master uses 2.16.0.dev. This means there > should never be any conflicts between a release and developer image, unless > the user

Re: Contributing to Beam javadoc

2019-11-06 Thread Ismaël Mejía
Done, you can now self assign issues too, welcome Jonathan! On Wed, Nov 6, 2019 at 10:00 PM Jonathan Alvarez-Gutierrez wrote: > > Hey, > > I just filed https://issues.apache.org/jira/browse/BEAM-8573 and wanted to > create a PR with a fix. > > I should also check if there's an extant

Re: Contributor Permission for ThriftIO

2019-11-06 Thread Ismaël Mejía
Hello, Welcome, you have now contributor permissions and the ticket is assigned to you. Sounds like a nice addition to have. Best, Ismaël On Wed, Nov 6, 2019 at 6:07 PM Christopher Larsen wrote: > > Hello, > > I would like to contribute to the addition of ThriftIO. Would someone be able > to

Contributing to Beam javadoc

2019-11-06 Thread Jonathan Alvarez-Gutierrez
Hey, I just filed https://issues.apache.org/jira/browse/BEAM-8573 and wanted to create a PR with a fix. I should also check if there's an extant documentation / Splittable DoFn project that would pre-empt or subsume my teeny documentation fix. If not, I'd like to assign the issue to

Deprecate some or all of TestPipelineOptions?

2019-11-06 Thread Brian Hulette
I recently came across TestPipelineOptions, and now I'm wondering if maybe it should be deprecated. It only seems to actually be supported for Spark and Dataflow (via TestSparkRunner and TestDataflowRunner), and I think it may make more sense to move the functionality it provides into the tests

Re: How to use a locally built worker image?

2019-11-06 Thread Kyle Weaver
The way the Python SDK currently does this is to use the version as the default tag, e.g. 2.16.0, while master uses 2.16.0.dev. This means there should never be any conflicts between a release and a developer image, unless the user deliberately changes the image tags. > if a user's pipeline relies
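
The version-as-tag scheme can be sketched as follows: the default tag is the SDK version string itself, so a release (`2.16.0`) and a master build (`2.16.0.dev`) never resolve to the same image. The helper names and the repository `example/python3.7_sdk` are hypothetical; the `-SNAPSHOT` check reflects the Java convention mentioned in this thread.

```python
import re

# Unreleased builds: Python ".dev" (optionally numbered) or Java "-SNAPSHOT".
_DEV_VERSION = re.compile(r"(\.dev\d*|-SNAPSHOT)$")


def default_image(repository, sdk_version):
    """Default worker image: the SDK version doubles as the tag."""
    return f"{repository}:{sdk_version}"


def is_dev_build(version):
    """True for versions that can only come from a local or master build."""
    return bool(_DEV_VERSION.search(version))
```

With this scheme a locally built `example/python3.7_sdk:2.16.0.dev` cannot be overwritten by pulling the released `example/python3.7_sdk:2.16.0`, because the two tags differ.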

Re: How to use a locally built worker image?

2019-11-06 Thread Thomas Weise
As a developer on Beam, I expect something that I built locally to be used. That's also the case with Java and Python dependencies. These are -SNAPSHOT or .dev and distinct from releases. So perhaps this is just a matter of correct version number management? Users that consume our releases

Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-06 Thread Robert Bradshaw
Yes, the portability framework is designed to support this, and possibly even more efficient transfers of data than element-by-element as per the wire coder specified in the IO port operators. I left some comments on the doc as well, and would also prefer approach 2. On Wed, Nov 6, 2019 at 11:03

Re: How to use a locally built worker image?

2019-11-06 Thread Valentyn Tymofieiev
> > Anyway, I agree with Thomas that implicitly running `docker pull` is > confusing and requires some adjustments to work around. The user can always > run `docker pull` themselves if that's the intention. I understand that implicit pull may come across as surprising. However I see the

[Discuss] Beam Summit 2020 Dates & locations

2019-11-06 Thread Griselda Cuevas
Hi Beam Community! I'd like to kick off a thread to discuss potential dates and venues for the 2020 Beam Summits. I did some research on industry conferences happening in 2020 and pre-selected a few ranges as follows: (2 days) NA between mid-May and mid-June (2 days) EU mid October (1 day) Asia

Re: [EXTERNAL] Re: FirestoreIO connector [JavaSDK]

2019-11-06 Thread Chamikara Jayalath
BTW, FYI, I'm also talking with folks from Google Firestore team regarding this. I think they had shown some interest in taking this up but I'm not sure. If they are able to contribute here, and if we can coordinate with some of those folks on this effort, it will be great for the long term health

Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-06 Thread Kenneth Knowles
I think the portability framework is designed for this. The runner controls the coder on the grpc ports and the runner controls the process bundle descriptor. I commented on the doc. I think what is missing is an analysis of the scope of SDK harness changes and the risk to model consistency. Approach 2:

Contributor Permission for ThriftIO

2019-11-06 Thread Christopher Larsen
Hello, I would like to contribute to the addition of ThriftIO. Would someone be able to add me as a contributor? I have opened a ticket at BEAM-8561 and my Jira ID is: clarsen. Best, Chris Larsen

Re: Key encodings for state requests

2019-11-06 Thread Robert Bradshaw
On Wed, Nov 6, 2019 at 2:55 AM Maximilian Michels wrote: > > Let me try to clarify: > > > The Coder used for State/Timers in a StatefulDoFn is pulled out of the > > input PCollection. If a Runner needs to partition by this coder, it > > should ensure the coder of this PCollection matches with the

Re: published containers overwrite locally built containers

2019-11-06 Thread Valentyn Tymofieiev
I agree with the resolutions in the link Thomas mentioned [1]. Using the latest tag is not reliable, and a unique tag ID should be generated when running tests on Jenkins against the master branch. I think pulling the latest image for the current tag is actually the desired behavior, in case the external
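
A sketch of generating a unique, valid tag per CI run instead of relying on `latest`. Appending a random suffix to the SDK version is one possible scheme and an assumption, not what Beam's Jenkins jobs actually do; `unique_ci_tag` is a hypothetical helper. The regular expression follows Docker's documented tag grammar: at most 128 characters from `[A-Za-z0-9_.-]`, not starting with `.` or `-`.

```python
import re
import uuid

# Docker tag grammar: 1-128 chars of [A-Za-z0-9_.-], first char not '.' or '-'.
_DOCKER_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def unique_ci_tag(sdk_version):
    """Per-run tag: SDK version plus a random 12-hex-digit suffix."""
    tag = f"{sdk_version}-{uuid.uuid4().hex[:12]}"
    if not _DOCKER_TAG.fullmatch(tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return tag
```

Each Jenkins run would build and reference its own tag, so concurrent runs against master cannot pick up each other's images.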

Re: Jenkins workflow improvement question

2019-11-06 Thread Łukasz Gajowy
To me, any way of changing the seed job so that it does not create one global configuration of all jobs, and instead creates it on a per-branch basis, would be a solid improvement to our CI, so +1 if this is achievable without losing currently used Jenkins features. Łukasz On Wed, Nov 6, 2019 at 01:12 Alan Myrvold

Re: 10,000 Pull Requests

2019-11-06 Thread Łukasz Gajowy
Yay! Nice! :) On Wed, Nov 6, 2019 at 14:38 Maximilian Michels wrote: > Just wanted to point out, we have crossed the 10,000 PRs mark :) > > ...and the winner is: https://github.com/apache/beam/pull/1 > > Seriously, I think Beam's culture to promote PRs over direct access to > the repository

10,000 Pull Requests

2019-11-06 Thread Maximilian Michels
Just wanted to point out, we have crossed the 10,000 PRs mark :) ...and the winner is: https://github.com/apache/beam/pull/1 Seriously, I think Beam's culture of promoting PRs over direct access to the repository is remarkable. To another 10,000 PRs! Cheers, Max

Re: Beam EventHubs Java connector

2019-11-06 Thread Ismaël Mejía
Hello, We are definitely interested in this contribution. The license of the Azure Event Hub library is MIT, so there are no issues with including it in the IO connector. You just have to take a look at the way we write IOs in Beam to wrap it. Please create a JIRA and assign it to yourself, also try

Beam EventHubs Java connector

2019-11-06 Thread Jonathan Perron
Dear all, I will soon need to connect an Apache Beam pipeline to an Azure EventHubs service. I have not seen any reference to such a connector yet, so I would like to know whether my code would be of interest for a future Apache Beam release. I have already seen that a Java library is available

Re: Key encodings for state requests

2019-11-06 Thread Maximilian Michels
Let me try to clarify: The Coder used for State/Timers in a StatefulDoFn is pulled out of the input PCollection. If a Runner needs to partition by this coder, it should ensure the coder of this PCollection matches with the Coder used to create the serialized bytes that are used for partitioning
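
Why the coders must match can be shown with two toy key coders: the same logical key produces different bytes, so partitioning on the bytes can route it to different places. These coders are illustrative stand-ins (a raw UTF-8 coder and a varint length-prefixed one), not Beam's implementations, and `partition_for` is a hypothetical deterministic partitioner.

```python
import zlib


def encode_varint(n):
    """Unsigned base-128 varint (low 7 bits first), as used for length prefixes."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def utf8_key_coder(key):
    """Toy coder: raw UTF-8 bytes."""
    return key.encode("utf-8")


def length_prefixed_key_coder(key):
    """Toy coder: varint length prefix followed by the UTF-8 bytes."""
    data = key.encode("utf-8")
    return encode_varint(len(data)) + data


def partition_for(key_bytes, num_partitions):
    # crc32 is stable across processes, unlike Python's salted hash().
    return zlib.crc32(key_bytes) % num_partitions
```

If the runner partitions on `utf8_key_coder` bytes while state requests carry `length_prefixed_key_coder` bytes, the key `"user-42"` is seen as `b"user-42"` on one side and `b"\x07user-42"` on the other, so it can land in different partitions.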

RE: [EXTERNAL] Re: FirestoreIO connector [JavaSDK]

2019-11-06 Thread Stefan Djelekar
Thanks for the valuable information. We've been looking at exactly that Datastore connector, but there are some differences, of course. I'll make the PR in the next few days and let's pick it up from there. Kind regards, Stefan From: Chamikara Jayalath Sent: Tuesday, November 5, 2019 2:24

[DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-06 Thread jincheng sun
Hi all, I am trying to make some improvements to the portability framework to make it usable in other projects. However, we found that the coder between the runner and the harness can only be FullWindowedValueCoder. This means each time we send a WindowedValue, we have to encode/decode timestamp, windows
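
To put a rough number on the overhead, consider a toy encoding loosely modeled on a full windowed value: an 8-byte timestamp, a 4-byte window count, a single global window that encodes to zero bytes, and one pane byte. That adds 13 bytes to every element, on top of the encode/decode CPU cost. The layout, the placeholder pane byte, and the helper names are illustrative assumptions, not Beam's exact wire format.

```python
import struct


def encode_full_windowed(value_bytes, timestamp_millis, pane_byte=0):
    """Toy layout: timestamp, window count (1 global window), pane, value."""
    header = struct.pack(">qi", timestamp_millis, 1)  # 8 + 4 bytes, big-endian
    # The single global window contributes no bytes of its own.
    return header + bytes([pane_byte]) + value_bytes


def overhead_bytes(num_elements, value_bytes=b"x"):
    """Extra bytes sent versus value-only encoding for num_elements elements."""
    full = encode_full_windowed(value_bytes, 0)
    return num_elements * (len(full) - len(value_bytes))
```

Under this toy layout, one million small elements carry about 13 MB of redundant metadata per direction, which is what a value-only coder on the runner/harness ports would avoid.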