[PROPOSAL] Preparing for Beam 2.20.0 release

2020-02-11 Thread Rui Wang
Hi all,

The next (2.20.0) release branch cut is scheduled for 02/26, according to
the release calendar.
I would like to volunteer myself to do this release.
The plan is to cut the branch on that date, and cherry-pick release-blocking
fixes afterwards, if any.

Any unresolved release-blocking JIRA issues for 2.20.0 should have their
"Fix Version/s" marked as "2.20.0".

Any comments or objections?


-Rui


Re: FnAPI proto backwards compatibility

2020-02-11 Thread Kenneth Knowles
On Tue, Feb 11, 2020 at 8:38 AM Robert Bradshaw  wrote:

> On Mon, Feb 10, 2020 at 7:35 PM Kenneth Knowles  wrote:
> >
> > On the runner requirements side: if you have such a list at the pipeline
> level, it is an opportunity for the list to be inconsistent with the
> contents of the pipeline. For example, if a DoFn is marked "requires stable
> input" but not listed at the pipeline level, then the runner may run it
> without ensuring stable input.
>
> Yes. Listing this feature at the top level, if used, would be part of
> the contract. The problem here that we're trying to solve is that the
> runner wouldn't know about the field used to mark a DoFn as "requires
> stable input." Another alternative would be to make this kind of ParDo
> a different URN, but that would result in a cross product of URNs for
> all supported features.



> Rather than attaching it to the pipeline object, we could attach it to
> the transform. (But if there are ever extensions that don't belong to
> transforms, we'd be out of luck. It'd be even worse to attach it to
> the ParDoPayload, as then we'd need one on CombinePayload, etc. just
> in case.) This is why I was leaning towards just putting it at the
> top.
>
> I agree about the potential for incompatibility. As much as possible
> I'd rather extend things in a way that would be intrinsically rejected
> by a non-comprehending runner. But I'm not sure how to do that when
> introducing new constraints for existing components like this. But I'm
> open to other suggestions.
>

I was waiting for Luke to mention something he suggested offline: that we
make this set of fields a list of URNs and require a runner to fail if
there are any that it does not understand. That should do it for
DoFn-granularity features. It makes sense - proto is designed to
ignore/propagate unknown bits. We want to fail on unknown bits.
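Luke's suggestion could be sketched roughly as follows. This is a hypothetical illustration, not actual Beam code: the URNs, the `SUPPORTED_REQUIREMENTS` set, and the `validate_requirements` helper are all invented names to show the fail-on-unknown behavior being described.

```python
# Hypothetical sketch: the pipeline proto carries a list of requirement URNs,
# and a runner must reject any URN it does not understand. URNs and names
# here are illustrative, not the actual Beam protos.

SUPPORTED_REQUIREMENTS = {
    "beam:requirement:pardo:stable_input:v1",
    "beam:requirement:pardo:splittable:v1",
}

def validate_requirements(pipeline_requirements):
    """Fail fast on any requirement URN this runner does not implement."""
    unknown = set(pipeline_requirements) - SUPPORTED_REQUIREMENTS
    if unknown:
        raise ValueError(
            "Pipeline requires features this runner does not understand: %s"
            % sorted(unknown))

# A comprehending runner proceeds; a non-comprehending one fails loudly,
# which inverts proto's default ignore-unknown-fields behavior.
validate_requirements(["beam:requirement:pardo:stable_input:v1"])  # ok
```

The key design point is exactly the one above: proto silently propagates unknown fields, so the failure has to be made explicit at the application level by an allow-list check.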

I do think that splittable ParDo and stateful ParDo should have separate
PTransform URNs since they are different paradigms than "vanilla" ParDo.

> On the SDK requirements side: the constructing SDK owns the Environment
> proto completely, so it is in a position to ensure the involved docker
> images support the necessary features.
>
> Yes.
>
> > Is it sufficient for each SDK involved in a cross-language expansion to
> validate that it understands the inputs? For example if Python sends a
> PCollection with a pickle coder to Java as input to an expansion then it
> will fail. And conversely if the returned subgraph outputs a PCollection
> with a Java custom coder.
>
> Yes. It's possible to imagine there could be some negotiation about
> inserting length prefix coders (e.g. a Count transform could act on
> any opaque data as long as it can delimit it), but that's still TBD.
>
> > More complex use cases that I can imagine all seem futuristic and
> unlikely to come to pass (Python passes a pickled DoFn to the Java
> expansion service which inserts it into the graph in a way where a
> Java-based transform would have to invoke it on every element, etc)
>
> Some transforms are configured with UDFs of this form...but we'll
> cross that bridge when we get to it.
>

Now that I think harder, I know of a TimestampFn that governs the
watermark. Does SDF solve this by allowing a composite IO where the parsing
is done in one language while the watermark is somehow governed by the
other? And then there's writing a SQL UDF in your language of choice...
Anyhow, probably a tangent...

Kenn


> Kenn
> >
> > On Mon, Feb 10, 2020 at 5:03 PM Brian Hulette 
> wrote:
> >>
> >> I like the capabilities/requirements idea. Would these capabilities be
> at a level that it would make sense to document in the capabilities matrix?
> i.e. could the URNs be the values of "X" Pablo described here [1].
> >>
> >> Brian
> >>
> >> [1]
> https://lists.apache.org/thread.html/e93ac64d484551d61e559e1ba0cf4a15b760e69d74c5b1d0549ff74f%40%3Cdev.beam.apache.org%3E
> >>
> >> On Mon, Feb 10, 2020 at 3:55 PM Robert Bradshaw 
> wrote:
> >>>
> >>> With an eye towards cross-language (which includes cross-version)
> >>> pipelines and services (specifically looking at Dataflow) supporting
> >>> portable pipelines, there's been a desire to stabilize the portability
> >>> protos. There are currently many cleanups we'd like to do [1] (some
> >>> essential, others nice to have); are there others that people would
> >>> like to see?
> >>>
> >>> Of course we would like it to be possible for the FnAPI and Beam
> >>> itself to continue to evolve. Most of this can be handled by runners
> >>> understanding various transform URNs, but not all. (An example that
> >>> comes to mind is support for large iterables [2], or the requirement
> >>> to observe and respect new fields on a PTransform or its payloads
> >>> [3]). One proposal for this is to add capabilities and/or
> >>> requirements. An environment (corresponding generally to an SDK) could
> >>> advertise various capabilities (as a list or map of URNs) which a
> >>> 

Re: Sphinx Docs Command Error (:sdks:python:test-suites:tox:pycommon:docs)

2020-02-11 Thread jincheng sun
I think it's good advice to remove the "-j 8" option if it doesn't affect
the performance much.


Udi Meiri wrote on Wed, Feb 12, 2020 at 2:20 AM:

> For me the difference was about 20s longer (40s -> 60s approx). Not
> significant IMO
>
> On Tue, Feb 11, 2020 at 9:59 AM Ahmet Altay  wrote:
>
>> Should we remove the "-j 8" option by default? The Sphinx docs say this is
>> an experimental option [1]. I do not recall docs generation taking a long
>> time; does it increase significantly without this option?
>>
>> [1] http://www.sphinx-doc.org/en/stable/man/sphinx-build.html
>>
>> On Tue, Feb 11, 2020 at 1:16 AM Shoaib Zafar <
>> shoaib.za...@venturedive.com> wrote:
>>
>>> Thanks, Udi and Jincheng for the response.
>>> The suggested solution worked for me as well.
>>>
>>> Regards,
>>>
>>> *Shoaib Zafar*
>>> Software Engineering Lead
>>> Mobile: +92 333 274 6242
>>> Skype: live:shoaibzafar_1
>>>
>>> 
>>>
>>>
>>> On Tue, Feb 11, 2020 at 1:17 PM jincheng sun 
>>> wrote:
>>>
 I have verified that this issue could be reproduced in my local
 environment (MacOS) and the solution suggested by Udi could work!

 Best,
 Jincheng

 Udi Meiri wrote on Tue, Feb 11, 2020 at 8:51 AM:

> I don't have those issues (running on Linux), but a possible
> workaround could be to remove the "-j 8" flags (2 locations) in
> generate_pydoc.sh.
>
>
> On Mon, Feb 10, 2020 at 11:06 AM Shoaib Zafar <
> shoaib.za...@venturedive.com> wrote:
>
>> Hello Beamers.
>>
>> Just curious, is anyone else having trouble running the
>> ':sdks:python:test-suites:tox:pycommon:docs' command locally?
>>
>> After rebasing with master recently, I am hitting a sphinx thread-fork
>> error on my macOS Mojave machine, using Python 3.7.0.
>> I tried adding the environment variable "export
>> OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES" (which I found on Google),
>> but no luck!
>>
>> Any suggestions/help?
>>
>> Thanks!
>>
>> Console Log:
>> --
>> 
>> Creating file target/docs/source/apache_beam.utils.proto_utils.rst.
>> Creating file target/docs/source/apache_beam.utils.retry.rst.
>> Creating file
>> target/docs/source/apache_beam.utils.subprocess_server.rst.
>> Creating file
>> target/docs/source/apache_beam.utils.thread_pool_executor.rst.
>> Creating file target/docs/source/apache_beam.utils.timestamp.rst.
>> Creating file target/docs/source/apache_beam.utils.urns.rst.
>> Creating file target/docs/source/apache_beam.utils.rst.
>> objc[8384]: +[__NSCFConstantString initialize] may have been in
>> progress in another thread when fork() was called.
>> objc[8384]: +[__NSCFConstantString initialize] may have been in
>> progress in another thread when fork() was called. We cannot safely call 
>> it
>> or ignore it in the fork() child process. Crashing instead. Set a
>> breakpoint on objc_initializeAfterForkError to debug.
>>
>> Traceback (most recent call last):
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/cmd/build.py",
>> line 304, in build_main
>> app.build(args.force_all, filenames)
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/application.py",
>> line 335, in build
>> self.builder.build_all()
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>> line 305, in build_all
>> self.build(None, summary=__('all source files'), method='all')
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>> line 360, in build
>> updated_docnames = set(self.read())
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>> line 466, in read
>> self._read_parallel(docnames, nproc=self.app.parallel)
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>> line 521, in _read_parallel
>> tasks.join()
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
>> 

Re: Cross-language pipelines status

2020-02-11 Thread Chamikara Jayalath
On Tue, Feb 11, 2020 at 11:13 AM Heejong Lee  wrote:

>
>
> On Tue, Feb 11, 2020 at 9:37 AM Alexey Romanenko 
> wrote:
>
>> Hi all,
>>
>> I just wanted to ask for more details about the status of cross-language
>> pipelines (rather, transforms). I see some discussions about that here, but
>> I think it’s more around cross-language IOs.
>>
>> I’ll appreciate for any information about that topic and answers for
>> these questions:
>> - Are there any examples/guides of setting up and running cross-languages
>> pipelines?
>>
>
> AFAIK, there's no official guide for cross-language pipelines. But there
> are examples and test cases you can use as reference such as:
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_xlang.py
>
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOExternalTest.java
>
> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service_test.py
>

I'm trying to work with tech writers to add more documentation related to
cross-language (in a few months). But any help related to documenting what
we have now is greatly appreciated.

>
>
>
>
>> - Is this something that can already be used (currently interested in
>> Java/Python pipelines) or is the main work still in progress? More
>> precisely, I'm focused on executing some Python code from Java-based
>> pipelines.
>>
>
> The runner and SDK support is in a working state, I'd say, but not many
> IOs expose their cross-language interface yet (though you can easily write
> a cross-language configuration for any Python transform yourself).
>

I should also mention the test suites for portable Flink and Spark that
Heejong added recently :)

https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_XVR_Flink/
https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_XVR_Spark/


>
>
>> - Is the information here
>> https://beam.apache.org/roadmap/connectors-multi-sdk/ up-to-date? Are
>> there any other entry points you can recommend?
>>
>
> I think it's up-to-date.
>

Mostly up to date.  Testing status is more complete now and we are actively
working on getting the dependencies story correct and adding support for
DataflowRunner.


>
>
>>
>> Thanks!
>
>


Re: Cross-language pipelines status

2020-02-11 Thread Heejong Lee
On Tue, Feb 11, 2020 at 9:37 AM Alexey Romanenko 
wrote:

> Hi all,
>
> I just wanted to ask for more details about the status of cross-language
> pipelines (rather, transforms). I see some discussions about that here, but
> I think it’s more around cross-language IOs.
>
> I’ll appreciate for any information about that topic and answers for these
> questions:
> - Are there any examples/guides of setting up and running cross-languages
> pipelines?
>

AFAIK, there's no official guide for cross-language pipelines. But there
are examples and test cases you can use as reference such as:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_xlang.py
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOExternalTest.java
https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service_test.py



> - Is this something that can already be used (currently interested in
> Java/Python pipelines) or is the main work still in progress? More
> precisely, I'm focused on executing some Python code from Java-based
> pipelines.
>

The runner and SDK support is in a working state, I'd say, but not many
IOs expose their cross-language interface yet (though you can easily write
a cross-language configuration for any Python transform yourself).
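"Writing a cross-language configuration yourself" can be sketched roughly as below. This mirrors the shape of the expansion_service_test.py example linked above, where transform URNs are registered with the expansion service, but it uses plain-Python stand-ins rather than real Beam types; the URN, the registry, and all helper names are hypothetical.

```python
# Hypothetical sketch: an expansion service keeps a registry mapping transform
# URNs to constructors, and "expanding" a request looks the URN up and builds
# the transform from the payload. Plain-Python stand-ins, not real Beam types.

TRANSFORM_REGISTRY = {}

def register_urn(urn):
    """Decorator registering a builder under a transform URN."""
    def wrapper(builder):
        TRANSFORM_REGISTRY[urn] = builder
        return builder
    return wrapper

@register_urn("my:custom:prefix:v1")
def build_prefix(payload):
    # The payload carries the configuration sent by the other SDK.
    prefix = payload["data"]
    return lambda element: prefix + element

def expand(urn, payload):
    """Stand-in for handling an ExpansionRequest for the given URN."""
    if urn not in TRANSFORM_REGISTRY:
        raise KeyError("No transform registered for %s" % urn)
    return TRANSFORM_REGISTRY[urn](payload)

fn = expand("my:custom:prefix:v1", {"data": "x-"})
```

The real mechanism additionally serializes payloads with coders and returns an expanded proto subgraph, but the registration-and-lookup shape is the part you write yourself.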


> - Is the information here
> https://beam.apache.org/roadmap/connectors-multi-sdk/ up-to-date? Are
> there any other entry points you can recommend?
>

I think it's up-to-date.


>
> Thanks!


Re: Upgrades gcsio to 2.0.0

2020-02-11 Thread Tomo Suzuki
> What prevents the usage of the newer version of Guava?

Cassandra is referencing an old field of Guava's CharMatcher. The
field "DIGIT" is no longer available after Guava 26.
https://github.com/apache/beam/pull/10769#issuecomment-583698718

On Mon, Feb 10, 2020 at 5:39 PM Luke Cwik  wrote:
>
> What prevents the usage of the newer version of Guava?
>
> On Mon, Feb 10, 2020 at 2:28 PM Esun Kim  wrote:
>>
>> Hi Beam Developers,
>>
>> I'm working on pr/10769, which upgrades gcsio from 1.9.16 to 2.0.0, an
>> intermediate step toward gcsio 2.x, which supports gRPC and potentially
>> gives us better performance. (FYI, gcsio is a driver for Google
>> Cloud Storage.)
>>
>> A linkage check was run over this PR (result) and it appears to have a
>> couple of linkage warnings from the following modules, because gcsio 2.0.0
>> uses a newer version of Guava.
>>
>> com.google.cloud.hadoop.gcsio.cooplock.CoopLockRecordsDao (gcsio-2.0.0.jar)
>> com.google.cloud.hadoop.gcsio.cooplock.CoopLockOperationDao (gcsio-2.0.0.jar)
>> com.google.cloud.hadoop.gcsio.testing.InMemoryObjectEntry (gcsio-2.0.0.jar)
>>
>> But I believe that none of these is actually problematic, because
>> cooplock is only for Hadoop (not for Beam) and testing is just testing. So I
>> think it's okay to get this merged, but I want to get your opinion on it.
>>
>> Regards,
>> Esun.
>>


-- 
Regards,
Tomo


Re: Sphinx Docs Command Error (:sdks:python:test-suites:tox:pycommon:docs)

2020-02-11 Thread Udi Meiri
For me the difference was about 20s longer (40s -> 60s approx). Not
significant IMO

On Tue, Feb 11, 2020 at 9:59 AM Ahmet Altay  wrote:

> Should we remove the "-j 8" option by default? The Sphinx docs say this is an
> experimental option [1]. I do not recall docs generation taking a long
> time; does it increase significantly without this option?
>
> [1] http://www.sphinx-doc.org/en/stable/man/sphinx-build.html
>
> On Tue, Feb 11, 2020 at 1:16 AM Shoaib Zafar 
> wrote:
>
>> Thanks, Udi and Jincheng for the response.
>> The suggested solution worked for me as well.
>>
>> Regards,
>>
>> *Shoaib Zafar*
>> Software Engineering Lead
>> Mobile: +92 333 274 6242
>> Skype: live:shoaibzafar_1
>>
>> 
>>
>>
>> On Tue, Feb 11, 2020 at 1:17 PM jincheng sun 
>> wrote:
>>
>>> I have verified that this issue could be reproduced in my local
>>> environment (MacOS) and the solution suggested by Udi could work!
>>>
>>> Best,
>>> Jincheng
>>>
>>> Udi Meiri wrote on Tue, Feb 11, 2020 at 8:51 AM:
>>>
 I don't have those issues (running on Linux), but a possible workaround
 could be to remove the "-j 8" flags (2 locations) in generate_pydoc.sh.


 On Mon, Feb 10, 2020 at 11:06 AM Shoaib Zafar <
 shoaib.za...@venturedive.com> wrote:

> Hello Beamers.
>
> Just curious, is anyone else having trouble running the
> ':sdks:python:test-suites:tox:pycommon:docs' command locally?
>
> After rebasing with master recently, I am hitting a sphinx thread-fork
> error on my macOS Mojave machine, using Python 3.7.0.
> I tried adding the environment variable "export
> OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES" (which I found on Google),
> but no luck!
>
> Any suggestions/help?
>
> Thanks!
>
> Console Log:
> --
> 
> Creating file target/docs/source/apache_beam.utils.proto_utils.rst.
> Creating file target/docs/source/apache_beam.utils.retry.rst.
> Creating file
> target/docs/source/apache_beam.utils.subprocess_server.rst.
> Creating file
> target/docs/source/apache_beam.utils.thread_pool_executor.rst.
> Creating file target/docs/source/apache_beam.utils.timestamp.rst.
> Creating file target/docs/source/apache_beam.utils.urns.rst.
> Creating file target/docs/source/apache_beam.utils.rst.
> objc[8384]: +[__NSCFConstantString initialize] may have been in
> progress in another thread when fork() was called.
> objc[8384]: +[__NSCFConstantString initialize] may have been in
> progress in another thread when fork() was called. We cannot safely call 
> it
> or ignore it in the fork() child process. Crashing instead. Set a
> breakpoint on objc_initializeAfterForkError to debug.
>
> Traceback (most recent call last):
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/cmd/build.py",
> line 304, in build_main
> app.build(args.force_all, filenames)
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/application.py",
> line 335, in build
> self.builder.build_all()
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
> line 305, in build_all
> self.build(None, summary=__('all source files'), method='all')
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
> line 360, in build
> updated_docnames = set(self.read())
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
> line 466, in read
> self._read_parallel(docnames, nproc=self.app.parallel)
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
> line 521, in _read_parallel
> tasks.join()
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
> line 114, in join
> self._join_one()
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
> line 120, in _join_one
> 

Re: Unable to get java presubmit to run (not pass)

2020-02-11 Thread Yifan Zou
I tried 'Run Java PreCommit' in the PR and it created the job #1734 in
https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Java_Phrase/.
The job is in the waiting queue right now. But I am not sure why
`Retest this please` failed to execute all the necessary tests.

On Tue, Feb 11, 2020 at 9:48 AM Daniel Collins  wrote:

> Hello beam developers,
>
> I've been trying to rerun presubmits on
> https://github.com/apache/beam/pull/10476 quite a few times, but it keeps
> stalling out at the "Java ("Run Java PreCommit") Pending — Build triggered
> for merge commit." Is there currently a problem with the jenkins cluster?
>
> Thanks!
>


Re: Sphinx Docs Command Error (:sdks:python:test-suites:tox:pycommon:docs)

2020-02-11 Thread Ahmet Altay
Should we remove the "-j 8" option by default? The Sphinx docs say this is an
experimental option [1]. I do not recall docs generation taking a long
time; does it increase significantly without this option?

[1] http://www.sphinx-doc.org/en/stable/man/sphinx-build.html

On Tue, Feb 11, 2020 at 1:16 AM Shoaib Zafar 
wrote:

> Thanks, Udi and Jincheng for the response.
> The suggested solution worked for me as well.
>
> Regards,
>
> *Shoaib Zafar*
> Software Engineering Lead
> Mobile: +92 333 274 6242
> Skype: live:shoaibzafar_1
>
> 
>
>
> On Tue, Feb 11, 2020 at 1:17 PM jincheng sun 
> wrote:
>
>> I have verified that this issue could be reproduced in my local
>> environment (MacOS) and the solution suggested by Udi could work!
>>
>> Best,
>> Jincheng
>>
>> Udi Meiri wrote on Tue, Feb 11, 2020 at 8:51 AM:
>>
>>> I don't have those issues (running on Linux), but a possible workaround
>>> could be to remove the "-j 8" flags (2 locations) in generate_pydoc.sh.
>>>
>>>
>>> On Mon, Feb 10, 2020 at 11:06 AM Shoaib Zafar <
>>> shoaib.za...@venturedive.com> wrote:
>>>
 Hello Beamers.

 Just curious, is anyone else having trouble running the
 ':sdks:python:test-suites:tox:pycommon:docs' command locally?

 After rebasing with master recently, I am hitting a sphinx thread-fork
 error on my macOS Mojave machine, using Python 3.7.0.
 I tried adding the environment variable "export
 OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES" (which I found on Google), but
 no luck!

 Any suggestions/help?

 Thanks!

 Console Log:
 --
 
 Creating file target/docs/source/apache_beam.utils.proto_utils.rst.
 Creating file target/docs/source/apache_beam.utils.retry.rst.
 Creating file
 target/docs/source/apache_beam.utils.subprocess_server.rst.
 Creating file
 target/docs/source/apache_beam.utils.thread_pool_executor.rst.
 Creating file target/docs/source/apache_beam.utils.timestamp.rst.
 Creating file target/docs/source/apache_beam.utils.urns.rst.
 Creating file target/docs/source/apache_beam.utils.rst.
 objc[8384]: +[__NSCFConstantString initialize] may have been in
 progress in another thread when fork() was called.
 objc[8384]: +[__NSCFConstantString initialize] may have been in
 progress in another thread when fork() was called. We cannot safely call it
 or ignore it in the fork() child process. Crashing instead. Set a
 breakpoint on objc_initializeAfterForkError to debug.

 Traceback (most recent call last):
   File
 "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/cmd/build.py",
 line 304, in build_main
 app.build(args.force_all, filenames)
   File
 "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/application.py",
 line 335, in build
 self.builder.build_all()
   File
 "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
 line 305, in build_all
 self.build(None, summary=__('all source files'), method='all')
   File
 "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
 line 360, in build
 updated_docnames = set(self.read())
   File
 "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
 line 466, in read
 self._read_parallel(docnames, nproc=self.app.parallel)
   File
 "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
 line 521, in _read_parallel
 tasks.join()
   File
 "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
 line 114, in join
 self._join_one()
   File
 "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
 line 120, in _join_one
 exc, logs, result = pipe.recv()
   File
 "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
 line 250, in recv
 buf = self._recv_bytes()
   File
 

Unable to get java presubmit to run (not pass)

2020-02-11 Thread Daniel Collins
Hello beam developers,

I've been trying to rerun presubmits on
https://github.com/apache/beam/pull/10476 quite a few times, but it keeps
stalling out at the "Java ("Run Java PreCommit") Pending — Build triggered
for merge commit." Is there currently a problem with the jenkins cluster?

Thanks!


Cross-language pipelines status

2020-02-11 Thread Alexey Romanenko
Hi all,

I just wanted to ask for more details about the status of cross-language 
pipelines (rather, transforms). I see some discussions about that here, but I 
think it’s more around cross-language IOs.

I’ll appreciate for any information about that topic and answers for these 
questions:
- Are there any examples/guides of setting up and running cross-languages 
pipelines?
- Is this something that already can be used (currently interested in 
Java/Python pipelines) or the main work is still in progress? More precisely - 
I’m more focused in executing some Python code from Java-based pipelines.
- Is the information here https://beam.apache.org/roadmap/connectors-multi-sdk/ 
up-to-date? Are there any other entry points you can recommend?

Thanks!

Re: FnAPI proto backwards compatibility

2020-02-11 Thread Robert Bradshaw
On Mon, Feb 10, 2020 at 7:35 PM Kenneth Knowles  wrote:
>
> On the runner requirements side: if you have such a list at the pipeline 
> level, it is an opportunity for the list to be inconsistent with the contents 
> of the pipeline. For example, if a DoFn is marked "requires stable input" but 
> not listed at the pipeline level, then the runner may run it without ensuring 
> stable input.

Yes. Listing this feature at the top level, if used, would be part of
the contract. The problem here that we're trying to solve is that the
runner wouldn't know about the field used to mark a DoFn as "requires
stable input." Another alternative would be to make this kind of ParDo
a different URN, but that would result in a cross product of URNs for
all supported features.

Rather than attaching it to the pipeline object, we could attach it to
the transform. (But if there are ever extensions that don't belong to
transforms, we'd be out of luck. It'd be even worse to attach it to
the ParDoPayload, as then we'd need one on CombinePayload, etc. just
in case.) This is why I was leaning towards just putting it at the
top.

I agree about the potential for incompatibility. As much as possible
I'd rather extend things in a way that would be intrinsically rejected
by a non-comprehending runner. But I'm not sure how to do that when
introducing new constraints for existing components like this. But I'm
open to other suggestions.

> On the SDK requirements side: the constructing SDK owns the Environment proto 
> completely, so it is in a position to ensure the involved docker images 
> support the necessary features.

Yes.

> Is it sufficient for each SDK involved in a cross-language expansion to 
> validate that it understands the inputs? For example if Python sends a 
> PCollection with a pickle coder to Java as input to an expansion then it will 
> fail. And conversely if the returned subgraph outputs a PCollection with a 
> Java custom coder.

Yes. It's possible to imagine there could be some negotiation about
inserting length prefix coders (e.g. a Count transform could act on
any opaque data as long as it can delimit it), but that's still TBD.
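The length-prefix negotiation mentioned here could look roughly like the sketch below. This is an illustrative stand-in, not actual Beam code: the `COMMONLY_UNDERSTOOD` set and `negotiate_coder` helper are invented, and the coder URNs are merely in the style of Beam's model coders.

```python
# Illustrative sketch of the negotiation idea: a transform like Count only
# needs to delimit elements, so a coder the other SDK cannot decode can be
# wrapped in a length-prefix coder rather than rejected outright.

LENGTH_PREFIX_URN = "beam:coder:length_prefix:v1"

# Coders both SDKs are assumed to understand (in the spirit of the standard
# model coders; the exact set here is hypothetical).
COMMONLY_UNDERSTOOD = {
    "beam:coder:bytes:v1",
    "beam:coder:varint:v1",
    "beam:coder:kv:v1",
    LENGTH_PREFIX_URN,
}

def negotiate_coder(coder_urn):
    """Return a coder spec the receiving SDK can at least delimit."""
    if coder_urn in COMMONLY_UNDERSTOOD:
        return coder_urn
    # Opaque to the other SDK: wrap it so elements remain delimitable as
    # length-prefixed raw bytes, even if they cannot be decoded.
    return (LENGTH_PREFIX_URN, coder_urn)
```

The wrapped form trades decodability for delimitability: the receiving SDK can count, shuffle, or pass elements through, but cannot inspect their contents.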

> More complex use cases that I can imagine all seem futuristic and unlikely to 
> come to pass (Python passes a pickled DoFn to the Java expansion service 
> which inserts it into the graph in a way where a Java-based transform would 
> have to invoke it on every element, etc)

Some transforms are configured with UDFs of this form...but we'll
cross that bridge when we get to it.

>
> Kenn
>
> On Mon, Feb 10, 2020 at 5:03 PM Brian Hulette  wrote:
>>
>> I like the capabilities/requirements idea. Would these capabilities be at a 
>> level that it would make sense to document in the capabilities matrix? i.e. 
>> could the URNs be the values of "X" Pablo described here [1].
>>
>> Brian
>>
>> [1] 
>> https://lists.apache.org/thread.html/e93ac64d484551d61e559e1ba0cf4a15b760e69d74c5b1d0549ff74f%40%3Cdev.beam.apache.org%3E
>>
>> On Mon, Feb 10, 2020 at 3:55 PM Robert Bradshaw  wrote:
>>>
>>> With an eye towards cross-language (which includes cross-version)
>>> pipelines and services (specifically looking at Dataflow) supporting
>>> portable pipelines, there's been a desire to stabilize the portability
>>> protos. There are currently many cleanups we'd like to do [1] (some
>>> essential, others nice to have); are there others that people would
>>> like to see?
>>>
>>> Of course we would like it to be possible for the FnAPI and Beam
>>> itself to continue to evolve. Most of this can be handled by runners
>>> understanding various transform URNs, but not all. (An example that
>>> comes to mind is support for large iterables [2], or the requirement
>>> to observe and respect new fields on a PTransform or its payloads
>>> [3]). One proposal for this is to add capabilities and/or
>>> requirements. An environment (corresponding generally to an SDK) could
>>> advertise various capabilities (as a list or map of URNs) which a
>>> runner can take advantage of without requiring all SDKs to support all
>>> features at the same time. For the other way around, we need a way of
>>> marking something that a runner must reject if it does not understand
>>> it. This could be a set of requirements (again, a list or map of URNs)
>>> that designate capabilities required to at least be understood by the
>>> runner to faithfully execute this pipeline. (These could be attached
>>> to a transform or the pipeline itself.) Do these sound like reasonable
>>> additions? Also, would they ever need to be parameterized (map), or
>>> would a list suffice?
>>>
>>> [1] BEAM-2645, BEAM-2822, BEAM-3203, BEAM-3221, BEAM-3223, BEAM-3227,
>>> BEAM-3576, BEAM-3577, BEAM-3595, BEAM-4150, BEAM-4180, BEAM-4374,
>>> BEAM-5391, BEAM-5649, BEAM-8172, BEAM-8201, BEAM-8271, BEAM-8373,
>>> BEAM-8539, BEAM-8804, BEAM-9229, BEAM-9262, BEAM-9266, and BEAM-9272
>>> [2] 
>>> 
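To make the asymmetry between the two proposed lists concrete, here is a hedged sketch of the runner-side logic; the URN strings and function names are invented for illustration and are not actual Beam proto definitions:

```python
# Illustrative sketch only: URNs, names, and helpers are hypothetical,
# not part of the Beam portability protos.

# Requirement URNs this (imaginary) runner knows how to honor.
SUPPORTED_REQUIREMENTS = {
    "beam:requirement:pardo:stable_input:v1",
}

def validate_requirements(pipeline_requirements):
    """Reject a pipeline declaring any requirement the runner does not
    understand -- the 'must reject if not understood' half of the contract."""
    unknown = set(pipeline_requirements) - SUPPORTED_REQUIREMENTS
    if unknown:
        raise ValueError(
            "Cannot faithfully execute pipeline; unsupported requirements: %s"
            % sorted(unknown))

def usable_capabilities(environment_capabilities, runner_understood):
    """Capabilities go the other way: the runner may exploit any capability
    an SDK environment advertises, and can safely ignore the rest."""
    return set(environment_capabilities) & set(runner_understood)
```

With this split, a known requirement URN passes silently while an unrecognized one fails fast instead of being silently mis-executed, whereas an unrecognized capability is simply ignored.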

Re: Labels on PR

2020-02-11 Thread Robert Bradshaw
+1 to finding the right balance.

I do think per-runner makes sense, rather than a general "runners."
IOs might make sense as well. Not sure about all the extensions-*; I'd
leave those out for now.

On Tue, Feb 11, 2020 at 5:56 AM Ismaël Mejía  wrote:
>
> > So I propose going simple with a limited set of labels. Later on we can 
> > refine. Don't forget that those labels are only useful during the life-cycle 
> > of a PR.
>
> Labels are handy for quick filtering and finding PRs we care about for example
> to review.
>
> I agree with the feeling that we should not go to the extremes, but what is
> requested in the PR would rarely produce more than 5 labels per PR. For 
> example,
> if a PR changes KafkaIO and something in the CI it will produce "java io kafka
> infra"; a pure change on the Flink runner will produce "runners flink".
>
> 100% d'accord with not to have many labels and keep them short, but the 
> current
> classification lacks detail, e.g. few people care about some general 
> categories
> "runners" or "io", but maintainers may care about their specific categories 
> like
> "spark" or "kafka" so I don't think that this extra level of detail is
> inappropriate and in the end it will only add one extra label per matching 
> path.
>
> Let's give it a try; if it is too excessive we can take the opposite path and 
> reduce it.
>
> Ismaël
>
>
> On Tue, Feb 11, 2020 at 1:04 PM Alex Van Boxel  wrote:
>>
>> I'm wondering if we're not taking it too far with those detailed labels. 
>> It's like going from nothing to super details. The simplest use-case hasn't 
>> proven itself in practice yet.
>>
>> So I propose going simple with a limited set of labels. Later on we can 
>> refine. Don't forget that those labels are only useful during the life-cycle 
>> of a PR.
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Tue, Feb 11, 2020 at 9:46 AM Ismaël Mejía  wrote:
>>>
>>> Left some comments too; let's keep the discussion on refinements in the PR.
>>>
>>> On Tue, Feb 11, 2020 at 9:13 AM jincheng sun  
>>> wrote:

 I left comments on PR, the main suggestion is that we may need a 
 discussion about what kind of labels should be added. I would like to share 
 my thoughts as follows:

 I think we need to add labels according to some rules. For example, the 
 easiest way is to add labels by languages, java / python / go etc. But 
 this kind of help is very limited, so we need to subdivide some labels, 
 such as by components. Currently we have more than 70 components, each 
 component is configured with labels, and it seems cumbersome. So we should 
 have some rules for dividing labels, which can play the role of labels 
 without being too cumbersome. Such as:

 We can add `extensions` or `extensions-ideas and extensions-java` for the 
 following components:

 - extensions-ideas
 - extensions-java-join-library
 - extensions-java-json
 - extensions-java-protobuf
 - extensions-java-sketching
 - extensions-java-sorter

 And it's better to add a label for each Runner as follows:

 - runner-apex
 - runner-core
 - runner-dataflow
 - runner-direct
 - runner-flink
 - runner-jstorm
 - runner-...

 So, I think it would be great to collect feedback from the community on the 
 set of labels needed.

 What do you think?

 Best,
 Jincheng

 Alex Van Boxel  于2020年2月11日周二 下午3:11写道:
>
> I've opened a PR and a ticket with INFRA.
>
> PR: https://github.com/apache/beam/pull/10824
>
>  _/
> _/ Alex Van Boxel
>
>
> On Tue, Feb 11, 2020 at 6:57 AM jincheng sun  
> wrote:
>>
>> +1. Autolabeler seems really cool and it seems that it's simple to 
>> configure and set up.
>>
>> Best,
>> Jincheng
>>
>>
>>
>> Udi Meiri  于2020年2月11日周二 上午2:01写道:
>>>
>>> Cool!
>>>
>>> On Mon, Feb 10, 2020 at 9:27 AM Robert Burke  wrote:

 +1 to autolabeling

 On Mon, Feb 10, 2020, 9:21 AM Luke Cwik  wrote:
>
> Nice
>
> On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel  
> wrote:
>>
>> Ha, cool. I'll have a look at the autolabeler. The infra stuff is 
>> not something I've looked at... I'll dive into that.
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía  
>> wrote:
>>>
>>> +1
>>>
>>> You don't need to write your own action, there is already one 
>>> autolabeler action [1].
>>> INFRA can easily configure it for Beam (as they did for Avro [2]) 
>>> if we request it.
>>> The plugin is quite easy to configure and works like a charm [3].
>>>
>>> [1] https://github.com/probot/autolabeler
>>> [1] https://issues.apache.org/jira/browse/INFRA-17367
>>> [2] 

Re: Transitive dependency from external repository

2020-02-11 Thread Alexey Romanenko
Thanks all for the suggestions! 
For now, I just added another option to “JavaNatureConfiguration” and some 
logic into the “pom.withXml” function.

I created a PR about that:
https://github.com/apache/beam/pull/10832 
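For readers following along, the general shape of such a change (a hedged sketch only; the `mavenRepositories` option name and structure are illustrative, not the actual contents of the PR) is to append a `<repositories>` section to the generated POM inside `pom.withXml`:

```groovy
// Sketch: inject extra <repositories> into the POM produced by the
// maven-publish plugin. 'mavenRepositories' is a hypothetical
// JavaNatureConfiguration option holding [id: ..., url: ...] maps.
pom.withXml {
  if (configuration.mavenRepositories) {
    def repositoriesNode = asNode().appendNode('repositories')
    configuration.mavenRepositories.each { repo ->
      def repositoryNode = repositoriesNode.appendNode('repository')
      repositoryNode.appendNode('id', repo.id)
      repositoryNode.appendNode('url', repo.url)
    }
  }
}
```

This keeps the extra repository scoped to the modules that opt in, rather than applying to every Beam artifact.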


> On 7 Feb 2020, at 01:06, Luke Cwik  wrote:
> 
> It could do that as well.
> 
> On Thu, Feb 6, 2020 at 11:25 AM Kenneth Knowles  > wrote:
> That XML-generating code should be able to traverse project.repositories and 
> add them on a per-module basis, no?
> 
> On Thu, Feb 6, 2020 at 9:47 AM Luke Cwik  > wrote:
> We generate the pom using Gradle here[1].
> 
> The issue is that it applies to all beam modules and what you are asking for 
> isn't currently plumbed through. You could try adding an option to the 
> JavaNatureConfiguration[2] and then specify the additional repository in your 
> module.
> 
> 1: 
> https://github.com/apache/beam/blob/2473792306879e3fa6c5ab1f95523c7b44b4e288/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1142
>  
> 
> 2: 
> https://github.com/apache/beam/blob/2473792306879e3fa6c5ab1f95523c7b44b4e288/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L81
>  
> 
> On Thu, Feb 6, 2020 at 9:38 AM Jean-Baptiste Onofre  > wrote:
> Like this:
> repositories {
>   jcenter()
>   maven { url "https://plugins.gradle.org/m2/" }
>   maven {
>     url "https://repo.spring.io/plugins-release/"
>     content { includeGroup "io.spring.gradle" }
>   }
>   maven { url "foo" }
> }
> 
> 
>> Le 6 févr. 2020 à 18:37, Jean-Baptiste Onofre > > a écrit :
>> 
>> Great, thanks !
>> 
>> Back on your question, I guess we can add the repository in 
>> buildSrc/build.gradle (repositories property).
>> 
>> Regards
>> JB
>> 
>>> Le 6 févr. 2020 à 18:33, Alexey Romanenko >> > a écrit :
>>> 
>>> Yes, it's Apache License 2.0
>>> 
>>> https://packages.confluent.io/maven/io/confluent/kafka-avro-serializer/5.4.0/kafka-avro-serializer-5.4.0.pom
>>>  
>>> 
>>> 
 On 6 Feb 2020, at 18:12, Jean-Baptiste Onofre >>> > wrote:
 
 Hi,
 
 Just a side note: did you check the license of the dependency (just to be 
 sure it’s not a Cat X dependency) ?
 
 Regards
 JB
 
> Le 6 févr. 2020 à 18:06, Alexey Romanenko  > a écrit :
> 
> Hi,
> 
> To add support for Confluent Schema Registry in KafkaIO we added a new 
> dependency on “io.confluent:kafka-avro-serializer”. The artifacts of this 
> dependency exist in external repository [1]. So, it should not be a 
> problem to add this repository into the list of available repositories of 
> Beam build system - it works fine to build Beam KafkaIO artifacts. 
> 
> The actual problem is with the transitive dependency on 
> "io.confluent:kafka-avro-serializer" in user code. We add this dependency 
> into the generated and then published KafkaIO pom.xml but, to successfully 
> resolve it, we need to add a new repository [1] as well (or the user would 
> have to add it manually in their pom, which is definitively not a perfect solution).
> 
> So, my questions to Gradle/build experts:
> 
> 1) How do we add more repositories into the published pom.xml with Gradle, 
> like we do in Maven?
> 
> For example:
> 
> <repositories>
>   <repository>
>     <id>confluent</id>
>     <url>https://packages.confluent.io/maven/</url>
>   </repository>
> </repositories>
> 
> I tried several ways to do that, like adding "repositories { maven { url 
> "https://packages.confluent.io/maven/" } }" into the KafkaIO build.gradle, 
> but it seems it doesn't work (I don't see any additional repositories in the 
> published pom file). 
> 
> 2) Another option - would it be better to vendor 
> "io.confluent:kafka-avro-serializer" along with KafkaIO and not add an 
> additional dependency? WDYT?
> 
> 3) Any other recommendations for a better solution in such a case?
> 
> Any help on this topic would be much appreciated.
> 
> Alexey
> 
> [1] https://packages.confluent.io/maven/ 
> 
>>> 
>> 
> 



Re: Labels on PR

2020-02-11 Thread Ismaël Mejía
> So I propose going simple with a limited set of labels. Later on we can
refine. Don't forget that those labels are only useful during the life-cycle
of a PR.

Labels are handy for quick filtering and finding PRs we care about for
example
to review.

I agree with the feeling that we should not go to the extremes, but what is
requested in the PR would rarely produce more than 5 labels per PR. For
example,
if a PR changes KafkaIO and something in the CI it will produce "java io
kafka
infra"; a pure change on the Flink runner will produce "runners flink".
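As a concrete illustration, that path-to-label mapping might look roughly like this in a probot/autolabeler `.github/autolabeler.yml`; the globs and label names here are invented for the example and are not the contents of the actual PR:

```yaml
# Illustrative only: label -> path globs (autolabeler format).
java: ["sdks/java/**/*"]
io: ["sdks/java/io/**/*"]
kafka: ["sdks/java/io/kafka/**/*"]
infra: [".test-infra/**/*"]
runners: ["runners/**/*"]
flink: ["runners/flink/**/*"]
```

A PR touching `sdks/java/io/kafka` and `.test-infra` would then pick up "java io kafka infra" automatically.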

100% d'accord with not to have many labels and keep them short, but the
current
classification lacks detail, e.g. few people care about some general
categories
"runners" or "io", but maintainers may care about their specific categories
like
"spark" or "kafka" so I don't think that this extra level of detail is
inappropriate and in the end it will only add one extra label per matching
path.

Let's give it a try; if it is too excessive we can take the opposite path and
reduce it.

Ismaël


On Tue, Feb 11, 2020 at 1:04 PM Alex Van Boxel  wrote:

> I'm wondering if we're not taking it too far with those detailed labels.
> It's like going from nothing to super details. The simplest use-case hasn't
> proven itself in practice yet.
>
> So I propose going simple with a limited set of labels. Later on we can
> refine. Don't forget that those labels are only useful during the life-cycle
> of a PR.
>
>  _/
> _/ Alex Van Boxel
>
>
> On Tue, Feb 11, 2020 at 9:46 AM Ismaël Mejía  wrote:
>
>> Left some comments too; let's keep the discussion on refinements in the PR.
>>
>> On Tue, Feb 11, 2020 at 9:13 AM jincheng sun 
>> wrote:
>>
>>> I left comments on PR, the main suggestion is that we may need a
>>> discussion about what kind of labels should be added. I would like to share
>>> my thoughts as follows:
>>>
>>> I think we need to add labels according to some rules. For example, the
>>> easiest way is to add labels by languages, java / python / go etc. But this
>>> kind of help is very limited, so we need to subdivide some labels, such as
>>> by components. Currently we have more than 70 components, each component is
>>> configured with labels, and it seems cumbersome. So we should have some
>>> rules for dividing labels, which can play the role of labels without being
>>> too cumbersome. Such as:
>>>
>>> We can add `extensions` or `extensions-ideas and extensions-java` for
>>> the following components:
>>>
>>> - extensions-ideas
>>> - extensions-java-join-library
>>> - extensions-java-json
>>> - extensions-java-protobuf
>>> - extensions-java-sketching
>>> - extensions-java-sorter
>>>
>>> And it's better to add a label for each Runner as follows:
>>>
>>> - runner-apex
>>> - runner-core
>>> - runner-dataflow
>>> - runner-direct
>>> - runner-flink
>>> - runner-jstorm
>>> - runner-...
>>>
>>> So, I think it would be great to collect feedback from the community on
>>> the set of labels needed.
>>>
>>> What do you think?
>>>
>>> Best,
>>> Jincheng
>>>
>>> Alex Van Boxel  于2020年2月11日周二 下午3:11写道:
>>>
 I've opened a PR and a ticket with INFRA.

 PR: https://github.com/apache/beam/pull/10824

  _/
 _/ Alex Van Boxel


 On Tue, Feb 11, 2020 at 6:57 AM jincheng sun 
 wrote:

> +1. Autolabeler seems really cool and it seems that it's simple to
> configure and set up.
>
> Best,
> Jincheng
>
>
>
> Udi Meiri  于2020年2月11日周二 上午2:01写道:
>
>> Cool!
>>
>> On Mon, Feb 10, 2020 at 9:27 AM Robert Burke 
>> wrote:
>>
>>> +1 to autolabeling
>>>
>>> On Mon, Feb 10, 2020, 9:21 AM Luke Cwik  wrote:
>>>
 Nice

 On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel 
 wrote:

> Ha, cool. I'll have a look at the autolabeler. The infra stuff is
> not something I've looked at... I'll dive into that.
>
>  _/
> _/ Alex Van Boxel
>
>
> On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía 
> wrote:
>
>> +1
>>
>> You don't need to write your own action, there is already one
>> autolabeler action [1].
>> INFRA can easily configure it for Beam (as they did for Avro
>> [2]) if we request it.
>> The plugin is quite easy to configure and works like a charm [3].
>>
>> [1] https://github.com/probot/autolabeler
>> [1] https://issues.apache.org/jira/browse/INFRA-17367
>> [2]
>> https://github.com/apache/avro/blob/master/.github/autolabeler.yml
>>
>>
>> On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Great initiative, thanks Alex! I was thinking to add such labels
>>> into PR title but I believe that GitHub labels are better since it 
>>> can be
>>> used easily for filtering, for example.
>>>

Re: Labels on PR

2020-02-11 Thread Alex Van Boxel
I'm wondering if we're not taking it too far with those detailed labels.
It's like going from nothing to super details. The simplest use-case hasn't
proven itself in practice yet.

So I propose going simple with a limited set of labels. Later on we can
refine. Don't forget that those labels are only useful during the life-cycle
of a PR.

 _/
_/ Alex Van Boxel


On Tue, Feb 11, 2020 at 9:46 AM Ismaël Mejía  wrote:

> Left some comments too; let's keep the discussion on refinements in the PR.
>
> On Tue, Feb 11, 2020 at 9:13 AM jincheng sun 
> wrote:
>
>> I left comments on PR, the main suggestion is that we may need a
>> discussion about what kind of labels should be added. I would like to share
>> my thoughts as follows:
>>
>> I think we need to add labels according to some rules. For example, the
>> easiest way is to add labels by languages, java / python / go etc. But this
>> kind of help is very limited, so we need to subdivide some labels, such as
>> by components. Currently we have more than 70 components, each component is
>> configured with labels, and it seems cumbersome. So we should have some
>> rules for dividing labels, which can play the role of labels without being
>> too cumbersome. Such as:
>>
>> We can add `extensions` or `extensions-ideas and extensions-java` for the
>> following components:
>>
>> - extensions-ideas
>> - extensions-java-join-library
>> - extensions-java-json
>> - extensions-java-protobuf
>> - extensions-java-sketching
>> - extensions-java-sorter
>>
>> And it's better to add a label for each Runner as follows:
>>
>> - runner-apex
>> - runner-core
>> - runner-dataflow
>> - runner-direct
>> - runner-flink
>> - runner-jstorm
>> - runner-...
>>
>> So, I think it would be great to collect feedback from the community on the
>> set of labels needed.
>>
>> What do you think?
>>
>> Best,
>> Jincheng
>>
>> Alex Van Boxel  于2020年2月11日周二 下午3:11写道:
>>
>>> I've opened a PR and a ticket with INFRA.
>>>
>>> PR: https://github.com/apache/beam/pull/10824
>>>
>>>  _/
>>> _/ Alex Van Boxel
>>>
>>>
>>> On Tue, Feb 11, 2020 at 6:57 AM jincheng sun 
>>> wrote:
>>>
 +1. Autolabeler seems really cool and it seems that it's simple to
 configure and set up.

 Best,
 Jincheng



 Udi Meiri  于2020年2月11日周二 上午2:01写道:

> Cool!
>
> On Mon, Feb 10, 2020 at 9:27 AM Robert Burke 
> wrote:
>
>> +1 to autolabeling
>>
>> On Mon, Feb 10, 2020, 9:21 AM Luke Cwik  wrote:
>>
>>> Nice
>>>
>>> On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel 
>>> wrote:
>>>
 Ha, cool. I'll have a look at the autolabeler. The infra stuff is
 not something I've looked at... I'll dive into that.

  _/
 _/ Alex Van Boxel


 On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía 
 wrote:

> +1
>
> You don't need to write your own action, there is already one
> autolabeler action [1].
> INFRA can easily configure it for Beam (as they did for Avro
> [2]) if we request it.
> The plugin is quite easy to configure and works like a charm [3].
>
> [1] https://github.com/probot/autolabeler
> [1] https://issues.apache.org/jira/browse/INFRA-17367
> [2]
> https://github.com/apache/avro/blob/master/.github/autolabeler.yml
>
>
> On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Great initiative, thanks Alex! I was thinking to add such labels
>> into PR title but I believe that GitHub labels are better since it 
>> can be
>> used easily for filtering, for example.
>>
>> Maybe it could be useful to add more granularity for labels, like
>> “release”, “runners”, “website”, etc but I’m afraid to make the 
>> titles too
>> heavy because of this.
>>
>> > On 10 Feb 2020, at 08:35, Alex Van Boxel 
>> wrote:
>> >
>> > I've started putting labels on PR's. I've done the first page
>> for now (as I'm afraid putting them on older ones could affect the 
>> stale
>> bot). I hope this is ok.
>> >
>> > For now I'm only focussing on language and I'm going to see if
>> I can write a GitHub action for it. I hope this is useful. Other 
>> kinds of
>> kind of
>> suggestions for labels, that can be automated, are welcome.
>> >
>> > 
>> >  _/
>> > _/ Alex Van Boxel
>>
>>


Re: Jenkins Jobs Trigger Request

2020-02-11 Thread Shoaib Zafar
Thanks Ismaël.

*Shoaib Zafar*
Software Engineering Lead
Mobile: +92 333 274 6242
Skype: live:shoaibzafar_1




On Tue, Feb 11, 2020 at 3:25 PM Ismaël Mejía  wrote:

> done
>
> On Tue, Feb 11, 2020 at 10:58 AM Shoaib Zafar <
> shoaib.za...@venturedive.com> wrote:
>
>> Hi everyone,
>>
>> I would appreciate it if someone could trigger the jobs for this PR:
>> https://github.com/apache/beam/pull/10712
>>
>> Thanks.
>>
>> *Shoaib Zafar*
>> Software Engineering Lead
>> Mobile: +92 333 274 6242
>> Skype: live:shoaibzafar_1
>>
>> 
>>
>


Re: Jenkins Jobs Trigger Request

2020-02-11 Thread Ismaël Mejía
done

On Tue, Feb 11, 2020 at 10:58 AM Shoaib Zafar 
wrote:

> Hi everyone,
>
> I would appreciate it if someone could trigger the jobs for this PR:
> https://github.com/apache/beam/pull/10712
>
> Thanks.
>
> *Shoaib Zafar*
> Software Engineering Lead
> Mobile: +92 333 274 6242
> Skype: live:shoaibzafar_1
>
> 
>


Jenkins Jobs Trigger Request

2020-02-11 Thread Shoaib Zafar
Hi everyone,

I would appreciate it if someone could trigger the jobs for this PR:
https://github.com/apache/beam/pull/10712

Thanks.

*Shoaib Zafar*
Software Engineering Lead
Mobile: +92 333 274 6242
Skype: live:shoaibzafar_1




Re: Sphinx Docs Command Error (:sdks:python:test-suites:tox:pycommon:docs)

2020-02-11 Thread Shoaib Zafar
Thanks, Udi and Jincheng for the response.
The suggested solution worked for me as well.
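For anyone else hitting this, Udi's workaround (dropping the "-j 8" flags) can be scripted. The real target would be something like `sdks/python/scripts/generate_pydoc.sh` (path assumed, not confirmed here), so the demo below edits a stand-in file instead:

```shell
# Create a stand-in file mimicking a sphinx invocation with the flag.
printf 'python -msphinx -j 8 -q target/docs/source target/docs/_build\n' \
  > /tmp/generate_pydoc_demo.sh
# Strip the parallel-read flag; -i.bak works on both GNU and BSD/macOS sed.
sed -i.bak 's/ -j 8//g' /tmp/generate_pydoc_demo.sh
cat /tmp/generate_pydoc_demo.sh
```

Removing the flag makes Sphinx read sources serially, which sidesteps the macOS fork-safety crash at the cost of a slower docs build.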

Regards,

*Shoaib Zafar*
Software Engineering Lead
Mobile: +92 333 274 6242
Skype: live:shoaibzafar_1




On Tue, Feb 11, 2020 at 1:17 PM jincheng sun 
wrote:

> I have verified that this issue can be reproduced in my local
> environment (macOS) and that the solution suggested by Udi works!
>
> Best,
> Jincheng
>
> Udi Meiri  于2020年2月11日周二 上午8:51写道:
>
>> I don't have those issues (running on Linux), but a possible workaround
>> could be to remove the "-j 8" flags (2 locations) in generate_pydoc.sh.
>>
>>
>> On Mon, Feb 10, 2020 at 11:06 AM Shoaib Zafar <
>> shoaib.za...@venturedive.com> wrote:
>>
>>> Hello Beamers.
>>>
>>> Just curious, is anyone else having trouble running the
>>> ':sdks:python:test-suites:tox:pycommon:docs' command locally?
>>>
>>> After rebasing with master recently, I am facing a sphinx thread fork
>>> error on my macOS Mojave, using Python 3.7.0.
>>> I tried adding the system variable "export
>>> OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES" (which I found on Google) but
>>> no luck!
>>>
>>> Any suggestions/help?
>>>
>>> Thanks!
>>>
>>> Console Log:
>>> --
>>> 
>>> Creating file target/docs/source/apache_beam.utils.proto_utils.rst.
>>> Creating file target/docs/source/apache_beam.utils.retry.rst.
>>> Creating file target/docs/source/apache_beam.utils.subprocess_server.rst.
>>> Creating file
>>> target/docs/source/apache_beam.utils.thread_pool_executor.rst.
>>> Creating file target/docs/source/apache_beam.utils.timestamp.rst.
>>> Creating file target/docs/source/apache_beam.utils.urns.rst.
>>> Creating file target/docs/source/apache_beam.utils.rst.
>>> objc[8384]: +[__NSCFConstantString initialize] may have been in progress
>>> in another thread when fork() was called.
>>> objc[8384]: +[__NSCFConstantString initialize] may have been in progress
>>> in another thread when fork() was called. We cannot safely call it or
>>> ignore it in the fork() child process. Crashing instead. Set a breakpoint
>>> on objc_initializeAfterForkError to debug.
>>>
>>> Traceback (most recent call last):
>>>   File
>>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/cmd/build.py",
>>> line 304, in build_main
>>> app.build(args.force_all, filenames)
>>>   File
>>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/application.py",
>>> line 335, in build
>>> self.builder.build_all()
>>>   File
>>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>>> line 305, in build_all
>>> self.build(None, summary=__('all source files'), method='all')
>>>   File
>>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>>> line 360, in build
>>> updated_docnames = set(self.read())
>>>   File
>>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>>> line 466, in read
>>> self._read_parallel(docnames, nproc=self.app.parallel)
>>>   File
>>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>>> line 521, in _read_parallel
>>> tasks.join()
>>>   File
>>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
>>> line 114, in join
>>> self._join_one()
>>>   File
>>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
>>> line 120, in _join_one
>>> exc, logs, result = pipe.recv()
>>>   File
>>> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
>>> line 250, in recv
>>> buf = self._recv_bytes()
>>>   File
>>> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
>>> line 407, in _recv_bytes
>>> buf = self._recv(4)
>>>   File
>>> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
>>> line 383, in _recv
>>> raise EOFError
>>> EOFError
>>>
>>> Exception occurred:
>>>   File
>>> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
>>> line 383, in _recv
>>> raise EOFError
>>> EOFError
>>> The 

Re: Labels on PR

2020-02-11 Thread Ismaël Mejía
Left some comments too; let's keep the discussion on refinements in the PR.

On Tue, Feb 11, 2020 at 9:13 AM jincheng sun 
wrote:

> I left comments on PR, the main suggestion is that we may need a
> discussion about what kind of labels should be added. I would like to share
> my thoughts as follows:
>
> I think we need to add labels according to some rules. For example, the
> easiest way is to add labels by languages, java / python / go etc. But this
> kind of help is very limited, so we need to subdivide some labels, such as
> by components. Currently we have more than 70 components, each component is
> configured with labels, and it seems cumbersome. So we should have some
> rules for dividing labels, which can play the role of labels without being
> too cumbersome. Such as:
>
> We can add `extensions` or `extensions-ideas and extensions-java` for the
> following components:
>
> - extensions-ideas
> - extensions-java-join-library
> - extensions-java-json
> - extensions-java-protobuf
> - extensions-java-sketching
> - extensions-java-sorter
>
> And it's better to add a label for each Runner as follows:
>
> - runner-apex
> - runner-core
> - runner-dataflow
> - runner-direct
> - runner-flink
> - runner-jstorm
> - runner-...
>
> So, I think it would be great to collect feedback from the community on the
> set of labels needed.
>
> What do you think?
>
> Best,
> Jincheng
>
> Alex Van Boxel  于2020年2月11日周二 下午3:11写道:
>
>> I've opened a PR and a ticket with INFRA.
>>
>> PR: https://github.com/apache/beam/pull/10824
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Tue, Feb 11, 2020 at 6:57 AM jincheng sun 
>> wrote:
>>
>>> +1. Autolabeler seems really cool and it seems that it's simple to
>>> configure and set up.
>>>
>>> Best,
>>> Jincheng
>>>
>>>
>>>
>>> Udi Meiri  于2020年2月11日周二 上午2:01写道:
>>>
 Cool!

 On Mon, Feb 10, 2020 at 9:27 AM Robert Burke 
 wrote:

> +1 to autolabeling
>
> On Mon, Feb 10, 2020, 9:21 AM Luke Cwik  wrote:
>
>> Nice
>>
>> On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel 
>> wrote:
>>
>>> Ha, cool. I'll have a look at the autolabeler. The infra stuff is
>>> not something I've looked at... I'll dive into that.
>>>
>>>  _/
>>> _/ Alex Van Boxel
>>>
>>>
>>> On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía 
>>> wrote:
>>>
 +1

 You don't need to write your own action, there is already one
 autolabeler action [1].
 INFRA can easily configure it for Beam (as they did for Avro
 [2]) if we request it.
 The plugin is quite easy to configure and works like a charm [3].

 [1] https://github.com/probot/autolabeler
 [1] https://issues.apache.org/jira/browse/INFRA-17367
 [2]
 https://github.com/apache/avro/blob/master/.github/autolabeler.yml


 On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Great initiative, thanks Alex! I was thinking to add such labels
> into PR title but I believe that GitHub labels are better since it 
> can be
> used easily for filtering, for example.
>
> Maybe it could be useful to add more granularity for labels, like
> “release”, “runners”, “website”, etc but I’m afraid to make the 
> titles too
> heavy because of this.
>
> > On 10 Feb 2020, at 08:35, Alex Van Boxel 
> wrote:
> >
> > I've started putting labels on PR's. I've done the first page
> for now (as I'm afraid putting them on older ones could affect the 
> stale
> bot). I hope this is ok.
> >
> > For now I'm only focussing on language and I'm going to see if I
> can write a GitHub action for it. I hope this is useful. Other kinds of
> suggestions for labels, that can be automated, are welcome.
> >
> > 
> >  _/
> > _/ Alex Van Boxel
>
>


Re: Sphinx Docs Command Error (:sdks:python:test-suites:tox:pycommon:docs)

2020-02-11 Thread jincheng sun
I have verified that this issue can be reproduced in my local environment
(macOS) and that the solution suggested by Udi works!

Best,
Jincheng

Udi Meiri  于2020年2月11日周二 上午8:51写道:

> I don't have those issues (running on Linux), but a possible workaround
> could be to remove the "-j 8" flags (2 locations) in generate_pydoc.sh.
>
>
> On Mon, Feb 10, 2020 at 11:06 AM Shoaib Zafar <
> shoaib.za...@venturedive.com> wrote:
>
>> Hello Beamers.
>>
>> Just curious, is anyone else having trouble running the
>> ':sdks:python:test-suites:tox:pycommon:docs' command locally?
>>
>> After rebasing with master recently, I am facing a sphinx thread fork error
>> on my macOS Mojave, using Python 3.7.0.
>> I tried adding the system variable "export
>> OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES" (which I found on Google) but
>> no luck!
>>
>> Any suggestions/help?
>>
>> Thanks!
>>
>> Console Log:
>> --
>> 
>> Creating file target/docs/source/apache_beam.utils.proto_utils.rst.
>> Creating file target/docs/source/apache_beam.utils.retry.rst.
>> Creating file target/docs/source/apache_beam.utils.subprocess_server.rst.
>> Creating file
>> target/docs/source/apache_beam.utils.thread_pool_executor.rst.
>> Creating file target/docs/source/apache_beam.utils.timestamp.rst.
>> Creating file target/docs/source/apache_beam.utils.urns.rst.
>> Creating file target/docs/source/apache_beam.utils.rst.
>> objc[8384]: +[__NSCFConstantString initialize] may have been in progress
>> in another thread when fork() was called.
>> objc[8384]: +[__NSCFConstantString initialize] may have been in progress
>> in another thread when fork() was called. We cannot safely call it or
>> ignore it in the fork() child process. Crashing instead. Set a breakpoint
>> on objc_initializeAfterForkError to debug.
>>
>> Traceback (most recent call last):
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/cmd/build.py",
>> line 304, in build_main
>> app.build(args.force_all, filenames)
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/application.py",
>> line 335, in build
>> self.builder.build_all()
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>> line 305, in build_all
>> self.build(None, summary=__('all source files'), method='all')
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>> line 360, in build
>> updated_docnames = set(self.read())
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>> line 466, in read
>> self._read_parallel(docnames, nproc=self.app.parallel)
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
>> line 521, in _read_parallel
>> tasks.join()
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
>> line 114, in join
>> self._join_one()
>>   File
>> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
>> line 120, in _join_one
>> exc, logs, result = pipe.recv()
>>   File
>> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
>> line 250, in recv
>> buf = self._recv_bytes()
>>   File
>> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
>> line 407, in _recv_bytes
>> buf = self._recv(4)
>>   File
>> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
>> line 383, in _recv
>> raise EOFError
>> EOFError
>>
>> Exception occurred:
>>   File
>> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
>> line 383, in _recv
>> raise EOFError
>> EOFError
>> The full traceback has been saved in
>> /Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/tmp/sphinx-err-mphtfnei.log,
>> if you want to report the issue to the developers.
>> Please also report this if it was a user error, so that a better error
>> message can be provided next time.
>> A bug report can be 

Re: Labels on PR

2020-02-11 Thread jincheng sun
I left comments on the PR; the main suggestion is that we may need a discussion
about what kinds of labels should be added. I would like to share my thoughts
as follows:

I think we need to add labels according to some rules. For example, the
easiest way is to add labels by language: java / python / go etc. But that
kind of labeling is of limited help on its own, so we need to subdivide some
labels, such as by component. However, we currently have more than 70
components, and configuring a label for each one seems cumbersome. So we
should have some rules for grouping labels that keep them useful without
becoming too cumbersome. For example:

We could add a single `extensions` label (or split it into `extensions-ideas`
and `extensions-java`) covering the following components:

- extensions-ideas
- extensions-java-join-library
- extensions-java-json
- extensions-java-protobuf
- extensions-java-sketching
- extensions-java-sorter

And it would be better to add a label for each runner, as follows:

- runner-apex
- runner-core
- runner-dataflow
- runner-direct
- runner-flink
- runner-jstorm
- runner-...

So I think it would be great to collect feedback from the community on the
set of labels needed.

What do you think?

Best,
Jincheng
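
As an illustration of the rule-based grouping above, a probot/autolabeler configuration could map path globs to these labels along the following lines. This is a hypothetical sketch only: the file paths and label names shown are assumptions and would have to match the actual Beam repository layout.

```yaml
# .github/autolabeler.yml (hypothetical sketch, not the actual Beam config)
# Each key is a label; the value lists path globs whose changed files trigger it.
"java": ["sdks/java/**"]
"python": ["sdks/python/**"]
"go": ["sdks/go/**"]
"extensions-java": ["sdks/java/extensions/**"]
"runner-flink": ["runners/flink/**"]
"runner-dataflow": ["runners/google-cloud-dataflow-java/**"]
"website": ["website/**"]
```

With a grouping like this, a PR touching only `sdks/java/extensions/sorter` would pick up the coarse `java` label plus the grouped `extensions-java` label, without needing a separate label per component.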

Alex Van Boxel  于2020年2月11日周二 下午3:11写道:

> I've opened a PR and a ticket with INFRA.
>
> PR: https://github.com/apache/beam/pull/10824
>
>  _/
> _/ Alex Van Boxel
>
>
> On Tue, Feb 11, 2020 at 6:57 AM jincheng sun 
> wrote:
>
>> +1. Autolabeler seems really cool, and it looks simple to configure and
>> set up.
>>
>> Best,
>> Jincheng
>>
>>
>>
>> Udi Meiri  于2020年2月11日周二 上午2:01写道:
>>
>>> Cool!
>>>
>>> On Mon, Feb 10, 2020 at 9:27 AM Robert Burke  wrote:
>>>
 +1 to autolabeling

 On Mon, Feb 10, 2020, 9:21 AM Luke Cwik  wrote:

> Nice
>
> On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel 
> wrote:
>
>> Ha, cool. I'll have a look at the autolabeler. The infra stuff is not
>> something I've looked at... I'll dive into that.
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía 
>> wrote:
>>
>>> +1
>>>
>>> You don't need to write your own action, there is already an
>>> autolabeler action [1].
>>> INFRA can easily configure it for Beam (as they did for Avro
>>> [2]) if we request it.
>>> The plugin is quite easy to configure and works like a charm [3].
>>>
>>> [1] https://github.com/probot/autolabeler
>>> [2] https://issues.apache.org/jira/browse/INFRA-17367
>>> [3]
>>> https://github.com/apache/avro/blob/master/.github/autolabeler.yml
>>>
>>>
>>> On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 Great initiative, thanks Alex! I was thinking of adding such labels
 into the PR title, but I believe that GitHub labels are better since they 
 can
 easily be used for filtering, for example.

 Maybe it could be useful to add more granular labels, like
 “release”, “runners”, “website”, etc., but I’m afraid that would make the 
 titles
 too heavy.

 > On 10 Feb 2020, at 08:35, Alex Van Boxel 
 wrote:
 >
 > I've started putting labels on PR's. I've done the first page for
 now (as I'm afraid putting them on older ones could affect the stale 
 bot). I
 hope this is ok.
 >
 > For now I'm only focussing on language, and I'm going to see if I
 can write a GitHub Action for it. I hope this is useful. Other kinds of
 suggestions for labels that can be automated are welcome.
 >
 > 
 >  _/
 > _/ Alex Van Boxel