Re: Labels on PR

2020-02-10 Thread Alex Van Boxel
I've opened a PR and a ticket with INFRA.

PR: https://github.com/apache/beam/pull/10824

 _/
_/ Alex Van Boxel


On Tue, Feb 11, 2020 at 6:57 AM jincheng sun 
wrote:

> +1. Autolabeler seems really cool, and it looks simple to configure and
> set up.
>
> Best,
> Jincheng
>
>
>
> Udi Meiri  于2020年2月11日周二 上午2:01写道:
>
>> Cool!
>>
>> On Mon, Feb 10, 2020 at 9:27 AM Robert Burke  wrote:
>>
>>> +1 to autolabeling
>>>
>>> On Mon, Feb 10, 2020, 9:21 AM Luke Cwik  wrote:
>>>
 Nice

 On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel 
 wrote:

> Ha, cool. I'll have a look at the autolabeler. The infra stuff is not
> something I've looked at... I'll dive into that.
>
>  _/
> _/ Alex Van Boxel
>
>
> On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía 
> wrote:
>
>> +1
>>
>> You don't need to write your own action, there is already one
>> autolabeler action [1].
>> INFRA can easily configure it for Beam (as they did for Avro [2]) if
>> we request it.
>> The plugin is quite easy to configure and works like a charm [3].
>>
>> [1] https://github.com/probot/autolabeler
>> [2] https://issues.apache.org/jira/browse/INFRA-17367
>> [3]
>> https://github.com/apache/avro/blob/master/.github/autolabeler.yml
>>
>>
>> On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Great initiative, thanks Alex! I was thinking of adding such labels
>>> into PR titles, but I believe that GitHub labels are better since they
>>> can easily be used for filtering, for example.
>>>
>>> Maybe it could be useful to add more granularity to labels, like
>>> “release”, “runners”, “website”, etc., but I’m afraid of making the
>>> titles too heavy because of this.
>>>
>>> > On 10 Feb 2020, at 08:35, Alex Van Boxel  wrote:
>>> >
>>> > I've started putting labels on PRs. I've done the first page for
>>> > now (as I'm afraid putting them on older ones could affect the stale
>>> > bot). I hope this is ok.
>>> >
>>> > For now I'm only focusing on language, and I'm going to see if I
>>> > can write a GitHub Action for it. I hope this is useful. Other kinds
>>> > of suggestions for labels, that can be automated, are welcome.
>>> >
>>> > 
>>> >  _/
>>> > _/ Alex Van Boxel
>>>
>>>

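For reference, the probot autolabeler configuration discussed above is a short YAML file mapping labels to file globs. A hypothetical sketch for Beam, modeled on Avro's config — the label names and paths here are illustrative assumptions, not Beam's actual setup:

```yaml
# Hypothetical .github/autolabeler.yml for Beam (labels/paths are assumptions).
python:
  - "sdks/python/**/*"
java:
  - "sdks/java/**/*"
go:
  - "sdks/go/**/*"
website:
  - "website/**/*"
```

Each key is the label to apply; a PR touching any file matching one of the globs would get that label automatically.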

Re: contributor permission for Beam Jira tickets

2020-02-10 Thread Yixing Zhang
Thank you, Kenneth!

On Mon, Feb 10, 2020 at 7:24 PM Kenneth Knowles  wrote:

> Welcome! I have added you to the Contributors role so you can be assigned
> Jira tickets.
>
> On Mon, Feb 10, 2020 at 4:28 PM Yixing Zhang 
> wrote:
>
>> Hi,
>>
>> This is Yixing from LinkedIn. I'm closely working with Xinyu on Samza
>> runner. Can someone add me as a contributor for Beam's Jira issue tracker?
>> I would like to create/assign tickets for my work.
>>
>> jira username: YixingZhang
>>
>> Thanks,
>> Yixing Zhang
>>
>


Re: Labels on PR

2020-02-10 Thread jincheng sun
+1. Autolabeler seems really cool, and it looks simple to configure and
set up.

Best,
Jincheng



Udi Meiri  于2020年2月11日周二 上午2:01写道:

> Cool!
>
> On Mon, Feb 10, 2020 at 9:27 AM Robert Burke  wrote:
>
>> +1 to autolabeling
>>
>> On Mon, Feb 10, 2020, 9:21 AM Luke Cwik  wrote:
>>
>>> Nice
>>>
>>> On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel  wrote:
>>>
 Ha, cool. I'll have a look at the autolabeler. The infra stuff is not
 something I've looked at... I'll dive into that.

  _/
 _/ Alex Van Boxel


 On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía 
 wrote:

> +1
>
> You don't need to write your own action, there is already one
> autolabeler action [1].
> INFRA can easily configure it for Beam (as they did for Avro [2]) if
> we request it.
> The plugin is quite easy to configure and works like a charm [3].
>
> [1] https://github.com/probot/autolabeler
> [2] https://issues.apache.org/jira/browse/INFRA-17367
> [3] https://github.com/apache/avro/blob/master/.github/autolabeler.yml
>
>
> On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Great initiative, thanks Alex! I was thinking of adding such labels into
>> PR titles, but I believe that GitHub labels are better since they can
>> easily be used for filtering, for example.
>>
>> Maybe it could be useful to add more granularity to labels, like
>> “release”, “runners”, “website”, etc., but I’m afraid of making the
>> titles too heavy because of this.
>>
>> > On 10 Feb 2020, at 08:35, Alex Van Boxel  wrote:
>> >
>> > I've started putting labels on PRs. I've done the first page for
>> > now (as I'm afraid putting them on older ones could affect the stale
>> > bot). I hope this is ok.
>> >
>> > For now I'm only focusing on language, and I'm going to see if I
>> > can write a GitHub Action for it. I hope this is useful. Other kinds
>> > of suggestions for labels, that can be automated, are welcome.
>> >
>> > 
>> >  _/
>> > _/ Alex Van Boxel
>>
>>


Re: FnAPI proto backwards compatibility

2020-02-10 Thread Kenneth Knowles
On the runner requirements side: if you have such a list at the pipeline
level, it is an opportunity for the list to be inconsistent with the
contents of the pipeline. For example, if a DoFn is marked "requires stable
input" but not listed at the pipeline level, then the runner may run it
without actually providing stable input.

On the SDK requirements side: the constructing SDK owns the Environment
proto completely, so it is in a position to ensure the involved docker
images support the necessary features. Is it sufficient for each SDK
involved in a cross-language expansion to validate that it understands the
inputs? For example if Python sends a PCollection with a pickle coder to
Java as input to an expansion then it will fail. And conversely if the
returned subgraph outputs a PCollection with a Java custom coder. More
complex use cases that I can imagine all seem futuristic and unlikely to
come to pass (Python passes a pickled DoFn to the Java expansion service
which inserts it into the graph in a way where a Java-based transform would
have to invoke it on every element, etc)

Kenn
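The expansion-boundary validation Kenn describes could be sketched in a few lines — coder URNs below are invented for illustration and don't necessarily match Beam's registry:

```python
# Sketch (assumed URNs): before expanding a cross-language transform, the
# expansion side checks that it understands the coders on its input
# PCollections and fails fast otherwise.

KNOWN_CODER_URNS = {
    "beam:coder:bytes:v1",
    "beam:coder:string_utf8:v1",
    "beam:coder:kv:v1",
    "beam:coder:row:v1",
}

def check_expansion_inputs(input_coder_urns):
    """Fail fast if any input coder is opaque to this SDK."""
    unknown = [u for u in input_coder_urns if u not in KNOWN_CODER_URNS]
    if unknown:
        raise ValueError(f"cannot expand: unknown input coders {unknown}")

# e.g. a Python pickle coder handed to a Java expansion service fails here.
```

The converse check (the caller validating the coders on the returned subgraph's outputs) would be symmetric.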

On Mon, Feb 10, 2020 at 5:03 PM Brian Hulette  wrote:

> I like the capabilities/requirements idea. Would these capabilities be at
> a level that it would make sense to document in the capabilities matrix?
> i.e. could the URNs be the values of "X" Pablo described here [1]?
>
> Brian
>
> [1]
> https://lists.apache.org/thread.html/e93ac64d484551d61e559e1ba0cf4a15b760e69d74c5b1d0549ff74f%40%3Cdev.beam.apache.org%3E
>
> On Mon, Feb 10, 2020 at 3:55 PM Robert Bradshaw 
> wrote:
>
>> With an eye towards cross-language (which includes cross-version)
>> pipelines and services (specifically looking at Dataflow) supporting
>> portable pipelines, there's been a desire to stabilize the portability
>> protos. There are currently many cleanups we'd like to do [1] (some
>> essential, others nice to have); are there others that people would
>> like to see?
>>
>> Of course we would like it to be possible for the FnAPI and Beam
>> itself to continue to evolve. Most of this can be handled by runners
>> understanding various transform URNs, but not all. (An example that
>> comes to mind is support for large iterables [2], or the requirement
>> to observe and respect new fields on a PTransform or its payloads
>> [3]). One proposal for this is to add capabilities and/or
>> requirements. An environment (corresponding generally to an SDK) could
>> advertise various capabilities (as a list or map of URNs) which a
>> runner can take advantage of without requiring all SDKs to support all
>> features at the same time. For the other way around, we need a way of
>> marking something that a runner must reject if it does not understand
>> it. This could be a set of requirements (again, a list or map of URNs)
>> that designate capabilities required to at least be understood by the
>> runner to faithfully execute this pipeline. (These could be attached
>> to a transform or the pipeline itself.) Do these sound like reasonable
>> additions? Also, would they ever need to be parameterized (map), or
>> would a list suffice?
>>
>> [1] BEAM-2645, BEAM-2822, BEAM-3203, BEAM-3221, BEAM-3223, BEAM-3227,
>> BEAM-3576, BEAM-3577, BEAM-3595, BEAM-4150, BEAM-4180, BEAM-4374,
>> BEAM-5391, BEAM-5649, BEAM-8172, BEAM-8201, BEAM-8271, BEAM-8373,
>> BEAM-8539, BEAM-8804, BEAM-9229, BEAM-9262, BEAM-9266, and BEAM-9272
>> [2]
>> https://lists.apache.org/thread.html/70cac361b659516933c505b513d43986c25c13da59eabfd28457f1f2@%3Cdev.beam.apache.org%3E
>> [3]
>> https://lists.apache.org/thread.html/rdc57f240069c0807eae87ed2ff13d3ee503bc18e5f906d05624e6433%40%3Cdev.beam.apache.org%3E
>>
>


Re: contributor permission for Beam Jira tickets

2020-02-10 Thread Kenneth Knowles
Welcome! I have added you to the Contributors role so you can be assigned
Jira tickets.

On Mon, Feb 10, 2020 at 4:28 PM Yixing Zhang 
wrote:

> Hi,
>
> This is Yixing from LinkedIn. I'm closely working with Xinyu on Samza
> runner. Can someone add me as a contributor for Beam's Jira issue tracker?
> I would like to create/assign tickets for my work.
>
> jira username: YixingZhang
>
> Thanks,
> Yixing Zhang
>


Re: FnAPI proto backwards compatibility

2020-02-10 Thread Brian Hulette
I like the capabilities/requirements idea. Would these capabilities be at a
level that it would make sense to document in the capabilities matrix? i.e.
could the URNs be the values of "X" Pablo described here [1]?

Brian

[1]
https://lists.apache.org/thread.html/e93ac64d484551d61e559e1ba0cf4a15b760e69d74c5b1d0549ff74f%40%3Cdev.beam.apache.org%3E

On Mon, Feb 10, 2020 at 3:55 PM Robert Bradshaw  wrote:

> With an eye towards cross-language (which includes cross-version)
> pipelines and services (specifically looking at Dataflow) supporting
> portable pipelines, there's been a desire to stabilize the portability
> protos. There are currently many cleanups we'd like to do [1] (some
> essential, others nice to have); are there others that people would
> like to see?
>
> Of course we would like it to be possible for the FnAPI and Beam
> itself to continue to evolve. Most of this can be handled by runners
> understanding various transform URNs, but not all. (An example that
> comes to mind is support for large iterables [2], or the requirement
> to observe and respect new fields on a PTransform or its payloads
> [3]). One proposal for this is to add capabilities and/or
> requirements. An environment (corresponding generally to an SDK) could
> advertise various capabilities (as a list or map of URNs) which a
> runner can take advantage of without requiring all SDKs to support all
> features at the same time. For the other way around, we need a way of
> marking something that a runner must reject if it does not understand
> it. This could be a set of requirements (again, a list or map of URNs)
> that designate capabilities required to at least be understood by the
> runner to faithfully execute this pipeline. (These could be attached
> to a transform or the pipeline itself.) Do these sound like reasonable
> additions? Also, would they ever need to be parameterized (map), or
> would a list suffice?
>
> [1] BEAM-2645, BEAM-2822, BEAM-3203, BEAM-3221, BEAM-3223, BEAM-3227,
> BEAM-3576, BEAM-3577, BEAM-3595, BEAM-4150, BEAM-4180, BEAM-4374,
> BEAM-5391, BEAM-5649, BEAM-8172, BEAM-8201, BEAM-8271, BEAM-8373,
> BEAM-8539, BEAM-8804, BEAM-9229, BEAM-9262, BEAM-9266, and BEAM-9272
> [2]
> https://lists.apache.org/thread.html/70cac361b659516933c505b513d43986c25c13da59eabfd28457f1f2@%3Cdev.beam.apache.org%3E
> [3]
> https://lists.apache.org/thread.html/rdc57f240069c0807eae87ed2ff13d3ee503bc18e5f906d05624e6433%40%3Cdev.beam.apache.org%3E
>


Re: Sphinx Docs Command Error (:sdks:python:test-suites:tox:pycommon:docs)

2020-02-10 Thread Udi Meiri
I don't have those issues (running on Linux), but a possible workaround
could be to remove the "-j 8" flags (2 locations) in generate_pydoc.sh.
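The workaround amounts to deleting the flag wherever the script passes it to Sphinx. A minimal sketch of that edit (the script path and exact flag text are assumptions; the example operates on a stand-in string rather than the real generate_pydoc.sh):

```python
# Strip Sphinx's "-j 8" parallelism flag, the likely trigger of the macOS
# fork-safety crash, so sources are read serially.
import re

def remove_parallel_flag(script_text: str) -> str:
    """Drop every ' -j 8' occurrence from the Sphinx invocations."""
    return re.sub(r" -j 8\b", "", script_text)

line = "python -msphinx -M html source build -j 8"
print(remove_parallel_flag(line))  # -> python -msphinx -M html source build
```

Alternatively, `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` (mentioned below) targets the same macOS fork-safety check, but removing the parallelism avoids the fork entirely.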


On Mon, Feb 10, 2020 at 11:06 AM Shoaib Zafar 
wrote:

> Hello Beamers.
>
> Just curious, is anyone having trouble running the
> ':sdks:python:test-suites:tox:pycommon:docs' command locally?
>
> After rebasing on master recently, I am hitting a Sphinx thread-fork error
> on macOS Mojave, using Python 3.7.0.
> I tried to set the environment variable "export
> OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES" (which I found on Google) but no
> luck!
>
> Any suggestions/help?
>
> Thanks!
>
> Console Log:
> --
> 
> Creating file target/docs/source/apache_beam.utils.proto_utils.rst.
> Creating file target/docs/source/apache_beam.utils.retry.rst.
> Creating file target/docs/source/apache_beam.utils.subprocess_server.rst.
> Creating file
> target/docs/source/apache_beam.utils.thread_pool_executor.rst.
> Creating file target/docs/source/apache_beam.utils.timestamp.rst.
> Creating file target/docs/source/apache_beam.utils.urns.rst.
> Creating file target/docs/source/apache_beam.utils.rst.
> objc[8384]: +[__NSCFConstantString initialize] may have been in progress
> in another thread when fork() was called.
> objc[8384]: +[__NSCFConstantString initialize] may have been in progress
> in another thread when fork() was called. We cannot safely call it or
> ignore it in the fork() child process. Crashing instead. Set a breakpoint
> on objc_initializeAfterForkError to debug.
>
> Traceback (most recent call last):
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/cmd/build.py",
> line 304, in build_main
> app.build(args.force_all, filenames)
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/application.py",
> line 335, in build
> self.builder.build_all()
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
> line 305, in build_all
> self.build(None, summary=__('all source files'), method='all')
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
> line 360, in build
> updated_docnames = set(self.read())
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
> line 466, in read
> self._read_parallel(docnames, nproc=self.app.parallel)
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
> line 521, in _read_parallel
> tasks.join()
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
> line 114, in join
> self._join_one()
>   File
> "/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
> line 120, in _join_one
> exc, logs, result = pipe.recv()
>   File
> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
> line 250, in recv
> buf = self._recv_bytes()
>   File
> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
> line 407, in _recv_bytes
> buf = self._recv(4)
>   File
> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
> line 383, in _recv
> raise EOFError
> EOFError
>
> Exception occurred:
>   File
> "/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
> line 383, in _recv
> raise EOFError
> EOFError
> The full traceback has been saved in
> /Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/tmp/sphinx-err-mphtfnei.log,
> if you want to report the issue to the developers.
> Please also report this if it was a user error, so that a better error
> message can be provided next time.
> A bug report can be filed in the tracker at <
> https://github.com/sphinx-doc/sphinx/issues>. Thanks!
> objc[8385]: +[__NSCFConstantString initialize] may have been in progress
> in another thread when fork() was called.
> objc[8385]: +[__NSCFConstantString initialize] may have been in progress
> in another thread when fork() was called. 

contributor permission for Beam Jira tickets

2020-02-10 Thread Yixing Zhang
Hi,

This is Yixing from LinkedIn. I'm closely working with Xinyu on Samza
runner. Can someone add me as a contributor for Beam's Jira issue tracker?
I would like to create/assign tickets for my work.

jira username: YixingZhang

Thanks,
Yixing Zhang


Re: [DISCUSS] BIP reloaded

2020-02-10 Thread Kenneth Knowles
I jumped into these wiki pages and figured out how Airflow did theirs using
the Page Properties table on each BIP [1] and how these automatically
update the index using the Page Properties Report [2]. I would consider
creating BIPs for ongoing efforts to flesh out these tables, to establish
the columns that matter for each phase of a BIP.

Kenn

[1]
https://confluence.atlassian.com/conf71/page-properties-macro-979423418.html
[2]
https://confluence.atlassian.com/conf71/page-properties-report-macro-979423430.html

On Mon, Feb 10, 2020 at 12:57 AM Jan Lukavský  wrote:

> Hi Alex,
>
> because it would be super cool to create a template from the BIP, I'd
> suggest a few minor changes:
>
>  - can we make motivation, current state, alternatives and implementation
> the same level headings?
>
>  - regarding the ordering - from my point of view it makes sense to first
> define the problem (motivation + current state), then to elaborate on _all_
> options we have to solve the defined problem and then to make a choice
> (that would be motivation -> current state -> implementation options ->
> choice on an option). But I agree that once the section is called
> 'alternatives' (maybe even 'rejected alternatives') it makes more sense to
> have it _after_ the choice. But the naming might be just a matter of taste,
> so this might be sorted out later.
>
>  - a small side note - because the BIP should ideally get people involved
> in the voting process, it should be as explanatory as possible - I don't
> feel like an expert on schemas, so a little more context and maybe an
> example of the "rejected alternatives" and how it would look would help
> me, so that one can make a decision even without being involved with
> schemas on a daily basis. Your explanation is probably well understood by
> people who are experts in the area, but it might somewhat limit the
> audience.
>
> What do you think?
>
>  Jan
> On 2/9/20 9:19 PM, Alex Van Boxel wrote:
>
> a = motivation
> b => *added current state in Beam*
> c = alternatives
> d = implementation *(I prefer this to define before the alternatives)*
> e = *rest of document?*
>
>  _/
> _/ Alex Van Boxel
>
>
> On Sun, Feb 9, 2020 at 7:50 PM Jan Lukavský  wrote:
>
>> It's absolutely fine. :-) I think that the scope and quality of your
>> document suits very well for the first BIP.
>>
>> What I would find generally useful is a general structure that would be
>> something like:
>>
>>  a) definition of the problem
>>
>>  b) explanation why current Beam options don't fit well for the problem
>> defined at a)
>>
>>  c) ideally exhaustive list of possible solutions
>>
>>  d) choice of an option from c) with justification of the choice
>>
>>  e) implementation notes specific to the choice in d)
>>
>> I find point d) the most essential, because it can be used as a base
>> for vote (that is, if the community agrees that the list of options is
>> exhaustive and that the chosen solution is the best one possible) for
>> promoting a BIP from proposed to accepted.
>>
>> Does that make sense in your case?
>>
>>  Jan
>> On 2/9/20 7:08 PM, Alex Van Boxel wrote:
>>
>> I'm sorry, I stole the number 1 from you. Feel free to give suggestions
>> on the form, so we can get a good template for further BIPs.
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Sun, Feb 9, 2020 at 6:43 PM Jan Lukavský  wrote:
>>
>>> Hi Alex,
>>>
>>> this is cool! Thanks for pushing this topic forward!
>>>
>>> Jan
>>> On 2/9/20 6:36 PM, Alex Van Boxel wrote:
>>>
>>> BIP-1 is available here:
>>> https://cwiki.apache.org/confluence/display/BEAM/%5BBIP-1%5D+Beam+Schema+Options
>>>
>>>  _/
>>> _/ Alex Van Boxel
>>>
>>>
>>> On Sat, Feb 1, 2020 at 9:11 PM Kenneth Knowles  wrote:
>>>
 Sounds great. If you scrape recent dev@ for proposals that are not yet
 implemented, I think you will find some, and you could ask the authors to
 add them as BIPs if they are still interested.

 Kenn

 On Sat, Feb 1, 2020 at 1:11 AM Jan Lukavský  wrote:

> Hi Kenn,
>
> yes, I can do that. I think that there should be at least one first
> BIP, and I can try to set up one. But (as opposed to my previous proposal) I'll
> try to setup a fresh one, not the one of [BEAM-8550], because that one
> already has a PR and rebasing the PR on master for such a long period (and
> it is likely, that final polishing of the BIP process will take several
> more months) starts to be costly. I have in mind two fresh candidates, so
> I'll pick one of them. I think that only setuping a cwiki would not start
> the process, we need a real-life example of a BIP included in that.
>
> Does that sound ok?
>
>  Jan
> On 2/1/20 5:55 AM, Kenneth Knowles wrote:
>
> These stages sound like a great starting point to me. Would you be the
> volunteer to set up a cwiki page for BIPs?
>
> Kenn
>
> On Mon, Jan 20, 2020 at 3:30 AM Jan Lukavský  wrote:
>
>> I agree that we can take inspiration from 

FnAPI proto backwards compatibility

2020-02-10 Thread Robert Bradshaw
With an eye towards cross-language (which includes cross-version)
pipelines and services (specifically looking at Dataflow) supporting
portable pipelines, there's been a desire to stabilize the portability
protos. There are currently many cleanups we'd like to do [1] (some
essential, others nice to have); are there others that people would
like to see?

Of course we would like it to be possible for the FnAPI and Beam
itself to continue to evolve. Most of this can be handled by runners
understanding various transform URNs, but not all. (An example that
comes to mind is support for large iterables [2], or the requirement
to observe and respect new fields on a PTransform or its payloads
[3]). One proposal for this is to add capabilities and/or
requirements. An environment (corresponding generally to an SDK) could
advertise various capabilities (as a list or map of URNs) which a
runner can take advantage of without requiring all SDKs to support all
features at the same time. For the other way around, we need a way of
marking something that a runner must reject if it does not understand
it. This could be a set of requirements (again, a list or map of URNs)
that designate capabilities required to at least be understood by the
runner to faithfully execute this pipeline. (These could be attached
to a transform or the pipeline itself.) Do these sound like reasonable
additions? Also, would they ever need to be parameterized (map), or
would a list suffice?

[1] BEAM-2645, BEAM-2822, BEAM-3203, BEAM-3221, BEAM-3223, BEAM-3227,
BEAM-3576, BEAM-3577, BEAM-3595, BEAM-4150, BEAM-4180, BEAM-4374,
BEAM-5391, BEAM-5649, BEAM-8172, BEAM-8201, BEAM-8271, BEAM-8373,
BEAM-8539, BEAM-8804, BEAM-9229, BEAM-9262, BEAM-9266, and BEAM-9272
[2] 
https://lists.apache.org/thread.html/70cac361b659516933c505b513d43986c25c13da59eabfd28457f1f2@%3Cdev.beam.apache.org%3E
[3] 
https://lists.apache.org/thread.html/rdc57f240069c0807eae87ed2ff13d3ee503bc18e5f906d05624e6433%40%3Cdev.beam.apache.org%3E
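A sketch of how the proposed requirements check might behave — the URNs below are invented for illustration, and the eventual proto shape may differ:

```python
# Hedged sketch: a runner must reject a pipeline declaring a requirement URN
# it does not understand, while unknown *capabilities* advertised by an
# environment are advisory and can simply go unused.

RUNNER_UNDERSTOOD = {
    "beam:requirement:pardo:requires_stable_input:v1",
    "beam:requirement:pardo:requires_time_sorted_input:v1",
}

def validate_pipeline(requirements, understood=RUNNER_UNDERSTOOD):
    """Reject the pipeline if any declared requirement is not understood."""
    unknown = sorted(set(requirements) - understood)
    if unknown:
        raise ValueError(f"cannot faithfully execute pipeline: {unknown}")

validate_pipeline(["beam:requirement:pardo:requires_stable_input:v1"])  # accepted
```

Under this model a plain list of URNs suffices for the reject-if-unknown semantics; a map would only be needed if requirements ever carry parameters.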


Re: Upgrades gcsio to 2.0.0

2020-02-10 Thread Luke Cwik
What prevents the usage of the newer version of Guava?

On Mon, Feb 10, 2020 at 2:28 PM Esun Kim  wrote:

> Hi Beam Developers,
>
> I'm working on pr/10769  which
> upgrades gcsio from 1.9.16 to 2.0.0 which is an intermediate step to get us
> to use gcsio 2.x which supports gRPC, which potentially gives us better
> performance. (FYI, gcsio is a driver for Google Cloud Storage.)
>
> Link-check was run over this PR (result
> ) and
> it appears that it has a couple of linker warnings from the following
> modules because it uses a newer version of Guava.
>
>- com.google.cloud.hadoop.gcsio.cooplock.CoopLockRecordsDao
>(gcsio-2.0.0.jar)
>- com.google.cloud.hadoop.gcsio.cooplock.CoopLockOperationDao
>(gcsio-2.0.0.jar)
>- com.google.cloud.hadoop.gcsio.testing.InMemoryObjectEntry
>(gcsio-2.0.0.jar)
>
> But I believe that none of these is actually problematic, because
> cooplock is only for Hadoop (not for Beam) and testing is just testing. So
> I think it's okay to get this merged, but I want to get an opinion on this
> from you.
>
> Regards,
> Esun.
>
>


Upgrades gcsio to 2.0.0

2020-02-10 Thread Esun Kim
Hi Beam Developers,

I'm working on pr/10769  which
upgrades gcsio from 1.9.16 to 2.0.0 which is an intermediate step to get us
to use gcsio 2.x which supports gRPC, which potentially gives us better
performance. (FYI, gcsio is a driver for Google Cloud Storage.)

Link-check was run over this PR (result
) and it
appears that it has a couple of linker warnings from the following modules
because it uses a newer version of Guava.

   - com.google.cloud.hadoop.gcsio.cooplock.CoopLockRecordsDao
   (gcsio-2.0.0.jar)
   - com.google.cloud.hadoop.gcsio.cooplock.CoopLockOperationDao
   (gcsio-2.0.0.jar)
   - com.google.cloud.hadoop.gcsio.testing.InMemoryObjectEntry
   (gcsio-2.0.0.jar)

But I believe that none of these is actually problematic, because
cooplock is only for Hadoop (not for Beam) and testing is just testing. So
I think it's okay to get this merged, but I want to get an opinion on this
from you.

Regards,
Esun.


Re: BEAM-8758: Code Review Wanted for PR 10765

2020-02-10 Thread Luke Cwik
I took a look, left relevant comments.

On Mon, Feb 10, 2020 at 12:26 PM Tomo Suzuki  wrote:

> Hi Udi, Luke, and Beam committers,
>
> Would you review/merge this google-cloud-spanner dependency upgrade?
> https://github.com/apache/beam/pull/10765
>
> --
> Regards,
> Tomo
>


BEAM-8758: Code Review Wanted for PR 10765

2020-02-10 Thread Tomo Suzuki
Hi Udi, Luke, and Beam committers,

Would you review/merge this google-cloud-spanner dependency upgrade?
https://github.com/apache/beam/pull/10765

-- 
Regards,
Tomo


Re: Dynamic timers now supported!

2020-02-10 Thread Kenneth Knowles
I think the (lack of) portability bit may have been buried in this thread.
Maybe a new thread about the design for that?

Kenn

On Sun, Feb 9, 2020 at 11:36 AM Reuven Lax  wrote:

> FYI, this is now fixed for Dataflow. I also added better rejection so that
> runners that don't support this feature will reject the pipeline.
>
> On Sat, Feb 8, 2020 at 12:10 AM Reuven Lax  wrote:
>
>> I took a look, and I think this was a simple bug. Testing a fix now.
>>
>> A larger question is how to support this in the portability layer. Right
>> now portability assumes that each timer id corresponds to a logical input
>> PCollection, but that assumption no longer works as we now support a
>> dynamic set of timers, each with their own id. We could instead model each
>> timer family as a PCollection, but the FnApiRunner would need to
>> dynamically get the timer id in order to invoke it, and today it statically
>> reads the timer id from the PCollection name.
>>
>> Reuven
>>
>> On Fri, Feb 7, 2020 at 2:22 PM Reuven Lax  wrote:
>>
>>> Thanks for finding this. Hopefully the bug is easy to fix. The tests
>>> indeed never ran on any runner except for the DirectRunner, which is
>>> something I should've noticed in the code review.
>>>
>>> Reuven
>>>
>>> On Mon, Feb 3, 2020 at 12:50 AM Ismaël Mejía  wrote:
>>>
 I had a discussion with Rehman last week and we discovered that the
 TimersMap-related tests were not running for all runners because they
 were not tagged as part of the ValidatesRunner category. I opened a PR
 [1] to enable this, so could someone please help me with the review/merge?

 I took a look just out of curiosity and discovered that they are only
 passing for the Direct runner and for the classic Flink runner in batch
 mode. They are not passing for Dataflow [2][3] and for the Portable Flink
 runner, so it's probably worth reopening the issue to investigate/fix.

 [1] https://github.com/apache/beam/pull/10747
 [2]
 https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_PR/210/
 [3]
 https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_PR/76/


 On Sat, Jan 25, 2020 at 1:26 AM Reuven Lax  wrote:

> Yes. For now we exclude the flink runner, but fixing this should be
> fairly trivial.
>
> On Fri, Jan 24, 2020 at 3:35 PM Maximilian Michels 
> wrote:
>
>> The Flink Runner allowed setting a timer multiple times before we made
>> it comply with the Beam semantics of overwriting past invocations.
>> I wouldn't be surprised if the Spark Runner never addressed this. Flink
>> and Spark themselves allow a timer to be set multiple times. In order
>> to fix this for Beam, the Flink Runner has to maintain a checkpointed
>> map which sits outside of its built-in TimerService.
>>
>> As far as I can see, multiple timer families are currently not
>> supported
>> in the Flink Runner due to the map not taking the family name into
>> account. This can be easily fixed though.
>>
>> -Max
>>
>> On 24.01.20 21:31, Reuven Lax wrote:
>> > The new timer family is in the portability protos. I think
>> TimerReceiver
>> > needs to be updated to set it though (I think a 1-line change).
>> >
>> > The TimerInternals class that runners implement today already
>> handles
>> > dynamic timers, so most of the work was in the Beam SDK  to provide
>> an
>> > API that allows users to access this feature.
>> >
>> > The main work needed in the runner was to take in account the timer
>> > family. Beam semantics say that if a timer is set twice with the
>> same
>> > id, then the second timer overwrites the first.  Several runners
>> > therefore had maps from timer id -> timer. However since the
>> > timer family scopes the timers, we now allow two timers with the
>> same id
>> > as long as the timer families are different. Runners had to be
>> updated
>> > to include the timer family id in the map keys.
>> >
>> > Surprisingly, the new TimerMap tests seem to pass on Spark
>> > ValidatesRunner, even though the Spark runner wasn't updated! I
>> wonder
>> > if this means that the Spark runner was incorrectly implementing
>> the
>> > Beam semantics before, and setTimer was not overwriting timers with
>> the
>> > same id?
>> >
>> > Reuven
>> >
>> > On Fri, Jan 24, 2020 at 7:31 AM Ismaël Mejía > > > wrote:
>> >
>> > This looks great, thanks for the contribution Rehman!
>> >
>> > I have some questions (note I have not looked at the code at
>> all).
>> >
>> > - Is this working for both portable and non portable runners?
>> > - What do other runners need to implement to support this (e.g.
>> Spark)?

Sphinx Docs Command Error (:sdks:python:test-suites:tox:pycommon:docs)

2020-02-10 Thread Shoaib Zafar
Hello Beamers.

Just curious, is anyone having trouble running the
':sdks:python:test-suites:tox:pycommon:docs' command locally?

After rebasing on master recently, I am hitting a Sphinx thread-fork error
on macOS Mojave, using Python 3.7.0.
I tried to set the environment variable "export
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES" (which I found on Google) but no
luck!

Any suggestions/help?

Thanks!

Console Log:
--

Creating file target/docs/source/apache_beam.utils.proto_utils.rst.
Creating file target/docs/source/apache_beam.utils.retry.rst.
Creating file target/docs/source/apache_beam.utils.subprocess_server.rst.
Creating file target/docs/source/apache_beam.utils.thread_pool_executor.rst.
Creating file target/docs/source/apache_beam.utils.timestamp.rst.
Creating file target/docs/source/apache_beam.utils.urns.rst.
Creating file target/docs/source/apache_beam.utils.rst.
objc[8384]: +[__NSCFConstantString initialize] may have been in progress in
another thread when fork() was called.
objc[8384]: +[__NSCFConstantString initialize] may have been in progress in
another thread when fork() was called. We cannot safely call it or ignore
it in the fork() child process. Crashing instead. Set a breakpoint on
objc_initializeAfterForkError to debug.

Traceback (most recent call last):
  File
"/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/cmd/build.py",
line 304, in build_main
app.build(args.force_all, filenames)
  File
"/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/application.py",
line 335, in build
self.builder.build_all()
  File
"/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
line 305, in build_all
self.build(None, summary=__('all source files'), method='all')
  File
"/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
line 360, in build
updated_docnames = set(self.read())
  File
"/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
line 466, in read
self._read_parallel(docnames, nproc=self.app.parallel)
  File
"/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/builders/__init__.py",
line 521, in _read_parallel
tasks.join()
  File
"/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
line 114, in join
self._join_one()
  File
"/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/lib/python3.7/site-packages/sphinx/util/parallel.py",
line 120, in _join_one
exc, logs, result = pipe.recv()
  File
"/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
line 250, in recv
buf = self._recv_bytes()
  File
"/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
line 407, in _recv_bytes
buf = self._recv(4)
  File
"/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
line 383, in _recv
raise EOFError
EOFError

Exception occurred:
  File
"/Users/shoaib/.pyenv/versions/3.7.0/lib/python3.7/multiprocessing/connection.py",
line 383, in _recv
raise EOFError
EOFError
The full traceback has been saved in
/Users/shoaib/Projects/beam/newbeam/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/target/.tox-py37-docs/py37-docs/tmp/sphinx-err-mphtfnei.log,
if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error
message can be provided next time.
A bug report can be filed in the tracker at <
https://github.com/sphinx-doc/sphinx/issues>. Thanks!
objc[8385]: +[__NSCFConstantString initialize] may have been in progress in
another thread when fork() was called.
objc[8385]: +[__NSCFConstantString initialize] may have been in progress in
another thread when fork() was called. We cannot safely call it or ignore
it in the fork() child process. Crashing instead. Set a breakpoint on
objc_initializeAfterForkError to debug.
objc[8386]: +[__NSCFConstantString initialize] may have been in progress in
another thread when fork() was called.
objc[8386]: +[__NSCFConstantString initialize] may have been in progress in
another thread when fork() was called. We cannot 
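Two workarounds worth trying for the crash above (both untested guesses; the `OBJC_*` variable is the one already mentioned, and `-j` is sphinx-build's parallel-jobs flag, relevant because the traceback points at Sphinx's parallel reader):

```shell
# Untested workarounds for the macOS fork-safety crash above.
# Option 1: disable macOS Objective-C fork-safety checks (as already attempted),
# making sure the variable is exported in the same shell that runs tox/gradle:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
# Option 2: avoid forking entirely by running Sphinx single-process, e.g. pass
# "-j 1" to sphinx-build (the traceback shows the crash comes from the parallel
# reader in sphinx/util/parallel.py):
#   sphinx-build -j 1 -b html target/docs/source target/docs/_build
echo "OBJC_DISABLE_INITIALIZE_FORK_SAFETY=$OBJC_DISABLE_INITIALIZE_FORK_SAFETY"
```

If the tox environment spawns its own shell, the export may need to go into the tox/gradle invocation rather than the interactive shell.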

Re: A new reworked Elasticsearch 7+ IO module

2020-02-10 Thread Chamikara Jayalath
On Thu, Feb 6, 2020 at 8:13 AM Etienne Chauchot 
wrote:

> Hi,
>
> please see my comments inline
> On 06/02/2020 16:24, Alexey Romanenko wrote:
>
> Please, see my comments inline.
>
> On 6 Feb 2020, at 10:50, Etienne Chauchot  wrote:
>
>>> 1. regarding version support: ES v2 has not been maintained by Elastic
>>> since 2018/02, so we plan to remove it from the IO. In the past we have
>>> already retired versions (like Spark 1.6, for instance).
>>>
>>>
>> My only concern here is that there might be users who use the existing
>> module who might not be able to easily upgrade the Beam version if we
>> remove it. But given that V2 is 5 versions behind the latest release this
>> might be OK.
>>
>
> It seems we have a consensus on this.
> I think there should be another general discussion on the long-term
> support of the IO modules for our preferred tools.
>
> => yes, consensus, let's drop ESV2
>
> We had (and still have) a similar problem with KafkaIO supporting
> different versions of Kafka, especially the very old version 0.9. We raised
> this question on user@ and it appears that there are users who, for some
> reason, still use old Kafka versions. So, before dropping support for any
> ES versions, I’d suggest asking on user@ and seeing whether any people will
> be affected by this.
>
> Yes, we can do a survey among users, but the question is: should we support
> an ES version that is no longer supported by Elastic themselves?
>

+1 for asking on the user list. I guess this is more about whether users
need the specific version that we hope to drop support for. Whether we
need to support unsupported versions is a more general question that should
probably be addressed on the dev list (and I personally don't think we
should, unless there's a large enough user base for a given version).

>>> 2. regarding the user: the aim is to unlock some new features (listed by
>>> Ludovic) and give users more flexibility in their requests. For that, it
>>> requires using the high-level Java ES client in place of the low-level REST
>>> client (which was used because it is the only one compatible with all ES
>>> versions). We plan to replace the API (JSON documents in and out) with more
>>> complete standard ES objects that contain the request logic (insert/update,
>>> doc routing, etc.) and the data. There are already IOs like SpannerIO that
>>> use similar objects in the input PCollection rather than pure POJOs.
>>>
>>>
>> Won't this be a breaking change for all users? IMO using POJOs in
>> PCollections is safer, since otherwise we have to worry about changes to
>> the underlying client library API. An exception would be when the
>> underlying client library offers a backwards-compatibility guarantee that
>> we can rely on for the foreseeable future (for example, BQ TableRow).
>>
>
> Agreed, but actually there will be POJOs in order to abstract
> Elasticsearch version support. The following third point explains this.
>
> => indeed it will be a breaking change, hence this email to get a
> consensus on that. Also, I think our wrappers of ES request objects will
> offer the same backward compatibility as the underlying objects.
>
> I just want to remind everyone that, according to what we agreed some time
> ago on dev@ (at least for IOs), all breaking user API changes have to be
> added along with a deprecation of the old API, which can be removed after 3
> consecutive Beam releases. In this case, users will have time to move to
> the new API smoothly.
>
> We are mostly discussing the target architecture of the new module here,
> but the deprecation process is important to recall, I agree. When I say the
> DTOs are backward compatible above, I mean between per-version sub-modules
> inside the new module. Anyway, sure, for some time both modules (the old
> REST-based one that supports v2-7 and the new one that supports v5-7) will
> coexist, and the old one will receive the deprecation annotations.
>

+1 for supporting both versions for at least three minor versions to give
users time to migrate. Also, we should try to produce a warning for users
who use the deprecated versions.

Thanks,
Cham


> Best
>
> Etienne
>
>
>
>
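To make the "wrapper of ES request objects" idea above concrete, here is a purely hypothetical sketch. None of these names exist in Beam; it only illustrates a version-agnostic DTO that carries both the request logic (insert/update, routing) and the data, which per-version sub-modules would translate into the native client request:

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical sketch only (not a proposed API): a version-agnostic write
// request that a pipeline would emit into the write transform's input
// PCollection instead of a raw JSON string.
public final class EsWriteRequest implements Serializable {
    public enum Kind { INSERT, UPDATE, DELETE }

    private final Kind kind;
    private final String index;
    private final String routing;  // optional doc routing, may be null
    private final String jsonDoc;  // the document payload

    public EsWriteRequest(Kind kind, String index, String routing, String jsonDoc) {
        this.kind = Objects.requireNonNull(kind);
        this.index = Objects.requireNonNull(index);
        this.routing = routing;
        this.jsonDoc = jsonDoc;
    }

    public Kind kind() { return kind; }
    public String index() { return index; }
    public String routing() { return routing; }
    public String jsonDoc() { return jsonDoc; }

    public static void main(String[] args) {
        EsWriteRequest req =
            new EsWriteRequest(Kind.UPDATE, "users", "shard-7", "{\"name\":\"jo\"}");
        System.out.println(req.kind() + " into " + req.index());
    }
}
```

The point of the wrapper is that its surface can stay stable across the v5/v6/v7 sub-modules even if the underlying client request classes differ.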


Re: Retest this please access?

2020-02-10 Thread Robert Bradshaw
We're working on that, follow https://issues.apache.org/jira/browse/INFRA-19670

On Mon, Feb 10, 2020 at 9:52 AM Daniel Collins  wrote:
>
> Hello all,
>
> I'm feeling a bit bad about asking my reviewers to re-run presubmits. How 
> would I go about getting access to "Retest this please" being interpreted as 
> a presubmit trigger on github?
>
> Thanks,
>
> Daniel


Re: Labels on PR

2020-02-10 Thread Udi Meiri
Cool!

On Mon, Feb 10, 2020 at 9:27 AM Robert Burke  wrote:

> +1 to autolabeling
>
> On Mon, Feb 10, 2020, 9:21 AM Luke Cwik  wrote:
>
>> Nice
>>
>> On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel  wrote:
>>
>>> Ha, cool. I'll have a look at the autolabeler. The infra stuff is not
>>> something I've looked at... I'll dive into that.
>>>
>>>  _/
>>> _/ Alex Van Boxel
>>>
>>>
>>> On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía  wrote:
>>>
 +1

 You don't need to write your own action, there is already one
 autolabeler action [1].
 INFRA can easily configure it for Beam (as they did for Avro [2]) if
 we request it.
 The plugin is quite easy to configure and works like a charm [3].

 [1] https://github.com/probot/autolabeler
 [2] https://issues.apache.org/jira/browse/INFRA-17367
 [3] https://github.com/apache/avro/blob/master/.github/autolabeler.yml


 On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Great initiative, thanks Alex! I was thinking to add such labels into
> PR title but I believe that GitHub labels are better since it can be used
> easily for filtering, for example.
>
> Maybe it could be useful to add more granularity to the labels, like
> “release”, “runners”, “website”, etc., but I’m afraid of making the titles
> too heavy because of this.
>
> > On 10 Feb 2020, at 08:35, Alex Van Boxel  wrote:
> >
> > I've started putting labels on PR's. I've done the first page for now
> > (as I'm afraid putting them on older ones could affect the stale bot).
> > I hope this is ok.
> >
> > For now I'm only focussing on language, and I'm going to see if I can
> > write a GitHub action for it. I hope this is useful. Other kinds of
> > suggestions for labels, that can be automated, are welcome.
> >
> > 
> >  _/
> > _/ Alex Van Boxel
>
>




Retest this please access?

2020-02-10 Thread Daniel Collins
Hello all,

I'm feeling a bit bad about asking my reviewers to re-run presubmits. How
would I go about getting access so that "Retest this please" is interpreted
as a presubmit trigger on GitHub?

Thanks,

Daniel


Re: Labels on PR

2020-02-10 Thread Robert Burke
+1 to autolabeling

On Mon, Feb 10, 2020, 9:21 AM Luke Cwik  wrote:

> Nice
>
> On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel  wrote:
>
>> Ha, cool. I'll have a look at the autolabeler. The infra stuff is not
>> something I've looked at... I'll dive into that.
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía  wrote:
>>
>>> +1
>>>
>>> You don't need to write your own action, there is already one
>>> autolabeler action [1].
>>> INFRA can easily configure it for Beam (as they did for Avro [2]) if we
>>> request it.
>>> The plugin is quite easy to configure and works like a charm [3].
>>>
>>> [1] https://github.com/probot/autolabeler
>>> [2] https://issues.apache.org/jira/browse/INFRA-17367
>>> [3] https://github.com/apache/avro/blob/master/.github/autolabeler.yml
>>>
>>>
>>> On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 Great initiative, thanks Alex! I was thinking to add such labels into
 PR title but I believe that GitHub labels are better since it can be used
 easily for filtering, for example.

 Maybe it could be useful to add more granularity to the labels, like
 “release”, “runners”, “website”, etc., but I’m afraid of making the titles
 too heavy because of this.

 > On 10 Feb 2020, at 08:35, Alex Van Boxel  wrote:
 >
 > I've started putting labels on PR's. I've done the first page for now
 > (as I'm afraid putting them on older ones could affect the stale bot).
 > I hope this is ok.
 >
 > For now I'm only focussing on language, and I'm going to see if I can
 > write a GitHub action for it. I hope this is useful. Other kinds of
 > suggestions for labels, that can be automated, are welcome.
 >
 > 
 >  _/
 > _/ Alex Van Boxel




Re: Labels on PR

2020-02-10 Thread Luke Cwik
Nice

On Mon, Feb 10, 2020 at 2:52 AM Alex Van Boxel  wrote:

> Ha, cool. I'll have a look at the autolabeler. The infra stuff is not
> something I've looked at... I'll dive into that.
>
>  _/
> _/ Alex Van Boxel
>
>
> On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía  wrote:
>
>> +1
>>
>> You don't need to write your own action, there is already one
>> autolabeler action [1].
>> INFRA can easily configure it for Beam (as they did for Avro [2]) if we
>> request it.
>> The plugin is quite easy to configure and works like a charm [3].
>>
>> [1] https://github.com/probot/autolabeler
>> [2] https://issues.apache.org/jira/browse/INFRA-17367
>> [3] https://github.com/apache/avro/blob/master/.github/autolabeler.yml
>>
>>
>> On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Great initiative, thanks Alex! I was thinking to add such labels into PR
>>> title but I believe that GitHub labels are better since it can be used
>>> easily for filtering, for example.
>>>
>>> Maybe it could be useful to add more granularity to the labels, like
>>> “release”, “runners”, “website”, etc., but I’m afraid of making the
>>> titles too heavy because of this.
>>>
>>> > On 10 Feb 2020, at 08:35, Alex Van Boxel  wrote:
>>> >
>>> > I've started putting labels on PR's. I've done the first page for now
>>> > (as I'm afraid putting them on older ones could affect the stale bot).
>>> > I hope this is ok.
>>> >
>>> > For now I'm only focussing on language, and I'm going to see if I can
>>> > write a GitHub action for it. I hope this is useful. Other kinds of
>>> > suggestions for labels, that can be automated, are welcome.
>>> >
>>> > 
>>> >  _/
>>> > _/ Alex Van Boxel
>>>
>>>


Beam Dependency Check Report (2020-02-10)

2020-02-10 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:

Dependency Name | Current Version | Latest Version | Current Released | Latest Released | JIRA Issue
cachetools | 3.1.1 | 4.0.0 | 2019-12-23 | 2019-12-23 | BEAM-9017
google-cloud-bigquery | 1.17.1 | 1.24.0 | 2019-09-23 | 2020-02-10 | BEAM-5537
google-cloud-datastore | 1.7.4 | 1.10.0 | 2019-05-27 | 2019-10-21 | BEAM-8443
httplib2 | 0.12.0 | 0.17.0 | 2018-12-10 | 2020-01-27 | BEAM-9018
mock | 2.0.0 | 3.0.5 | 2019-05-20 | 2019-05-20 | BEAM-7369
oauth2client | 3.0.0 | 4.1.3 | 2018-12-10 | 2018-12-10 | BEAM-6089
PyHamcrest | 1.10.1 | 2.0.0 | 2020-01-20 | 2020-01-20 | BEAM-9155
pytest | 4.6.9 | 5.3.5 | 2020-01-06 | 2020-02-03 | BEAM-8606
Sphinx | 1.8.5 | 2.4.0 | 2019-05-20 | 2020-02-10 | BEAM-7370
tenacity | 5.1.5 | 6.0.0 | 2019-11-11 | 2019-11-11 | BEAM-8607
High Priority Dependency Updates Of Beam Java SDK:

Dependency Name | Current Version | Latest Version | Current Released | Latest Released | JIRA Issue
com.alibaba:fastjson | 1.2.49 | 1.2.62 | 2018-08-04 | 2019-10-07 | BEAM-8632
com.datastax.cassandra:cassandra-driver-core | 3.8.0 | 4.0.0 | 2019-10-29 | 2019-03-18 | BEAM-8674
com.esotericsoftware:kryo | 4.0.2 | 5.0.0-RC4 | 2018-03-20 | 2019-04-14 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.20.0 | 0.27.0 | 2019-02-11 | 2019-10-21 | BEAM-6645
com.github.luben:zstd-jni | 1.3.8-3 | 1.4.4-7 | 2019-01-29 | 2020-01-24 | BEAM-9194
com.github.spotbugs:spotbugs | 3.1.12 | 4.0.0-RC3 | 2019-03-01 | 2020-02-08 | BEAM-7792
com.github.spotbugs:spotbugs-annotations | 3.1.12 | 4.0.0-RC3 | 2019-03-01 | 2020-02-08 | BEAM-6951
com.google.api.grpc:grpc-google-cloud-datacatalog-v1beta1 | 0.29.0-alpha | 0.32.1 | 2019-10-25 | 2020-02-04 | BEAM-8853
com.google.api.grpc:grpc-google-common-protos | 1.12.0 | 1.17.0 | 2018-06-29 | 2019-10-04 | BEAM-8633
com.google.api.grpc:proto-google-cloud-bigquerystorage-v1beta1 | 0.85.1 | 0.89.0 | 2020-01-08 | 2020-02-07 | BEAM-8678
com.google.api.grpc:proto-google-cloud-datacatalog-v1beta1 | 0.29.0-alpha | 0.32.1 | 2019-10-25 | 2020-02-04 | BEAM-8854
com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1 | 1.6.0 | 1.49.2 | 2019-01-23 | 2020-02-06 | BEAM-8682
com.google.api.grpc:proto-google-common-protos | 1.12.0 | 1.17.0 | 2018-06-29 | 2019-10-04 | BEAM-6899
com.google.apis:google-api-services-bigquery | v2-rev20190917-1.30.3 | v2-rev20191211-1.30.8 | 2019-10-09 | 2020-02-08 | BEAM-8684
com.google.apis:google-api-services-clouddebugger | v2-rev20191003-1.30.3 | v2-rev20200102-1.30.8 | 2019-10-19 | 2020-02-09 | BEAM-8750
com.google.apis:google-api-services-cloudresourcemanager | v1-rev20191206-1.30.3 | v2-rev20191206-1.30.3 | 2019-12-17 | 2019-12-17 | BEAM-8751
com.google.apis:google-api-services-dataflow | v1b3-rev20190927-1.30.3 | v1beta3-rev12-1.20.0 | 2019-10-11 | 2015-04-29 | BEAM-8752
com.google.apis:google-api-services-pubsub | v1-rev2019-1.30.3 | v1-rev20191203-1.30.3 | 2019-11-26 | 2019-12-18 | BEAM-8753
com.google.cloud:google-cloud-bigquery | 1.103.0 | 1.106.0 | 2020-01-08 | 2020-02-03 | BEAM-8687
com.google.cloud:google-cloud-bigquerystorage | 0.120.1-beta | 0.124.0-beta | 2020-01-08 | 2020-02-07 | BEAM-8755
com.google.cloud:google-cloud-spanner | 1.6.0 | 1.49.2 | 2019-01-23 | 2020-02-06 | BEAM-8758
com.google.cloud.bigdataoss:gcsio | 1.9.16 | 2.0.0 | 2019-02-25 | 2019-08-23 | BEAM-8689
com.google.cloud.bigdataoss:util | 1.9.16 | 2.0.0 | 2019-02-25 | 2019-08-23 | BEAM-8759
com.google.errorprone:error_prone_annotations | 2.0.15 | 2.2.0 | 2016-12-02 | 2018-01-08 | BEAM-6741
com.google.guava:guava | 26.0-jre | 28.2-jre | 2018-08-01 | 2019-12-27 | BEAM-5559
com.google.guava:guava-testlib | 25.1-jre

Re: Labels on PR

2020-02-10 Thread Alex Van Boxel
Ha, cool. I'll have a look at the autolabeler. The infra stuff is not
something I've looked at... I'll dive into that.

 _/
_/ Alex Van Boxel


On Mon, Feb 10, 2020 at 11:49 AM Ismaël Mejía  wrote:

> +1
>
> You don't need to write your own action, there is already one autolabeler
> action [1].
> INFRA can easily configure it for Beam (as they did for Avro [2]) if we
> request it.
> The plugin is quite easy to configure and works like a charm [3].
>
> [1] https://github.com/probot/autolabeler
> [2] https://issues.apache.org/jira/browse/INFRA-17367
> [3] https://github.com/apache/avro/blob/master/.github/autolabeler.yml
>
>
> On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Great initiative, thanks Alex! I was thinking to add such labels into PR
>> title but I believe that GitHub labels are better since it can be used
>> easily for filtering, for example.
>>
>> Maybe it could be useful to add more granularity to the labels, like
>> “release”, “runners”, “website”, etc., but I’m afraid of making the titles
>> too heavy because of this.
>>
>> > On 10 Feb 2020, at 08:35, Alex Van Boxel  wrote:
>> >
>> > I've started putting labels on PR's. I've done the first page for now
>> > (as I'm afraid putting them on older ones could affect the stale bot).
>> > I hope this is ok.
>> >
>> > For now I'm only focussing on language, and I'm going to see if I can
>> > write a GitHub action for it. I hope this is useful. Other kinds of
>> > suggestions for labels, that can be automated, are welcome.
>> >
>> > 
>> >  _/
>> > _/ Alex Van Boxel
>>
>>


Re: Labels on PR

2020-02-10 Thread Ismaël Mejía
+1

You don't need to write your own action, there is already one autolabeler
action [1].
INFRA can easily configure it for Beam (as they did for Avro [2]) if we
request it.
The plugin is quite easy to configure and works like a charm [3].

[1] https://github.com/probot/autolabeler
[2] https://issues.apache.org/jira/browse/INFRA-17367
[3] https://github.com/apache/avro/blob/master/.github/autolabeler.yml


On Mon, Feb 10, 2020 at 11:20 AM Alexey Romanenko 
wrote:

> Great initiative, thanks Alex! I was thinking to add such labels into PR
> title but I believe that GitHub labels are better since it can be used
> easily for filtering, for example.
>
> Maybe it could be useful to add more granularity to the labels, like
> “release”, “runners”, “website”, etc., but I’m afraid of making the titles
> too heavy because of this.
>
> > On 10 Feb 2020, at 08:35, Alex Van Boxel  wrote:
> >
> > I've started putting labels on PR's. I've done the first page for now
> > (as I'm afraid putting them on older ones could affect the stale bot).
> > I hope this is ok.
> >
> > For now I'm only focussing on language, and I'm going to see if I can
> > write a GitHub action for it. I hope this is useful. Other kinds of
> > suggestions for labels, that can be automated, are welcome.
> >
> > 
> >  _/
> > _/ Alex Van Boxel
>
>
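For the record, the probot autolabeler reads a `.github/autolabeler.yml` that maps each label to a list of file globs (the Avro config linked above shows the shape). A hypothetical Beam version, with made-up label names and path globs rather than any agreed scheme, might look like:

```yaml
# Hypothetical sketch modeled on the Avro autolabeler.yml linked above;
# the label names and path globs here are illustrative only.
"python": ["sdks/python/**/*"]
"java": ["sdks/java/**/*"]
"go": ["sdks/go/**/*"]
"website": ["website/**/*"]
"build": ["buildSrc/**/*", "gradle/**/*"]
```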


Re: Labels on PR

2020-02-10 Thread Alexey Romanenko
Great initiative, thanks Alex! I was thinking of adding such labels to PR
titles, but I believe that GitHub labels are better since they can easily be
used for filtering, for example.

Maybe it could be useful to add more granularity to the labels, like
“release”, “runners”, “website”, etc., but I’m afraid of making the titles
too heavy because of this.

> On 10 Feb 2020, at 08:35, Alex Van Boxel  wrote:
> 
> I've started putting labels on PR's. I've done the first page for now (as
> I'm afraid putting them on older ones could affect the stale bot). I hope
> this is ok.
> 
> For now I'm only focussing on language and I'm going to see if I can write
> a GitHub action for it. I hope this is useful. Other kinds of suggestions
> for labels, that can be automated, are welcome.
> 
> 
>  _/
> _/ Alex Van Boxel



Re: big data blog

2020-02-10 Thread Etienne Chauchot

Yes sure,

Here is the link to the spreadsheet for review of the tweet: 
https://docs.google.com/spreadsheets/d/1mz36njTtn1UJwDF50GdqyZVbX_F0n_A6eMYcxsktpSM/edit#gid=1413052381


Thanks all for your encouragement!

Best

Etienne

On 08/02/2020 08:09, Kenneth Knowles wrote:
Nice! Yes, I think we should promote Beam articles that are insightful 
from a longtime contributor.


Etienne - can you add twitter announcements/retweets to the social 
media spreadsheet when you write new articles?


Kenn

On Fri, Feb 7, 2020 at 5:44 PM Ahmet Altay wrote:


Cool, thank you. Would it make sense to promote Beam related posts
on our twitter channel?

On Fri, Feb 7, 2020 at 2:47 PM Pablo Estrada wrote:

Very nice. Thanks for sharing Etienne!

On Fri, Feb 7, 2020 at 2:19 PM Reuven Lax wrote:

Cool!

On Fri, Feb 7, 2020 at 7:24 AM Etienne Chauchot wrote:

Hi all,

FYI, I just started a blog around big data
technologies and for now it
is focused on Beam.

https://echauchot.blogspot.com/

Feel free to comment, suggest or anything.

Etienne