Re: [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2019-12-12 Thread Jozef Vilcek
+1

On Fri, Dec 13, 2019 at 5:58 AM Kenneth Knowles  wrote:

> Please vote on the proposal for Beam's mascot to be the Firefly. This
> encompasses the Lampyridae family of insects, without specifying a genus or
> species.
>
> [ ] +1, Approve Firefly being the mascot
> [ ] -1, Disapprove Firefly being the mascot
>
> The vote will be open for at least 72 hours excluding weekends. It is
> adopted by at least 3 PMC +1 approval votes, with no PMC -1 disapproval
> votes*. Non-PMC votes are still encouraged.
>
> PMC voters, please help by indicating your vote as "(binding)"
>
> Kenn
>
> *I have chosen this format for this vote, even though Beam uses simple
> majority as a rule, because I want any PMC member to be able to veto based
> on concerns about overlap or trademark.
>


Re: Committed without review. Sorry.

2019-12-12 Thread Kenneth Knowles
Thanks for alerting dev@.

IMO this was handled perfectly by all parties.

Kenn

On Thu, Dec 12, 2019 at 4:09 PM Valentyn Tymofieiev 
wrote:

> The change LGTM, so you can consider it reviewed.  In general it would be
> nice to set up alerts to catch these situations, to make sure they don't go
> unnoticed.
>
> Also as a reminder - please don't commit or merge PRs into release
> branches without a review from a release manager.
>
> On Thu, Dec 12, 2019 at 3:44 PM Pablo Estrada  wrote:
>
>> Seed job runs okay:
>> https://builds.apache.org/job/beam_SeedJob_Standalone/3865/console
>>
>>
>> On Thu, Dec 12, 2019 at 3:28 PM Pablo Estrada  wrote:
>>
>>> I accidentally committed a small change to master:
>>> https://github.com/apache/beam/commit/6018326ffe74aac7d8c44ded296b92f8b5c0b556
>>>
>>> I am verifying that this works as intended for now.
>>>
>>> What should we do about this? Revert? Leave as is if it works fine?
>>> Best
>>> -P
>>>
>>


Re: [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2019-12-12 Thread Boyuan Zhang
+1 (non binding)

On Thu, Dec 12, 2019 at 9:14 PM Robert Burke  wrote:

> +1 (non binding)
>
> On Thu, Dec 12, 2019, 8:58 PM Kenneth Knowles  wrote:
>
>> Please vote on the proposal for Beam's mascot to be the Firefly. This
>> encompasses the Lampyridae family of insects, without specifying a genus or
>> species.
>>
>> [ ] +1, Approve Firefly being the mascot
>> [ ] -1, Disapprove Firefly being the mascot
>>
>> The vote will be open for at least 72 hours excluding weekends. It is
>> adopted by at least 3 PMC +1 approval votes, with no PMC -1 disapproval
>> votes*. Non-PMC votes are still encouraged.
>>
>> PMC voters, please help by indicating your vote as "(binding)"
>>
>> Kenn
>>
>> *I have chosen this format for this vote, even though Beam uses simple
>> majority as a rule, because I want any PMC member to be able to veto based
>> on concerns about overlap or trademark.
>>
>


Re: [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2019-12-12 Thread Robert Burke
+1 (non binding)

On Thu, Dec 12, 2019, 8:58 PM Kenneth Knowles  wrote:

> Please vote on the proposal for Beam's mascot to be the Firefly. This
> encompasses the Lampyridae family of insects, without specifying a genus or
> species.
>
> [ ] +1, Approve Firefly being the mascot
> [ ] -1, Disapprove Firefly being the mascot
>
> The vote will be open for at least 72 hours excluding weekends. It is
> adopted by at least 3 PMC +1 approval votes, with no PMC -1 disapproval
> votes*. Non-PMC votes are still encouraged.
>
> PMC voters, please help by indicating your vote as "(binding)"
>
> Kenn
>
> *I have chosen this format for this vote, even though Beam uses simple
> majority as a rule, because I want any PMC member to be able to veto based
> on concerns about overlap or trademark.
>


[VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2019-12-12 Thread Kenneth Knowles
Please vote on the proposal for Beam's mascot to be the Firefly. This
encompasses the Lampyridae family of insects, without specifying a genus or
species.

[ ] +1, Approve Firefly being the mascot
[ ] -1, Disapprove Firefly being the mascot

The vote will be open for at least 72 hours excluding weekends. It is
adopted by at least 3 PMC +1 approval votes, with no PMC -1 disapproval
votes*. Non-PMC votes are still encouraged.

PMC voters, please help by indicating your vote as "(binding)"

Kenn

*I have chosen this format for this vote, even though Beam uses simple
majority as a rule, because I want any PMC member to be able to veto based
on concerns about overlap or trademark.


Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-12-12 Thread Kenneth Knowles
Ah! I forgot to send the official ratification vote. I will do that now.

On Thu, Dec 12, 2019 at 3:51 PM Aizhamal Nurmamat kyzy 
wrote:

> Thank you, Kenn for running the vote. I reached out to a couple community
> members to see if they would like to develop the design and contribute it
> to Beam. I will keep you all posted. Thanks :)
>
> On Mon, Dec 2, 2019 at 8:20 PM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> I have tweaked Robert's python* and then applied three filters: All
>> voters, committers, and PMC.
>>
>> Summary:
>>
>>  - All voters (46): Firefly (but Owl close behind, no others close)
>>  - Committers (24): Owl (but Firefly close behind, no others close)
>>  - PMC (6): Cuttlefish (but a many-way tie close behind)
>>
>> It seems most of the PMC has decided to leave this decision to the
>> broader community. So in the spirit of community over code, and respecting
>> the exact outcome of the vote regardless of who showed up to actually vote,
>> I will open a final vote for Firefly.
>>
>> Kenn
>>
>> 
>>
>> All Voters:
>>
>> Firefly 21
>> Owl 19
>> Dumbo Octopus 11
>> Lemur 15
>> Salmon 5
>> Angler fish 12
>> Robot dinosaur 10
>> Capybara 2
>> Beaver 2
>> Trout 3
>> Cuttlefish 12
>> Honey Badger 1
>> Hedgehog 11
>>
>> 
>>
>> Committers:
>>
>> Firefly 13
>> Owl 14
>> Dumbo Octopus 7
>> Lemur 8
>> Salmon 1
>> Angler fish 4
>> Robot dinosaur 4
>> Capybara 0
>> Beaver 1
>> Trout 1
>> Cuttlefish 8
>> Honey Badger 1
>> Hedgehog 5
>>
>> 
>>
>> PMC:
>>
>> Firefly 2
>> Owl 2
>> Dumbo Octopus 0
>> Lemur 2
>> Salmon 0
>> Angler fish 2
>> Robot dinosaur 0
>> Capybara 0
>> Beaver 0
>> Trout 0
>> Cuttlefish 3
>> Honey Badger 0
>> Hedgehog 0
>>
>> 
>>
>> *import collections, pprint, re, requests, csv, sys
>> thread = requests.get('
>> https://lists.apache.org/api/thread.lua?id=ff60eabbf8349ba6951633869000356c2c2feb48bbff187cf3c60039@%3Cdev.beam.apache.org%3E').json(
>> )
>> counts = collections.defaultdict(lambda: collections.defaultdict(int))
>>
>> for email in thread['emails']:
>>   author = email['from']
>>   body = requests.get('https://lists.apache.org/api/email.lua?id=%s' %
>> email['mid']).json()['body']
>>   for vote in re.findall(r'\n\s*\[\s*[xX]\s*\]\s*([a-zA-Z ]+)', body):
>>     counts[author][vote] = 1
>>   pprint.pprint(sorted(counts[author].items(), key=lambda kv: kv[-1]))
>>
>> candidates = set().union(*[counts[author].keys() for author in
>> counts.keys()])
>>
>> votewriter = csv.DictWriter(sys.stdout, ["author"] + list(candidates))
>> votewriter.writeheader()
>> for author, votecount in counts.items():
>>   votewriter.writerow(dict({"author": author.encode('utf-8')},
>> **votecount))
>>
>> On Mon, Nov 25, 2019 at 2:09 PM Mark Liu  wrote:
>>
>>> [ ] Beaver
>>> [ ] Hedgehog
>>> [ ] Lemur
>>> [ ] Owl
>>> [ ] Salmon
>>> [ ] Trout
>>> [ ] Robot dinosaur
>>> [ ] Firefly
>>> [ ] Cuttlefish
>>> [X] Dumbo Octopus
>>> [ ] Angler fish
>>>
>>> On Mon, Nov 25, 2019 at 1:22 PM David Cavazos 
>>> wrote:
>>>
 Hi Kenneth, I tried adding back the email addresses, but they weren't
 added on the existing responses, it would only add them on new ones. :(

 I've already made it not accept new responses.

 There are only 8 responses (2 mine, 1 my real vote and 1 empty test
 vote), so hopefully everyone who voted there can vote back here.

 On Sat, Nov 23, 2019 at 7:27 PM Kenneth Knowles 
 wrote:

> David - if you can reconfigure the form so it is not anonymous (at
> least to me) then I may be up for including those results in the tally. I
> don't want to penalize those who voted via the form. But since there are
> now two voting channels we have to dedupe or discard the form results. And
> I need to be able to see which votes are PMC. Even if advisory, it does
> need to move to a concluding vote, and PMC votes could be a tiebreaker
> of sorts.
>
> Kenn
>
> On Sat, Nov 23, 2019 at 7:17 PM Kenneth Knowles 
> wrote:
>
>> On Fri, Nov 22, 2019 at 10:24 AM Robert Bradshaw 
>> wrote:
>>
>>> On Thu, Nov 21, 2019 at 7:05 PM David Cavazos 
>>> wrote:
>>>


 I created this Google Form
 
 if everyone is okay with it to make it easier to both vote and view the
 results :)

>>>
>>> Generally, decisions, especially votes, for Apache projects are
>>> supposed to happen on-list. I suppose this is more an advisory vote, but
>>> it still probably makes sense to keep it here.
>>>
>>
>> Indeed. Someone suggested a Google form before I started this, but I
>> deliberately didn't use it. It doesn't add much and it puts the vote off
>> list onto opaque and mutable third party infrastructure.
>>
>> If you voted on the form, please repeat it on thread so I can count
>> it.
>>
>> Kenn
>>

Re: Cassandra IO issues and contributing

2019-12-12 Thread Kenneth Knowles
On Thu, Dec 12, 2019 at 3:30 PM Vincent Marquez 
wrote:

> Hello, as I've mentioned in previous emails, I've found the CassandraIO
> connector lacking some essential features for efficient batch processing in
> real world scenarios.  We've developed a more fully featured connector and
> had good results with it.
>

Fantastic!


> Could I perhaps write up a JIRA proposal for some minor changes to the
> current connector that might improve things?
>

Yes!


> The main pain point is the absence of a 'readAll' method as I documented
> here:
>
> https://gist.github.com/vmarquez/204b8f44b1279fdbae97b40f8681bc25
>
> If I could write up a ticket, I don't mind submitting a small PR on GH as
> well addressing this lack of functionality.  Thanks for your time.
>

This would be excellent. Since it seems you have already implemented and
tested the functionality, a simple Jira with a title and description would
be enough; then open a PR linked to the Jira with a title like
"[BEAM-1234567] Improve performance of CassandraIO".

Thank you for writing to dev@ to share your experience and intentions. We
are happy to help you with the Jira and PR, and find the best reviewers, if
you will open them to get started.

Kenn



> *-Vincent*
>


Re: Unifying Build/contributing instructions

2019-12-12 Thread Kenneth Knowles
Thanks for taking this on! My preference would be to have CONTRIBUTING.md
link to https://beam.apache.org/contribute/contribution-guide/ and focus
work on the latter.

Kenn

On Thu, Dec 12, 2019 at 12:38 PM Elliotte Rusty Harold 
wrote:

> I've started work on updating and combining the four (or more?)
> different pages where build instructions are found. The initial PR is
> here:
>
> https://github.com/apache/beam/pull/10366
>
> To put a stake in the ground, this PR chooses CONTRIBUTING.md as the
> ultimate source of truth. A possible alternative is to unify around
> https://beam.apache.org/contribute/contribution-guide/
>
> I'm not wedded to one or the other, but I do think we should pick one
> and stick with it. If the community prefers to focus on
> https://beam.apache.org/contribute/contribution-guide/ we can use that
> instead.
>
> I've added some additional prerequisites to the instructions that were
> not yet included. I don't have it all yet though. Any further
> additions would be much appreciated.
>
> Please leave comments on the PR.
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org
>


Re: [RELEASE] Tracking 2.18

2019-12-12 Thread Udi Meiri
Also marked 3 Jiras from these cherrypicks as blockers.
Current open blocker count: 7.

On Thu, Dec 12, 2019 at 5:21 PM Udi Meiri  wrote:

> Just merged 6 PRs. :)
>
> On Thu, Dec 12, 2019 at 4:52 PM Udi Meiri  wrote:
>
>> Update: I'm accepting cherrypicks with failing tests if the corresponding
> PRs have passed them on master.
>>
>> I recall (without proof) that in the past, even with released worker
>> containers for the in-process release, ITs against the release branch
>> still fail.
>>
>> On Tue, Dec 10, 2019 at 10:58 AM Udi Meiri  wrote:
>>
>>> Re: cherrypicks on top of the release-2.18.0 branch
>>> The precommit tests are failing most likely due to some integration
>>> tests (wordcount, etc.) that are expecting the new 2.18 worker on Dataflow.
>>> I'm working on building an initial version of that worker so that the
>>> tests may pass.
>>>
>>> On Thu, Dec 5, 2019 at 4:39 PM Robert Bradshaw 
>>> wrote:
>>>
 Yeah, so I saw...

 On Thu, Dec 5, 2019 at 4:31 PM Udi Meiri  wrote:
 >
 > Sorry Robert the release was already cut yesterday.
 >
 >
 >
 > On Thu, Dec 5, 2019 at 8:37 AM Ismaël Mejía 
 wrote:
 >>
 >> Colm, I just merged your PR and cherry picked it into 2.18.0
 >> https://github.com/apache/beam/pull/10296
 >>
 >> On Thu, Dec 5, 2019 at 10:54 AM jincheng sun <
 sunjincheng...@gmail.com> wrote:
 >>>
 >>> Thanks for the Tracking Udi!
 >>>
 >>> I have updated the status of some release blockers issues as
 follows:
 >>>
 >>> - BEAM-8733 closed
 >>> - BEAM-8620 reset the fix version to 2.19
 >>> - BEAM-8618 reset the fix version to 2.19
 >>>
 >>> Best,
 >>> Jincheng
 >>>
 >>> Colm O hEigeartaigh  于2019年12月5日周四 下午5:38写道:
 
  Could we get this one in 2.18 as well?
 https://issues.apache.org/jira/browse/BEAM-8861
 
  Colm.
 
  On Wed, Dec 4, 2019 at 8:02 PM Udi Meiri  wrote:
 >
 > Following the release calendar, I plan on cutting the 2.18
 release branch today.
 >
 > There are currently 8 release blockers.
 >

>>>




Re: [RELEASE] Tracking 2.18

2019-12-12 Thread Udi Meiri
Just merged 6 PRs. :)

On Thu, Dec 12, 2019 at 4:52 PM Udi Meiri  wrote:

> Update: I'm accepting cherrypicks with failing tests if the corresponding
>> PRs have passed them on master.
>
>> I recall (without proof) that in the past, even with released worker
>> containers for the in-process release, ITs against the release branch
>> still fail.
>
> On Tue, Dec 10, 2019 at 10:58 AM Udi Meiri  wrote:
>
>> Re: cherrypicks on top of the release-2.18.0 branch
>> The precommit tests are failing most likely due to some integration tests
>> (wordcount, etc.) that are expecting the new 2.18 worker on Dataflow.
>> I'm working on building an initial version of that worker so that the
>> tests may pass.
>>
>> On Thu, Dec 5, 2019 at 4:39 PM Robert Bradshaw 
>> wrote:
>>
>>> Yeah, so I saw...
>>>
>>> On Thu, Dec 5, 2019 at 4:31 PM Udi Meiri  wrote:
>>> >
>>> > Sorry Robert the release was already cut yesterday.
>>> >
>>> >
>>> >
>>> > On Thu, Dec 5, 2019 at 8:37 AM Ismaël Mejía  wrote:
>>> >>
>>> >> Colm, I just merged your PR and cherry picked it into 2.18.0
>>> >> https://github.com/apache/beam/pull/10296
>>> >>
>>> >> On Thu, Dec 5, 2019 at 10:54 AM jincheng sun <
>>> sunjincheng...@gmail.com> wrote:
>>> >>>
>>> >>> Thanks for the Tracking Udi!
>>> >>>
>>> >>> I have updated the status of some release blockers issues as follows:
>>> >>>
>>> >>> - BEAM-8733 closed
>>> >>> - BEAM-8620 reset the fix version to 2.19
>>> >>> - BEAM-8618 reset the fix version to 2.19
>>> >>>
>>> >>> Best,
>>> >>> Jincheng
>>> >>>
>>> >>> Colm O hEigeartaigh  于2019年12月5日周四 下午5:38写道:
>>> 
>>>  Could we get this one in 2.18 as well?
>>> https://issues.apache.org/jira/browse/BEAM-8861
>>> 
>>>  Colm.
>>> 
>>>  On Wed, Dec 4, 2019 at 8:02 PM Udi Meiri  wrote:
>>> >
>>> > Following the release calendar, I plan on cutting the 2.18 release
>>> branch today.
>>> >
>>> > There are currently 8 release blockers.
>>> >
>>>
>>




Re: Artifact staging in cross-language pipelines

2019-12-12 Thread Heejong Lee
I'm refreshing my memory by revisiting the doc[1], and it seems we've
already reached consensus on the bigger picture. I will start drafting
the implementation plan.

[1]:
https://docs.google.com/document/d/1XaiNekAY2sptuQRIXpjGAyaYdSc-wlJ-VKjl04c8N48/edit?usp=sharing

On Tue, Nov 26, 2019 at 3:54 AM Maximilian Michels  wrote:

> Hey Heejong,
>
> I don't think so. It would be great to push this forward.
>
> Thanks,
> Max
>
> On 26.11.19 02:49, Heejong Lee wrote:
> > Hi,
> >
> > Is anyone actively working on artifact staging extension for
> > cross-language pipelines? I'm thinking I can contribute to it in coming
> > Dec. If anyone has any progress on this and needs help, please let me
> know.
> >
> > Thanks,
> >
> > On Wed, Jun 12, 2019 at 2:42 AM Ismaël Mejía  > > wrote:
> >
> > Can you please add this to the design documents webpage.
> > https://beam.apache.org/contribute/design-documents/
> >
> > On Wed, May 8, 2019 at 7:29 PM Chamikara Jayalath
> > mailto:chamik...@google.com>> wrote:
> >  >
> >  >
> >  >
> >  > On Tue, May 7, 2019 at 10:21 AM Maximilian Michels
> > mailto:m...@apache.org>> wrote:
> >  >>
> >  >> Here's the first draft:
> >  >>
> >
> https://docs.google.com/document/d/1XaiNekAY2sptuQRIXpjGAyaYdSc-wlJ-VKjl04c8N48/edit?usp=sharing
> >  >>
> >  >> It's rather high-level. We may want to add more details once we
> have
> >  >> finalized the design. Feel free to make comments and edits.
> >  >
> >  >
> >  > Thanks Max. Added some comments.
> >  >
> >  >>
> >  >>
> >  >> > All of this goes back to the idea that I think the listing of
> >  >> > artifacts (or more general dependencies) should be a property
> > of the
> >  >> > environment themselves.
> >  >>
> >  >> +1 I came to the same conclusion while thinking about how to
> store
> >  >> artifact information for deferred execution of the pipeline.
> >  >>
> >  >> -Max
> >  >>
> >  >> On 07.05.19 18:10, Robert Bradshaw wrote:
> >  >> > Looking forward to your writeup, Max. In the meantime, some
> > comments below.
> >  >> >
> >  >> >
> >  >> > From: Lukasz Cwik mailto:lc...@google.com>>
> >  >> > Date: Thu, May 2, 2019 at 6:45 PM
> >  >> > To: dev
> >  >> >
> >  >> >>
> >  >> >>
> >  >> >> On Thu, May 2, 2019 at 7:20 AM Robert Bradshaw
> > mailto:rober...@google.com>> wrote:
> >  >> >>>
> >  >> >>> On Sat, Apr 27, 2019 at 1:14 AM Lukasz Cwik
> > mailto:lc...@google.com>> wrote:
> >  >> 
> >  >>  We should stick with URN + payload + artifact metadata[1]
> > where the only mandatory one that all SDKs and expansion services
> > understand is the "bytes" artifact type. This allows us to add
> > optional URNs for file://, http://, Maven, PyPi, ... in the future.
> > I would make the artifact staging service use the same URN + payload
> > mechanism to get compatibility of artifacts across the different
> > services and also have the artifact staging service be able to be
> > queried for the list of artifact types it supports.
> >  >> >>>
> >  >> >>> +1
> >  >> >>>
> >  >>  Finally, we would need to have environments enumerate the
> > artifact types that they support.
> >  >> >>>
> >  >> >>> Meaning at runtime, or as another field statically set in
> > the proto?
> >  >> >>
> >  >> >>
> >  >> >> I don't believe runners/SDKs should have to know what
> > artifacts each environment supports at runtime and instead have
> > environments enumerate them explicitly in the proto. I have been
> > thinking about a more general "capabilities" block on environments
> > which allow them to enumerate URNs that the environment understands.
> > This would include artifact type URNs, PTransform URNs, coder URNs,
> > ... I haven't proposed anything specific down this line yet because
> > I was wondering how environment resources (CPU, min memory, hardware
> > like GPU, AWS/GCP/Azure/... machine types) should/could tie into
> this.
> >  >> >>
> >  >> >>>
> >  >>  Having everyone have the same "artifact" representation
> > would be beneficial since:
> >  >>  a) Python environments could install dependencies from a
> > requirements.txt file (something that the Google Cloud Dataflow
> > Python docker container allows for today)
> >  >>  b) It provides an extensible and versioned mechanism for
> > SDKs, environments, and artifact staging/retrieval services to
> > support additional artifact types
> >  >>  c) Allow for expressing a canonical representation of an
> > artifact like a Maven package so a runner could merge environments
> > that the runner deems compatible.
> >  >> 
> >  >>  The flow I could see is:
> >  >>  1) (optional) query 
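
The "URN + payload + artifact metadata" representation discussed in this thread can be sketched as below. This is purely illustrative: the field names, URN strings, and helper function are assumptions made for the sketch, not Beam's actual proto definitions.

```python
import collections

# Illustrative sketch only -- field names and URN strings are assumptions,
# not Beam's actual proto definitions.
Artifact = collections.namedtuple('Artifact', ['type_urn', 'payload', 'metadata'])

# The one mandatory type every SDK and expansion service would understand:
# raw bytes embedded directly in the payload.
inline = Artifact(
    type_urn='beam:artifact:type:bytes:v1',
    payload=b'pickled-dofn-bytes',
    metadata={'name': 'main_session'})

# Optional types (file://, http://, Maven, PyPi, ...) could be added later
# and advertised by each environment among its "capabilities".
from_file = Artifact(
    type_urn='beam:artifact:type:file:v1',
    payload=b'/tmp/requirements.txt',
    metadata={'name': 'requirements.txt'})

def supported(environment_capabilities, artifact):
    # A runner would only hand an artifact to an environment that
    # enumerates the artifact's type URN among its capabilities.
    return artifact.type_urn in environment_capabilities

print(supported({'beam:artifact:type:bytes:v1'}, inline))     # True
print(supported({'beam:artifact:type:bytes:v1'}, from_file))  # False
```

This mirrors the points above: one mandatory "bytes" type, extensible optional types, and environments enumerating which type URNs they accept.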

Re: [RELEASE] Tracking 2.18

2019-12-12 Thread Udi Meiri
Update: I'm accepting cherrypicks with failing tests if the corresponding
PRs have passed them on master.

I recall (without proof) that in the past, even with released worker
containers for the in-process release, ITs against the release branch
still fail.

On Tue, Dec 10, 2019 at 10:58 AM Udi Meiri  wrote:

> Re: cherrypicks on top of the release-2.18.0 branch
> The precommit tests are failing most likely due to some integration tests
> (wordcount, etc.) that are expecting the new 2.18 worker on Dataflow.
> I'm working on building an initial version of that worker so that the
> tests may pass.
>
> On Thu, Dec 5, 2019 at 4:39 PM Robert Bradshaw 
> wrote:
>
>> Yeah, so I saw...
>>
>> On Thu, Dec 5, 2019 at 4:31 PM Udi Meiri  wrote:
>> >
>> > Sorry Robert the release was already cut yesterday.
>> >
>> >
>> >
>> > On Thu, Dec 5, 2019 at 8:37 AM Ismaël Mejía  wrote:
>> >>
>> >> Colm, I just merged your PR and cherry picked it into 2.18.0
>> >> https://github.com/apache/beam/pull/10296
>> >>
>> >> On Thu, Dec 5, 2019 at 10:54 AM jincheng sun 
>> wrote:
>> >>>
>> >>> Thanks for the Tracking Udi!
>> >>>
>> >>> I have updated the status of some release blockers issues as follows:
>> >>>
>> >>> - BEAM-8733 closed
>> >>> - BEAM-8620 reset the fix version to 2.19
>> >>> - BEAM-8618 reset the fix version to 2.19
>> >>>
>> >>> Best,
>> >>> Jincheng
>> >>>
>> >>> Colm O hEigeartaigh  于2019年12月5日周四 下午5:38写道:
>> 
>>  Could we get this one in 2.18 as well?
>> https://issues.apache.org/jira/browse/BEAM-8861
>> 
>>  Colm.
>> 
>>  On Wed, Dec 4, 2019 at 8:02 PM Udi Meiri  wrote:
>> >
>> > Following the release calendar, I plan on cutting the 2.18 release
>> branch today.
>> >
>> > There are currently 8 release blockers.
>> >
>>
>




Root logger configuration

2019-12-12 Thread Pablo Estrada
Hello all,
It has been pointed out to me by Chad, and also by others, that my logging
changes have caused logs to start getting lost.

It seems that by never logging on the root logger, initialization for a
root handler is skipped; and that's what causes the failures.

I will work on a fix for this. I am thinking of providing a very simple
apache_beam.utils.get_logger function that does something like this:

def get_logger(name):
  logging.basicConfig()
  return logging.getLogger(name)

And specific paths that need special handling of the logs should override
this config by adding their own handlers (e.g. sdk_worker, fn_api_runner,
etc).

I hope I can have a fix for this by tomorrow.
Best
-P.
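
A minimal, standalone sketch of the proposed helper (the `apache_beam.utils.get_logger` module path above is the proposal; only the standard library is used here). `logging.basicConfig()` installs a root handler only when none exists yet, so calling it on every `get_logger` call is harmless:

```python
import logging

def get_logger(name):
    # basicConfig() attaches a StreamHandler to the root logger only if
    # the root logger has no handlers yet; repeat calls are no-ops, so
    # this is safe to run on every get_logger() call.
    logging.basicConfig()
    return logging.getLogger(name)

log = get_logger('apache_beam.example')
log.warning('root handler is initialized, so this message is not lost')
```

Paths that need special handling would still attach their own handlers on top of (or instead of) this default configuration.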


Re: Committed without review. Sorry.

2019-12-12 Thread Valentyn Tymofieiev
The change LGTM, so you can consider it reviewed.  In general it would be
nice to set up alerts to catch these situations, to make sure they don't go
unnoticed.

Also as a reminder - please don't commit or merge PRs into release branches
without a review from a release manager.

On Thu, Dec 12, 2019 at 3:44 PM Pablo Estrada  wrote:

> Seed job runs okay:
> https://builds.apache.org/job/beam_SeedJob_Standalone/3865/console
>
>
> On Thu, Dec 12, 2019 at 3:28 PM Pablo Estrada  wrote:
>
>> I accidentally committed a small change to master:
>> https://github.com/apache/beam/commit/6018326ffe74aac7d8c44ded296b92f8b5c0b556
>>
>> I am verifying that this works as intended for now.
>>
>> What should we do about this? Revert? Leave as is if it works fine?
>> Best
>> -P
>>
>


Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-12-12 Thread Aizhamal Nurmamat kyzy
Thank you, Kenn for running the vote. I reached out to a couple community
members to see if they would like to develop the design and contribute it
to Beam. I will keep you all posted. Thanks :)

On Mon, Dec 2, 2019 at 8:20 PM Kenneth Knowles  wrote:

> Hi all,
>
> I have tweaked Robert's python* and then applied three filters: All
> voters, committers, and PMC.
>
> Summary:
>
>  - All voters (46): Firefly (but Owl close behind, no others close)
>  - Committers (24): Owl (but Firefly close behind, no others close)
>  - PMC (6): Cuttlefish (but a many-way tie close behind)
>
> It seems most of the PMC has decided to leave this decision to the broader
> community. So in the spirit of community over code, and respecting the
> exact outcome of the vote regardless of who showed up to actually vote, I
> will open a final vote for Firefly.
>
> Kenn
>
> 
>
> All Voters:
>
> Firefly 21
> Owl 19
> Dumbo Octopus 11
> Lemur 15
> Salmon 5
> Angler fish 12
> Robot dinosaur 10
> Capybara 2
> Beaver 2
> Trout 3
> Cuttlefish 12
> Honey Badger 1
> Hedgehog 11
>
> 
>
> Committers:
>
> Firefly 13
> Owl 14
> Dumbo Octopus 7
> Lemur 8
> Salmon 1
> Angler fish 4
> Robot dinosaur 4
> Capybara 0
> Beaver 1
> Trout 1
> Cuttlefish 8
> Honey Badger 1
> Hedgehog 5
>
> 
>
> PMC:
>
> Firefly 2
> Owl 2
> Dumbo Octopus 0
> Lemur 2
> Salmon 0
> Angler fish 2
> Robot dinosaur 0
> Capybara 0
> Beaver 0
> Trout 0
> Cuttlefish 3
> Honey Badger 0
> Hedgehog 0
>
> 
>
> *import collections, pprint, re, requests, csv, sys
> thread = requests.get('
> https://lists.apache.org/api/thread.lua?id=ff60eabbf8349ba6951633869000356c2c2feb48bbff187cf3c60039@%3Cdev.beam.apache.org%3E').json(
> )
> counts = collections.defaultdict(lambda: collections.defaultdict(int))
>
> for email in thread['emails']:
>   author = email['from']
>   body = requests.get('https://lists.apache.org/api/email.lua?id=%s' %
> email['mid']).json()['body']
>   for vote in re.findall(r'\n\s*\[\s*[xX]\s*\]\s*([a-zA-Z ]+)', body):
>     counts[author][vote] = 1
>   pprint.pprint(sorted(counts[author].items(), key=lambda kv: kv[-1]))
>
> candidates = set().union(*[counts[author].keys() for author in
> counts.keys()])
>
> votewriter = csv.DictWriter(sys.stdout, ["author"] + list(candidates))
> votewriter.writeheader()
> for author, votecount in counts.items():
>   votewriter.writerow(dict({"author": author.encode('utf-8')},
> **votecount))
>
> On Mon, Nov 25, 2019 at 2:09 PM Mark Liu  wrote:
>
>> [ ] Beaver
>> [ ] Hedgehog
>> [ ] Lemur
>> [ ] Owl
>> [ ] Salmon
>> [ ] Trout
>> [ ] Robot dinosaur
>> [ ] Firefly
>> [ ] Cuttlefish
>> [X] Dumbo Octopus
>> [ ] Angler fish
>>
>> On Mon, Nov 25, 2019 at 1:22 PM David Cavazos 
>> wrote:
>>
>>> Hi Kenneth, I tried adding back the email addresses, but they weren't
>>> added on the existing responses, it would only add them on new ones. :(
>>>
>>> I've already made it not accept new responses.
>>>
>>> There are only 8 responses (2 mine, 1 my real vote and 1 empty test
>>> vote), so hopefully everyone who voted there can vote back here.
>>>
>>> On Sat, Nov 23, 2019 at 7:27 PM Kenneth Knowles  wrote:
>>>
 David - if you can reconfigure the form so it is not anonymous (at
 least to me) then I may be up for including those results in the tally. I
 don't want to penalize those who voted via the form. But since there are
 now two voting channels we have to dedupe or discard the form results. And
 I need to be able to see which votes are PMC. Even if advisory, it does
 need to move to a concluding vote, and PMC votes could be a tiebreaker
 of sorts.

 Kenn

 On Sat, Nov 23, 2019 at 7:17 PM Kenneth Knowles 
 wrote:

> On Fri, Nov 22, 2019 at 10:24 AM Robert Bradshaw 
> wrote:
>
>> On Thu, Nov 21, 2019 at 7:05 PM David Cavazos 
>> wrote:
>>
>>>
>>>
>>> I created this Google Form
>>> 
>>> if everyone is okay with it to make it easier to both vote and view the
>>> results :)
>>>
>>
>> Generally, decisions, especially votes, for Apache projects are
>> supposed to happen on-list. I suppose this is more an advisory vote, but
>> it still probably makes sense to keep it here.
>>
>
> Indeed. Someone suggested a Google form before I started this, but I
> deliberately didn't use it. It doesn't add much and it puts the vote off
> list onto opaque and mutable third party infrastructure.
>
> If you voted on the form, please repeat it on thread so I can count it.
>
> Kenn
>
>
>
> import collections, pprint, re, requests
>> thread = requests.get('
>> https://lists.apache.org/api/thread.lua?id=ff60eabbf8349ba6951633869000356c2c2feb48bbff187cf3c60039@%3Cdev.beam.apache.org%3E').json(
>> )
>> counts = collections.defaultdict(int)
>> for email in thread['emails']:

Re: Committed without review. Sorry.

2019-12-12 Thread Pablo Estrada
Seed job runs okay:
https://builds.apache.org/job/beam_SeedJob_Standalone/3865/console


On Thu, Dec 12, 2019 at 3:28 PM Pablo Estrada  wrote:

> I accidentally committed a small change to master:
> https://github.com/apache/beam/commit/6018326ffe74aac7d8c44ded296b92f8b5c0b556
>
> I am verifying that this works as intended for now.
>
> What should we do about this? Revert? Leave as is if it works fine?
> Best
> -P
>


Re: Beam's job crashes on cluster

2019-12-12 Thread Kyle Weaver
Can you share the pipeline options you are using?
Particularly environment_type and environment_config.
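
For reference, a portable Spark pipeline is typically configured with flags along these lines. Every value below (endpoint, environment type, SDK container image) is a placeholder for the sketch, not taken from the failing job:

```python
# Placeholder values throughout -- adjust for the actual cluster.
portable_options = [
    '--runner=PortableRunner',
    # Spark job server endpoint, as started by
    # :runners:spark:job-server:runShadow (default port 8099):
    '--job_endpoint=localhost:8099',
    # How each Spark worker runs the Python SDK harness. With DOCKER the
    # SDK container image must be pullable on every worker node;
    # alternatives include PROCESS and EXTERNAL.
    '--environment_type=DOCKER',
    '--environment_config=apachebeam/python2.7_sdk:2.16.0',
]

# These flags would normally be fed to
# apache_beam.options.pipeline_options.PipelineOptions(portable_options).
```

When workers sit on separate nodes, DOCKER mode in particular requires the image to be available on each node, which is a common source of "Socket closed" failures like the one below.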

On Thu, Dec 12, 2019 at 2:58 PM Matthew K.  wrote:

> Running Beam on a Spark cluster, it crashes and I get the following error
> (workers are on separate nodes, it works fine when workers are on the same
> node as runner):
>
> > Task :runners:spark:job-server:runShadow FAILED
> Exception in thread wait_until_finish_read:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
> self.run()
>   File "/usr/lib/python2.7/threading.py", line 754, in run
> self.__target(*self.__args, **self.__kwargs)
>   File


Cassandra IO issues and contributing

2019-12-12 Thread Vincent Marquez
Hello, as I've mentioned in previous emails, I've found the CassandraIO
connector lacking some essential features for efficient batch processing in
real-world scenarios. We've developed a more fully featured connector and
had good results with it.

Could I perhaps write up a JIRA proposal for some minor changes to the
current connector that might improve things? The main pain point is the
absence of a 'readAll' method, as I documented here:

https://gist.github.com/vmarquez/204b8f44b1279fdbae97b40f8681bc25
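Independent of Beam's Java API, the essence of a readAll-style primitive — fanning out over a collection of queries rather than reading from a single static query — can be sketched in plain Python. This is only an illustration of the pattern; the names are hypothetical and the actual CassandraIO connector lives in Java:

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple  # stand-in for a Cassandra row


def read_all(queries: Iterable[str],
             execute: Callable[[str], Iterable[Row]]) -> Iterator[Row]:
    """Fan out over a collection of queries, yielding every result row.

    In Beam terms this is the body of a DoFn: each incoming element is a
    query, and the rows it produces are emitted downstream.  `execute`
    stands in for a Cassandra session's execute call.
    """
    for query in queries:
        for row in execute(query):
            yield row
```

The point of the pattern is that the set of queries is itself pipeline data, so it can be computed by an upstream transform instead of being fixed at construction time.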

If I could write up a ticket, I don't mind submitting a small PR on GitHub
as well to address this missing functionality. Thanks for your time.

*-Vincent*


Committed without review. Sorry.

2019-12-12 Thread Pablo Estrada
I accidentally committed a small change to master:
https://github.com/apache/beam/commit/6018326ffe74aac7d8c44ded296b92f8b5c0b556

I am verifying that this works as intended for now.

What should we do about this? Revert? Leave as is if it works fine?
Best
-P


Beam's job crashes on cluster

2019-12-12 Thread Matthew K.
Running a Beam job on a Spark cluster, it crashes with the following error when the workers are on separate nodes (it works fine when the workers are on the same node as the runner):


> Task :runners:spark:job-server:runShadow FAILED
Exception in thread wait_until_finish_read:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/portability/portable_runner.py", line 411, in read_messages
    for message in self._message_stream:
  File "/usr/local/lib/python2.7/dist-packages/grpc/_channel.py", line 395, in next
    return self._next()
  File "/usr/local/lib/python2.7/dist-packages/grpc/_channel.py", line 561, in _next
    raise self
_Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "Socket closed"
    debug_error_string = "{"created":"@1576190515.361076583","description":"Error received from peer ipv4:127.0.0.1:8099","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Socket closed","grpc_status":14}"
>

Traceback (most recent call last):
  File "/opt/spark/work-dir/beam_script.py", line 49, in <module>
    stats = tfdv.generate_statistics_from_csv(data_location=DATA_LOCATION, pipeline_options=options)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_data_validation/utils/stats_gen_lib.py", line 197, in generate_statistics_from_csv
    statistics_pb2.DatasetFeatureStatisticsList)))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 427, in __exit__
    self.run().wait_until_finish()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/portability/portable_runner.py", line 429, in wait_until_finish
    for state_response in self._state_stream:
  File "/usr/local/lib/python2.7/dist-packages/grpc/_channel.py", line 395, in next
    return self._next()
  File "/usr/local/lib/python2.7/dist-packages/grpc/_channel.py", line 561, in _next
    raise self
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "Socket closed"
    debug_error_string = "{"created":"@1576190515.361053677","description":"Error received from peer ipv4:127.0.0.1:8099","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Socket closed","grpc_status":14}"
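The `Socket closed` / `StatusCode.UNAVAILABLE` above means the Python client's gRPC streams to the job server at 127.0.0.1:8099 were dropped, consistent with the `runShadow FAILED` line showing the job server itself went down. One thing worth double-checking in a multi-node setup — a guess based on the symptom, not a confirmed diagnosis — is that the endpoints and environment in the pipeline options are reachable from every node, since the defaults tend to assume a single machine. A sketch of the relevant flags, with placeholder host names:

```python
# Illustrative flags only -- host names are placeholders and this is a
# common checklist for multi-node portable-runner setups, not a verified
# fix for the failure above.  The list would be passed to
# beam.options.pipeline_options.PipelineOptions(flags).
flags = [
    "--runner=PortableRunner",
    # Must be reachable from wherever this script runs; 127.0.0.1 only
    # works when the job server is on the same machine as the client.
    "--job_endpoint=spark-jobserver.internal:8099",
    # DOCKER (the default) needs Docker on every Spark worker node;
    # LOOPBACK only works when the SDK harness runs next to the client.
    "--environment_type=DOCKER",
]
```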



Unifying Build/contributing instructions

2019-12-12 Thread Elliotte Rusty Harold
I've started work on updating and combining the four (or more?)
different pages where build instructions are found. The initial PR is
here:

https://github.com/apache/beam/pull/10366

To put a stake in the ground, this PR chooses CONTRIBUTING.md as the
ultimate source of truth. A possible alternative is to unify around
https://beam.apache.org/contribute/contribution-guide/

I'm not wedded to one or the other, but I do think we should pick one
and stick with it. If the community prefers to focus on
https://beam.apache.org/contribute/contribution-guide/ we can use that
instead.

I've added some additional prerequisites to the instructions that were
not yet included. I don't have it all yet though. Any further
additions would be much appreciated.

Please leave comments on the PR.

-- 
Elliotte Rusty Harold
elh...@ibiblio.org


Re: RFC: python static typing PR

2019-12-12 Thread Chad Dombrova
Thanks for the kind words, everyone.

Note that the PR that was merged was the first half, so the mypy-lint job
is not yet set up to trigger failures.

Part 2 is up now:  https://github.com/apache/beam/pull/10367

My goal is to bundle up changes into smaller PRs from here on out.  It
might take another 3 PRs to get through the rest.

-chad
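For context, since the merged PR is almost entirely type comments: annotating a Python 2-compatible code base means using mypy's comment syntax rather than inline annotations. A toy illustration (not code from the PR):

```python
from typing import Iterable, List


def tokenize(lines, lowercase=False):
    # type: (Iterable[str], bool) -> List[str]
    """Split lines into words; mypy checks callers against the comment."""
    words = []  # type: List[str]
    for line in lines:
        words.extend(line.lower().split() if lowercase else line.split())
    return words
```

The `# type:` comments carry no runtime cost and are ignored by the interpreter; mypy reads them statically, which is why the PR can touch many files while remaining low-risk.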


On Wed, Dec 11, 2019 at 2:13 PM Ahmet Altay  wrote:

> Thank you, Chad! This is awesome.
>
> On Wed, Dec 11, 2019 at 2:05 PM Robert Bradshaw 
> wrote:
>
>> +1. Thanks!
>>
>> On Wed, Dec 11, 2019 at 1:44 PM Pablo Estrada  wrote:
>> >
>> > Love it. I've merged it, since it was already approved by Robert - and
>> yes, we don't want to hit a merge conflict.
>> >
>> > On Wed, Dec 11, 2019 at 1:36 PM Heejong Lee  wrote:
>> >>
>> >> Wow, this is a big step forward. As a long-time fan of strongly typed
>> functional languages, I'm glad to see this change :)
>> >>
>> >> On Wed, Dec 11, 2019 at 9:44 AM Chad Dombrova 
>> wrote:
>> >>>
>> >>> Hi all,
>> >>> Robert has diligently reviewed the first batch of changes for this
>> PR, and all review notes are addressed and tests are passing:
>> https://github.com/apache/beam/pull/9915
>> >>>
>> >>> Due to the number of files touched, there's a short window of about one
>> or two days before a merge conflict arrives on master, and after resolving
>> that it usually takes another 1-2 days of pasting "Run Python PreCommit"
>> until they pass again, so it would be great to get this merged while the
>> window is open!  Despite the number of files touched, the changes are
>> almost entirely type comments, so the PR is designed to be quite safe.
>> >>>
>> >>> -chad
>> >>>
>> >>>
>> >>> On Tue, Nov 5, 2019 at 2:50 PM Chad Dombrova 
>> wrote:
>> 
>>  Glad to hear we have such a forward-thinking community!
>> 
>> 
>>  On Tue, Nov 5, 2019 at 2:43 PM Robert Bradshaw 
>> wrote:
>> >
>> > Sounds like we have consensus. Let's move forward. I'll follow up
>> with
>> > the discussions on the PRs themselves.
>> >
>> > On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw <
>> rober...@google.com> wrote:
>> > >
>> > > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova 
>> wrote:
>> > > >
>> > > >> Do you believe that a future mypy plugin could replace
>> pipeline type checks in Beam, or are there limits to what it can do?
>> > > >
>> > > > mypy will get us quite far on its own once we completely
>> annotate the beam code.  That said, my PR does not include my efforts to
>> turn PTransforms into Generics, which will be required to properly analyze
>> pipelines, so there's still a lot more work to do.  I've experimented with
>> a mypy plugin to smooth over some of the rough spots in that workflow and I
>> will just say that the mypy API has a very steep learning curve.
>> > > >
>> > > > Another thing to note: mypy is very explicit about function
>> annotations.  It does not do the "implicit" inference that Beam does, such
>> as automatically detecting function return types.  I think it should be
>> possible to do a lot of that as a mypy plugin, and in fact, since it has
>> little to do with Beam it could grow into its own project with outside
>> contributors.
>> > >
>> > > Yeah, I don't think, as is, it can replace what we do, but with
>> > > plugins I think it could possibly come closer. Certainly there is
>> > > information that is only available at runtime (e.g. reading from a
>> > > database or avro/parquet file could provide the schema which can
>> be
>> > > used for downstream checking) which may limit the ability to do
>> > > everything statically (even Beam Java is moving this direction).
>> Mypy
>> > > clearly has an implementation of the "is compatible with" operator
>> > > that I would love to borrow, but unfortunately it's not (easily?)
>> > > exposed.
>> > >
>> > > That being said, we should leverage what we can for pipeline
>> > > authoring, and it'll be a great development too regardless.
>>
>


Re: Executing the runner validation tests for the Twister2 runner

2019-12-12 Thread Pulasthi Supun Wickramasinghe
Hi Kenn

We are still working on aspects like automated job monitoring, so we
currently do not have those capabilities built in. I discussed with the
Twister2 team a way we can forward failure information from the workers to
the Jobmaster, which would be a solution to this problem. It might take a
little time to develop and test. I will update you after looking into that
solution in a little more detail.
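In the meantime, the log-scanning fallback raised in the quoted message below could look something like the sketch here. The log format is an assumption (JVM-style stack traces), not Twister2's actual layout:

```python
import re
from typing import List

# Heuristic: lines that look like the start of a JVM stack trace or a
# Python traceback.  Purely illustrative -- the real worker log format
# would need to be confirmed against Twister2's output.
EXCEPTION_RE = re.compile(
    r"^(?:Exception in thread|Traceback|"
    r"(?:[\w.]+\.)+\w*(?:Exception|Error)[:(])")


def find_exceptions(log_text: str) -> List[str]:
    """Return log lines that look like the start of an exception."""
    return [line for line in log_text.splitlines()
            if EXCEPTION_RE.match(line.strip())]
```

The obvious drawback, as the thread notes, is that this couples test validation to a log format; forwarding failures to the Jobmaster would be the more robust long-term answer.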

Best Regards,
Pulasthi

On Wed, Dec 11, 2019 at 10:51 PM Kenneth Knowles  wrote:

> I dug in to Twister2 a little bit to understand the question better,
> checking how the various resource managers / launchers are plumbed.
>
> How would a user set up automated monitoring for a job? If that is
> scraping the logs, then it seems unfortunate for users, but I think the
> Beam runner would naturally use whatever a user might use.
>
> Kenn
>
> On Wed, Dec 11, 2019 at 10:45 AM Pulasthi Supun Wickramasinghe <
> pulasthi...@gmail.com> wrote:
>
>> Hi Dev's
>>
>> I have been making some progress on the Twister2 runner for Beam that I
>> mentioned before on the mailing list. The runner is able to run the
>> wordcount example and produce correct results. So I am currently trying to
>> run the runner validation tests.
>>
>> From what I understood looking at a couple of examples, tests are
>> validated based on the exceptions that are thrown (or not) during test
>> runtime. However, in Twister2 the job submission client currently does not
>> get failure information such as exceptions back once the job is submitted.
>> These are, however, recorded in the worker log files.
>>
>> So in order to validate the tests for Twister2 I would have to parse the
>> worker logfile and check what exceptions are in the logs. Would that be an
>> acceptable solution for the validation tests?
>>
>> Best Regards,
>> Pulasthi
>>
>>
>>
>>
>> --
>> Pulasthi S. Wickramasinghe
>> PhD Candidate  | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> cell: 224-386-9035
>>
>

-- 
Pulasthi S. Wickramasinghe
PhD Candidate  | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
cell: 224-386-9035


Re: December board report

2019-12-12 Thread Alexey Romanenko
Kenn,

I’d suggest also mentioning the discussion about re-imagining the
“@Experimental” annotation and API stabilization.

> On 12 Dec 2019, at 06:07, Kenneth Knowles  wrote:
> 
> Hi all,
> 
> Late notice on this, but the December Board report is due "now".
> 
> I've started a draft here: 
> https://docs.google.com/document/d/1AJT5j-qRLJPeN5x6nbHD5KqadXLM0zT0Ugmiy_vQ7C8/edit?usp=sharing
>  
> 
> 
> I included a number of big discussions or features that I noticed in the 
> email lists or GitHub history. Please help me by describing them more fully. 
> You can see the style of past board reports at 
> https://whimsy.apache.org/board/minutes/Beam.html 
> 
> 
> I will submit by the end of the week.
> 
> Kenn