Build failed in Jenkins: beam_Release_NightlySnapshot #729

2018-03-30 Thread Apache Jenkins Server
See 


Changes:

[altay] Update streaming wordcount example and align with the batch example.

[github] Fix linter error in typehints.

[wcn] Remove include directives for proto well-known-types.

--
[...truncated 2.82 MB...]
2018-03-30T10:56:58.693 [INFO] Excluding org.glassfish.jersey.core:jersey-server:jar:2.22.2 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding org.glassfish.jersey.media:jersey-media-jaxb:jar:2.22.2 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding org.glassfish.jersey.containers:jersey-container-servlet:jar:2.22.2 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.22.2 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding io.netty:netty-all:jar:4.0.43.Final from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding io.netty:netty:jar:3.9.9.Final from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding io.dropwizard.metrics:metrics-jvm:jar:3.1.2 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding io.dropwizard.metrics:metrics-json:jar:3.1.2 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding io.dropwizard.metrics:metrics-graphite:jar:3.1.2 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding org.apache.ivy:ivy:jar:2.4.0 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding oro:oro:jar:2.0.8 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding net.razorvine:pyrolite:jar:4.13 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding net.sf.py4j:py4j:jar:0.10.4 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding org.apache.spark:spark-tags_2.11:jar:2.2.1 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding org.apache.commons:commons-crypto:jar:1.0.0 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding org.spark-project.spark:unused:jar:1.0.0 from the shaded jar.
2018-03-30T10:56:58.693 [INFO] Excluding org.apache.spark:spark-streaming_2.11:jar:2.2.1 from the shaded jar.
2018-03-30T10:57:01.501 [INFO] Replacing original artifact with shaded artifact.
2018-03-30T10:57:01.610 [INFO] 
2018-03-30T10:57:01.610 [INFO] --- maven-assembly-plugin:3.1.0:single (source-release-assembly) @ beam-sdks-java-javadoc ---
2018-03-30T10:57:01.615 [INFO] Skipping the assembly in this project because it's not the Execution Root
2018-03-30T10:57:01.724 [INFO] 
2018-03-30T10:57:01.725 [INFO] --- maven-source-plugin:3.0.1:jar-no-fork (attach-sources) @ beam-sdks-java-javadoc ---
2018-03-30T10:57:01.833 [INFO] 
2018-03-30T10:57:01.833 [INFO] --- maven-source-plugin:3.0.1:test-jar-no-fork (attach-test-sources) @ beam-sdks-java-javadoc ---
2018-03-30T10:57:01.941 [INFO] 
2018-03-30T10:57:01.941 [INFO] --- maven-javadoc-plugin:3.0.0-M1:jar (attach-javadocs) @ beam-sdks-java-javadoc ---
2018-03-30T10:57:01.944 [INFO] Not executing Javadoc as the project is not a Java classpath-capable package
2018-03-30T10:57:02.059 [INFO] 
2018-03-30T10:57:02.059 [INFO] --- reproducible-build-maven-plugin:0.4:strip-jar (default) @ beam-sdks-java-javadoc ---
2018-03-30T10:57:02.059 [INFO] Stripping 

2018-03-30T10:57:02.235 [INFO] 
2018-03-30T10:57:02.235 [INFO] --- maven-dependency-plugin:3.0.2:analyze-only (default) @ beam-sdks-java-javadoc ---
2018-03-30T10:57:02.237 [INFO] Skipping plugin execution
2018-03-30T10:57:02.352 [INFO] 
2018-03-30T10:57:02.352 [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ beam-sdks-java-javadoc ---
2018-03-30T10:57:02.353 [INFO] Installing 

 to 

2018-03-30T10:57:02.434 [INFO] Installing 

 to 

2018-03-30T10:57:02.628 [INFO] 
2018-03-30T10:57:02.628 [INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @ beam-sdks-java-javadoc ---
2018-03-30T10:57:02.631 [INFO] Downloading from apache.snapshots.https: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-javadoc/2.5.0-SNAPSHOT/maven-metadata.xml
2018-03-30T10:57:02.863 [INFO] Downloaded from apache.snapshots.https: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-javadoc/2.5.0-SNAPSHOT/maven-metadata.xml (790 B at 3.4 kB/s)
2018-03-30T10:57:02.865 [INFO] 

Re: Gradle migration fixit: April 3

2018-03-30 Thread Romain Manni-Bucau
Yep - sorry if it was unclear. I know Linux distros often do it (I never
understood why, though).


Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book


2018-03-30 7:54 GMT+02:00 Reuven Lax :

>
>
> On Thu, Mar 29, 2018 at 10:28 PM Romain Manni-Bucau 
> wrote:
>
>> It was more about dropping the poms (the same case as yours for Dataflow).
>>
>
> Ah - you're worried that some external users are building directly from
> the poms rather than using the published artifact.
>
> I think this is a valid concern, and I agree we should announce on users@
> before (probably some time before) deleting the poms.
>
>
>> On that, there is a missing but highly important task: Gradle-to-Maven
>> descriptors. All the ones I saw were corrupted poms, so we must take care of
>> that as part of the release work (I can work on it on the 3rd if you want).
>>
>> On 29 Mar 2018 23:36, "Reuven Lax" wrote:
>>
>>> I don't mind notifying users@, but this does seem more interesting for
>>> dev@. We will continue to publish Maven artifacts from our Gradle
>>> build, so users are still free to use either Maven or Gradle.
>>>
>>> That being said, if this is interesting to users@ we can notify them as
>>> well.
>>>
>>> Reuven
>>>
>>> On Thu, Mar 29, 2018 at 1:45 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>


 On 29 Mar 2018 21:20, "Reuven Lax" wrote:



 On Thu, Mar 29, 2018 at 12:17 PM Romain Manni-Bucau <
 rmannibu...@gmail.com> wrote:

>
>
> On 29 Mar 2018 20:35, "Reuven Lax" wrote:
>
> 1. As Luke already mentioned, we should first have a subgoal of the
> Gradle jenkins jobs being equivalent to the Maven jobs. Hopefully toward
> the end of the day, we'll make this change.
>
> 2. Let's see how much progress we make on the third. There is a side
> problem we have here at Google - we have an internal product called
> Dataflow built on Beam, and the Dataflow build still depends on those pom
> files. I would request leaving the pom files around just a little bit
> longer even if Beam no longer needs them, just so that we don't break
> Dataflow (and I think we would do this for any community members with a
> similar issue). We will prioritize moving Dataflow ASAP; it's just that the
> people who will do so will also be in the April 3 fixit so it can't happen
> until after. I think the delay should only be one or two weeks to delete
> the poms (assuming that Beam is ready at the end of the day).
>
>
> Can you try to put a date on that? Then we will communicate it
> publicly in case anyone else does the same (I don't think so, but I didn't
> expect you to either ;)).
>

 What do you mean by communicate publicly? The dev list is already cced
 here :)


 It's missing a (not too distant) date, and probably users@ to be safe ;)


>
>
>
> On Thu, Mar 29, 2018 at 5:20 AM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>> Hi Reuven, a few questions:
>>
>> 1. Any input on how we can work on the Jenkins part? Do we test it
>> live with "fake" PRs?
>> 2. What's the rationale for not starting by deleting the poms? It sounds like
>> it will be one day working on Gradle, and on the 4th we'll be back on Maven.
>>
>>
>>
>> 2018-03-29 4:46 GMT+02:00 Reuven Lax :
>>
>>> Hi all,
>>>
>>> Last week we discussed having a "fixit" day for Gradle, and I
>>> volunteered to organize it. A number of people volunteered to help, from
>>> multiple organizations. I'd like to say that it's great to see such a
>>> diverse set of people volunteering to help here - this is a great way to
>>> build community! Everyone who explicitly volunteered is directly cced on
>>> this email, though we'd love for more of the community to help.
>>>
>>> The agreed upon date is April 3. The top-level JIRA tracking this
>>> work is
>>>
>>> https://issues.apache.org/jira/browse/BEAM-3249
>>> , and we currently
>>> have 26 subtasks linked to it. I've created a Kanban board to track 
>>> these
>>> 

Re: Gradle migration fixit: April 3

2018-03-30 Thread Reuven Lax
Here is the Kanban board tracking all the current tasks. It looks like 7 of
them have already been closed over the past two days, so we're down to 19!

https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=242


Re: [PROPOSAL] Python 3 support

2018-03-30 Thread Robbe Sneyders
Thanks Ahmet and Robert,

I think we can work on different subpackages in parallel, but it's
important to apply the same strategy everywhere. I'm currently working on
applying step 1 (was mostly done already) and 2 of the proposal to the
coders subpackage to create a first pull request. We can then discuss the
applied strategy in detail before merging and applying it to the other
subpackages.

This strategy also includes the choice of automated tools. I'm focusing on
writing python 3 code with python 2 compatibility, which means depending on
the future package instead of the six package (which is already used in
some places in the current code base). I have already noticed that this
indeed requires a lot of manual work after running the automated script.
The future package supports python 3.3+ compatibility, so I don't think
there is a higher cost supporting 3.4 compared to 3.5+.

I have already added a tox environment to run pylint2 with the --py3k
argument per updated subpackage, which should help avoid regression between
step 2 and step 3 of the proposal. This update will be pushed with the
first pull request.

Kind regards,
Robbe


On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw  wrote:

> Thank you, Robbe, for your offer to help with contribution here. I read
> over your doc and the one thing I'd like to add is that this work is very
> parallelizable, but if we have enough people looking at it we'll want some
> way to coordinate so as to not overlap work (or just waste time discovering
> what's been done). Tracking individual JIRAs and PRs gets unwieldy, perhaps
> a spreadsheet with modules/packages on one axis and the various
> automated/manual conversions along the other would be helpful?
>
> A note on automated tools, they're sometimes overly conservative, so we
> should be sure to review the changes manually. (A typical example of this
> is unnecessarily importing six.moves.xrange when there was no big reason to
> use xrange over range in Python 2, or conversely using list(range(...)) in
> Python 3.)
>
> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
> identify it and decide that before widespread announcement.
>
> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay  wrote:
>
>>
>>
>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau 
>> wrote:
>>
>>>
>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders 
>>> wrote:
>>>
 Hi Anand,

 Thanks for the feedback.

 It should be no problem to run everything on DataflowRunner as well.
 Are there any performance tests in place to check for performance
 regressions?

>>>
>> Yes there is a suite (
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>> It may not be very comprehensive, and it seems to have been failing for a
>> while. I would not block Python 3 work on performance for now. That is the
>> unfortunate state of things.
>>
>> If anybody in the community is interested, this would be a great
>> opportunity to help with benchmarks in general.
>>
>>
>>>
 Some questions were raised in the proposal document which I want to add
 to this conversation:

 The first comment was about the targeted python 3 versions. We proposed
 to target 3.6 since it is the latest version available and added 3.5
 because 3.6 adoption seems rather low (hard to find any relevant sources on
 this though).
 If the beam community prefers 3.4, I would propose to target 3.4 only
 during porting and add 3.5 and 3.6 later so we don't slow down the porting
 progress. 3.4 has the advantage of already being installed on the workers
 and allows pySpark pipelines to be moved over to beam more easily.
 It would be great to get some opinions on this.

>>>
>> My preference is to support 3.4+. I searched a bit on the web to
>> understand the usage statistics for python 3, it seems like python 3.4 has
>> ~20% usage and python 3.4+ has 99% (
>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>> Based on that, I think it makes sense to support it.
>>
>>
>>
>>>
 Another comment was made on how to avoid regression during the porting
 progress.
 After applying step 1 and step 2, no python 3 compatibility lint
 warnings should remain, so it would be great if we could enforce this check
 for every pull request on an already updated subpackage.
 After applying step 3, all tests should run on python 3, so again it
 would be great if we can enforce these per updated subpackage.
 Any insights on how to best accomplish this?

>>> So you can look at some of the recent changes to tox.ini in the git log
>>> to see what we’ve done so far around this. I suspect you can repeat that
>>> same pattern.
>>>
>>
>> +1 updating tox.ini and 

Re: [PROPOSAL] Python 3 support

2018-03-30 Thread Robert Bradshaw
On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders 
wrote:

> Thanks Ahmet and Robert,
>
> I think we can work on different subpackages in parallel, but it's
> important to apply the same strategy everywhere. I'm currently working on
> applying step 1 (was mostly done already) and 2 of the proposal to the
> coders subpackage to create a first pull request. We can then discuss the
> applied strategy in detail before merging and applying it to the other
> subpackages.
>

Sounds good. Again, could you document (in a more permanent/easy to look up
state than email) when packages are started/done?


> This strategy also includes the choice of automated tools. I'm focusing on
> writing python 3 code with python 2 compatibility, which means depending on
> the future package instead of the six package (which is already used in
> some places in the current code base). I have already noticed that this
> indeed requires a lot of manual work after running the automated script.
> The future package supports python 3.3+ compatibility, so I don't think
> there is a higher cost supporting 3.4 compared to 3.5+.
>

Sure. It may incur a higher maintenance burden long-term though.
(Basically, if we go out the door with 3.4 it's a promise to support it for
some time to come.)


> I have already added a tox environment to run pylint2 with the --py3k
> argument per updated subpackage, which should help avoid regression between
> step 2 and step 3 of the proposal. This update will be pushed with the
> first pull request.
>
> Kind regards,
> Robbe

Python postcommit and precommit

2018-03-30 Thread Udi Meiri
Hi,

I noticed that Python precommit runs using this command:
  mvn clean install -pl sdks/python -am -amd
while postcommit invocation is simply a bash script:
  bash sdks/python/run_postcommit.sh

Both run unit tests via Tox, however since the runtime environment setup is
configured in different files (pom.xml vs shell script), they don't always
agree in their results (precommit is currently succeeding while postcommit
is failing).

So my naive question is: why does Python precommit run via Maven/Gradle?
Could we not just use a script like run_postcommit.sh?

(Side note: there's a lot of code/config duplication, such as: pypi package
versions, *.c, *.so, etc. cleanup)




Re: [ANNOUNCEMENT] New Foundation members!

2018-03-30 Thread Lukasz Cwik
Congrats all.

On Fri, Mar 30, 2018 at 4:29 PM Pablo Estrada  wrote:

> Congratulations y'all! Very cool.
> Best
> -P.
>
> On Fri, Mar 30, 2018 at 4:09 PM Davor Bonaci  wrote:
>
>> Now that this is public... please join me in welcoming three newly
>> elected members of the Apache Software Foundation with ties to this
>> community, who were elected during the most recent Members' Meeting.
>>
>> * Ismaël Mejía (Beam PMC)
>>
>> * Josh Wills (Crunch Chair; Beam, DataFu PMC)
>>
>> * Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer; Beam
>> contributor)
>>
>> These individuals demonstrated merit in the Foundation's growth, evolution,
>> and progress. They were recognized, nominated, and elected by existing
>> membership for their significant impact to the Foundation as a whole, such
>> as the roots of project-related and cross-project activities.
>>
>> As members, they now become legal owners and shareholders of the
>> Foundation. They can vote for the Board, incubate new projects, nominate
>> new members, participate in any PMC-private discussions, and contribute to
>> any project.
>>
>> (For the Beam community, this election nearly doubles the number of
>> Foundation members. The new members are joining Jean-Baptiste Onofré,
>> Stephan Ewen, Romain Manni-Bucau and myself in this role.)
>>
>> I'm happy to be able to call all three of you my fellow members.
>> Congratulations!
>>
>>
>> Davor
>>
> --
> Got feedback? go/pabloem-feedback
> 
>


Re: [ANNOUNCEMENT] New Foundation members!

2018-03-30 Thread Ahmet Altay
Congratulations to all of you!



Re: [ANNOUNCEMENT] New Foundation members!

2018-03-30 Thread Pablo Estrada
Congratulations y'all! Very cool.
Best
-P.



Re: Golang Beam SDK GroupByKey not working when running locally

2018-03-30 Thread 8 Gianfortoni
Fix cc to correct Holden.

On Fri, Mar 30, 2018 at 5:05 PM, 8 Gianfortoni <8...@tokentransit.com> wrote:

> Hi dev team,
>
> I'm having a lot of trouble running any pipeline that calls GroupByKey.
> Maybe I'm doing something wrong, but for some reason I cannot get
> GroupByKey to run without crashing the program.
>
> I have edited wordcount.go and minimal_wordcount.go to work similarly to
> my own program, and it crashes for those as well.
>
> Here is the snippet of code I added to minimal_wordcount (full source
> attached):
>
> // Concept #3 (modified): instead of invoking the stats.Count transform,
> // pair each word with 1, group by key, and sum the values for each key.
> // The result is a new PCollection of key/value pairs, where each key
> // represents a unique word in the text and the value is its occurrence count.
> singles := beam.ParDo(s, func(word string) (string, int) {
> 	return word, 1
> }, words)
>
> grouped := beam.GroupByKey(s, singles)
>
> counted := beam.ParDo(s, func(word string, values func(*int) bool) (string, int) {
> 	sum := 0
> 	for {
> 		var i int
> 		// The iterator fills i with the next value for this key and
> 		// returns false once all values have been consumed.
> 		if values(&i) {
> 			sum = sum + i
> 		} else {
> 			break
> 		}
> 	}
> 	return word, sum
> }, grouped)
>
> // Use a ParDo to format our PCollection of word counts into a printable
> // string, suitable for writing to an output file. When each element
> // produces exactly one element, the DoFn can simply return it.
> formatted := beam.ParDo(s, func(w string, c int) string {
> 	return fmt.Sprintf("%s: %v", w, c)
> }, counted)
>
>
>
> I also attached the full source code and output that happens when I run
> both wordcount and minimal_wordcount.
>
> Am I just doing something wrong here? In any case, it seems inappropriate
> to panic during runtime without any debugging information (save a stack
> trace, but only if you call beamx.Run() as opposed to direct.Execute(),
> which just dies without any info).
>
> Thank you so much,
> 8
>
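The summing DoFn in the snippet above receives the grouped values as a `func(*int) bool` iterator. A minimal, Beam-free sketch of that calling convention, assuming the iterator copies the next value into the pointer it is given and returns false once the values are exhausted; `makeIter` and `sumValues` are hypothetical helpers written only for illustration, not part of the Beam Go SDK:

```go
package main

import "fmt"

// makeIter is a hypothetical stand-in for the iterator a runner would pass
// to a DoFn after GroupByKey: each call copies the next value into *out and
// returns true, or returns false once all values have been consumed.
func makeIter(vals []int) func(*int) bool {
	idx := 0
	return func(out *int) bool {
		if idx >= len(vals) {
			return false
		}
		*out = vals[idx]
		idx++
		return true
	}
}

// sumValues mirrors the body of the summing DoFn. The iterator must be
// called with a pointer (&i) that it can fill, not with no arguments.
func sumValues(word string, values func(*int) bool) (string, int) {
	sum := 0
	var i int
	for values(&i) {
		sum += i
	}
	return word, sum
}

func main() {
	word, n := sumValues("beam", makeIter([]int{1, 1, 1}))
	fmt.Printf("%s: %v\n", word, n) // prints "beam: 3"
}
```

Passing a pointer lets the iterator reuse one variable for every value instead of allocating per element, which is why the loop declares `i` once and hands `&i` to each call.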


Re: Gradle migration fixit: April 3

2018-03-30 Thread Lukasz Cwik
I have started a doc[1] containing a Gradle primer to help people be more
productive during the fixit day. Feel free to add/update comments and
content.

1:
https://docs.google.com/document/d/1EiTwEMD8FNhU4Ok6jthASpmK3-1hiAYzVTrdl8qBLrs/edit?usp=sharing

On Fri, Mar 30, 2018 at 8:33 AM Reuven Lax  wrote:

> Here is the Kanban board tracking all the current tasks. It looks like 7
> of them have already been closed over the past two days, so we're down to
> 19!
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=242
>
> On Thu, Mar 29, 2018 at 11:21 PM Romain Manni-Bucau 
> wrote:
>
>> Yep - sorry if it was unclear. I know linux distro often do it (never
>> understood why though).
>>
>>
>> Romain Manni-Bucau
>> @rmannibucau  |  Blog
>>  | Old Blog
>>  | Github
>>  | LinkedIn
>>  | Book
>> 
>>
>> 2018-03-30 7:54 GMT+02:00 Reuven Lax :
>>
>>>
>>>
>>> On Thu, Mar 29, 2018 at 10:28 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 It was more about the drop of poms (same case as you for dataflow).

>>>
>>> Ah - you're worried that some external users are building directly from
>>> the poms rather than using the published artifact.
>>>
>>> I think this is a valid concern, and I agree we should announce on users@
>>> before (probably some time before) deleting the poms  .
>>>
>>>
 On that there is a missing but highly important task: gradle to mvn
 descriptors. All the one I saw were corrupted poms so we must take care of
 that as part of the release work (I can work on it on the 3rd if you want).

 Le 29 mars 2018 23:36, "Reuven Lax"  a écrit :

> I don't mind notifying users@, but this does seem more interesting
> for dev@. We will continue to publish Maven artifacts from our Gradle
> build, so users are still free to use either Maven or Gradle.
>
> That being said, if this is interesting to users@ we can notify them
> as well.
>
> Reuven
>
> On Thu, Mar 29, 2018 at 1:45 PM Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>>
>>
>> Le 29 mars 2018 21:20, "Reuven Lax"  a écrit :
>>
>>
>>
>> On Thu, Mar 29, 2018 at 12:17 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>>
>>>
>>> Le 29 mars 2018 20:35, "Reuven Lax"  a écrit :
>>>
>>> 1. As Luke already mentioned, we should first have a subgoal of the
>>> Gradle jenkins jobs being equivalent to the Maven jobs. Hopefully toward
>>> the end of the day, we'll make this change.
>>>
>>> 2. Let's see how much progress we make on the third. There is a side
>>> problem we have here at Google - we have an internal product called
>>> Dataflow built on Beam, and the Dataflow build still depends on those
>>> pom files. I would request leaving the pom files around just a little bit
>>> longer even if Beam no longer needs them, just so that we don't break
>>> Dataflow (and I think we would do this for any community members with a
>>> similar issue). We will prioritize moving Dataflow ASAP; it's just that
>>> the people who will do so will also be in the April 3 fixit, so it can't
>>> happen until after. I think the delay should only be one or two weeks to
>>> delete the poms (assuming that Beam is ready at the end of the day).
>>>
>>>
>>> Can you try to put a date on that? Then we will communicate it publicly
>>> in case anyone else is affected (I don't think anyone is, but I didn't
>>> expect you to be either ;)).
>>>
>>
>> What do you mean by communicate publicly? The dev list is already
>> cced here :)
>>
>>
>> It's missing a (not too distant) date, and probably users@ as well, to be safe ;)
>>
>>
>>>
>>>
>>>
>>> On Thu, Mar 29, 2018 at 5:20 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Hi Reuven, a few questions:

 1. Any input on how we can work on the Jenkins part? Do we test it
 live with "fake" PRs?
 2. What's the rationale for not starting by deleting the poms? It sounds
 like it will be one day working on Gradle, and on the 4th we'll be back on
 Maven.



Re: Golang Beam SDK GroupByKey not working when running locally

2018-03-30 Thread 8 Gianfortoni
Oh, I forgot to mention that I pulled from master with this commit as
latest:
https://github.com/apache/beam/commit/95a524e52606de1467b5d8b2cc99263b8a111a8d



On Fri, Mar 30, 2018, 5:09 PM 8 Gianfortoni <8...@tokentransit.com> wrote:

> Fix cc to correct Holden.
>
> On Fri, Mar 30, 2018 at 5:05 PM, 8 Gianfortoni <8...@tokentransit.com> wrote:
>
>> Hi dev team,
>>
>> I'm having a lot of trouble running any pipeline that calls GroupByKey.
>> Maybe I'm doing something wrong, but for some reason I cannot get
>> GroupByKey not to crash the program.
>>
>> I have edited wordcount.go and minimal_wordcount.go to work similarly to
>> my own program, and it crashes for those as well.
>>
>> Here is the snippet of code I added to minimal_wordcount (full source
>> attached):
>>
>> // Concept #3: Invoke the stats.Count transform on our PCollection of
>> // individual words. The Count transform returns a new PCollection of
>> // key/value pairs, where each key represents a unique word in the text.
>> // The associated value is the occurrence count for that word.
>> singles := beam.ParDo(s, func(word string) (string, int) {
>> 	return word, 1
>> }, words)
>>
>> grouped := beam.GroupByKey(s, singles)
>>
>> counted := beam.ParDo(s, func(word string, values func(*int) bool) (string, int) {
>> 	sum := 0
>> 	for {
>> 		var i int
>> 		if values(&i) {
>> 			sum = sum + i
>> 		} else {
>> 			break
>> 		}
>> 	}
>> 	return word, sum
>> }, grouped)
>>
>> // Use a ParDo to format our PCollection of word counts into a printable
>> // string, suitable for writing to an output file. When each element
>> // produces exactly one element, the DoFn can simply return it.
>> formatted := beam.ParDo(s, func(w string, c int) string {
>> 	return fmt.Sprintf("%s: %v", w, c)
>> }, counted)
>>
>>
>>
>> I also attached the full source code and output that happens when I run
>> both wordcount and minimal_wordcount.
>>
>> Am I just doing something wrong here? In any case, it seems inappropriate
>> to panic during runtime without any debugging information (save a stack
>> trace, but only if you call beamx.Run() as opposed to direct.Execute(),
>> which just dies without any info).
>>
>> Thank you so much,
>> 8
>>
>
>


Golang Beam SDK GroupByKey not working when running locally

2018-03-30 Thread 8 Gianfortoni
Hi dev team,

I'm having a lot of trouble running any pipeline that calls GroupByKey.
Maybe I'm doing something wrong, but for some reason I cannot get
GroupByKey not to crash the program.

I have edited wordcount.go and minimal_wordcount.go to work similarly to my
own program, and it crashes for those as well.

Here is the snippet of code I added to minimal_wordcount (full source
attached):

// Concept #3: Invoke the stats.Count transform on our PCollection of
// individual words. The Count transform returns a new PCollection of
// key/value pairs, where each key represents a unique word in the text.
// The associated value is the occurrence count for that word.
singles := beam.ParDo(s, func(word string) (string, int) {
	return word, 1
}, words)

grouped := beam.GroupByKey(s, singles)

counted := beam.ParDo(s, func(word string, values func(*int) bool) (string, int) {
	sum := 0
	for {
		var i int
		if values(&i) {
			sum = sum + i
		} else {
			break
		}
	}
	return word, sum
}, grouped)

// Use a ParDo to format our PCollection of word counts into a printable
// string, suitable for writing to an output file. When each element
// produces exactly one element, the DoFn can simply return it.
formatted := beam.ParDo(s, func(w string, c int) string {
	return fmt.Sprintf("%s: %v", w, c)
}, counted)



I also attached the full source code and output that happens when I run
both wordcount and minimal_wordcount.

Am I just doing something wrong here? In any case, it seems inappropriate
to panic during runtime without any debugging information (save a stack
trace, but only if you call beamx.Run() as opposed to direct.Execute(),
which just dies without any info).

Thank you so much,
8
[{6: KV/GW/KV}]
[{10: KV/GW/KV}]
2018/03/30 16:32:15 Pipeline:
2018/03/30 16:32:15 Nodes: {1: []uint8/GW/bytes}
{2: string/GW/bytes}
{3: string/GW/bytes}
{4: string/GW/bytes}
{5: string/GW/bytes}
{6: KV/GW/KV}
{7: CoGBK/GW/CoGBK}
{8: KV/GW/KV}
{9: string/GW/bytes}
{10: KV/GW/KV}
{11: CoGBK/GW/CoGBK}
Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/GW/bytes}]
2: ParDo [In(Main): []uint8 <- {1: []uint8/GW/bytes}] -> [Out: T -> {2: string/GW/bytes}]
3: ParDo [In(Main): string <- {2: string/GW/bytes}] -> [Out: string -> {3: string/GW/bytes}]
4: ParDo [In(Main): string <- {3: string/GW/bytes}] -> [Out: string -> {4: string/GW/bytes}]
5: ParDo [In(Main): string <- {4: string/GW/bytes}] -> [Out: string -> {5: string/GW/bytes}]
6: ParDo [In(Main): string <- {5: string/GW/bytes}] -> [Out: KV -> {6: KV/GW/KV}]
7: CoGBK [In(Main): KV <- {6: KV/GW/KV}] -> [Out: CoGBK -> {7: CoGBK/GW/CoGBK}]
8: ParDo [In(Main): CoGBK <- {7: CoGBK/GW/CoGBK}] -> [Out: KV -> {8: KV/GW/KV}]
9: ParDo [In(Main): KV <- {8: KV/GW/KV}] -> [Out: string -> {9: string/GW/bytes}]
10: ParDo [In(Main): T <- {9: string/GW/bytes}] -> [Out: KV -> {10: KV/GW/KV}]
11: CoGBK [In(Main): KV <- {10: KV/GW/KV}] -> [Out: CoGBK -> {11: CoGBK/GW/CoGBK}]
12: ParDo [In(Main): CoGBK <- {11: CoGBK/GW/CoGBK}] -> []
2018/03/30 16:32:16 Reading from gs://apache-beam-samples/shakespeare/1kinghenryiv.txt
2018/03/30 16:32:16 Reading from gs://apache-beam-samples/shakespeare/1kinghenryvi.txt
2018/03/30 16:32:17 Reading from gs://apache-beam-samples/shakespeare/2kinghenryiv.txt
2018/03/30 16:32:17 Reading from gs://apache-beam-samples/shakespeare/2kinghenryvi.txt
2018/03/30 16:32:18 Reading from gs://apache-beam-samples/shakespeare/3kinghenryvi.txt
2018/03/30 16:32:18 Reading from gs://apache-beam-samples/shakespeare/allswellthatendswell.txt
2018/03/30 16:32:19 Reading from gs://apache-beam-samples/shakespeare/antonyandcleopatra.txt
2018/03/30 16:32:19 Reading from gs://apache-beam-samples/shakespeare/asyoulikeit.txt
2018/03/30 16:32:19 Reading from gs://apache-beam-samples/shakespeare/comedyoferrors.txt
2018/03/30 16:32:20 Reading from gs://apache-beam-samples/shakespeare/coriolanus.txt
2018/03/30 16:32:20 Reading from 

Re: Adding a StepMetadataRegistry for Python SDK

2018-03-30 Thread Lukasz Cwik
+1 on minimizing the creation of new stuff that will be deleted, but if it
gets us to that goal faster, it can still be worthwhile.

On Thu, Mar 29, 2018 at 5:51 PM Robert Bradshaw  wrote:

> If I understand correctly, this is something runner-specific that would
> live solely on the runner side (i.e. over the Fn API we'd still have a
> single name for operations, rather than pushing this complexity into that
> protocol as well, which I'd really like to avoid), right? If that's the
> case, then it's a bit unclear what we'd be doing on the Python side, as all
> the non-SDK worker code is going to be thrown away in the new world and I'd
> like to avoid investing too much more there.
>
> On Wed, Mar 28, 2018 at 5:13 PM Pablo Estrada  wrote:
>
>> Hello all,
>> I've filed https://issues.apache.org/jira/browse/BEAM-3955, to consider
>> the possibility of adding some sort of facility to translate different
>> names for the runners.
>> This is currently a problem in Dataflow, where steps can have different
>> names in the backend and in the SDK.
>> This is observable in Beam code, where different parts of the
>> SDK/worker/runners use different names in their metrics:
>>
>> - Logging uses Beam transform names (e.g. Foo/Bar)
>> - Metrics uses operation_name (e.g. s2)
>> - Statesampler uses operation_name.
>> - The Dataflow worker sets step_name to operation_name after creating the
>> operation.
>>
>> I'd like to propose the following design outline:
>>
>>- Create an *execution context* that will allow runners to provide
>>their specific functionality.
>>- Execution context will be able to provide multiple pieces of
>>runner-specific functionality (e.g. side input fetchers).
>>- In this case, the execution contexts can have a StepNameRegistry,
>>or StepRegistry, or StepMetadataRegistry of some kind, where step names and
>>other metadata can be enrolled.
>>- Runners can pass their execution contexts to operations, logging,
>>and other modules.
>>- Beam core can then switch to use Beam step names, and each runner's
>>specific monitoring / metrics / etc classes can have their own logic for
>>accessing these.
>>- This would also allow us to remove the LoggingContext tracking, and
>>rely only on statesampler for context tracking.
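To make the proposal a little more concrete, here is a rough sketch. It is written in Go, the language of the other code in this digest, although the proposal itself targets the Python SDK. Every type and method name below is hypothetical and only loosely modeled on the StepMetadataRegistry idea: a runner-provided execution context carries a registry that maps a runner's operation name (e.g. `s2`) back to the Beam transform name (e.g. `Foo/Bar`), so logging and metrics code could look up whichever name it needs.

```go
package main

import "fmt"

// StepMetadata holds the names a single step is known by.
// Hypothetical illustration only, not an existing Beam API.
type StepMetadata struct {
	TransformName string // Beam transform name, e.g. "Foo/Bar"
	OperationName string // runner-internal name, e.g. "s2"
}

// StepMetadataRegistry indexes step metadata by operation name.
type StepMetadataRegistry struct {
	byOperation map[string]StepMetadata
}

// NewStepMetadataRegistry returns an empty registry.
func NewStepMetadataRegistry() *StepMetadataRegistry {
	return &StepMetadataRegistry{byOperation: map[string]StepMetadata{}}
}

// Register enrolls a step under its operation name.
func (r *StepMetadataRegistry) Register(m StepMetadata) {
	r.byOperation[m.OperationName] = m
}

// TransformName resolves an operation name to its Beam transform name and
// reports whether the operation was registered.
func (r *StepMetadataRegistry) TransformName(op string) (string, bool) {
	m, ok := r.byOperation[op]
	return m.TransformName, ok
}

// ExecutionContext is the hypothetical runner-provided context that
// operations, logging and metrics code would receive.
type ExecutionContext struct {
	Steps *StepMetadataRegistry
}

func main() {
	ctx := ExecutionContext{Steps: NewStepMetadataRegistry()}
	ctx.Steps.Register(StepMetadata{TransformName: "Foo/Bar", OperationName: "s2"})
	if name, ok := ctx.Steps.TransformName("s2"); ok {
		fmt.Println("s2 ->", name) // prints "s2 -> Foo/Bar"
	}
}
```

Keying by operation name is just one possible design choice; a real registry might also need the reverse lookup or additional per-step metadata.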
>>
>> Eventually, all of this should be fully contained in the portability API
>> and runners won't have to deal with these issues, but for now it seems like
>> a good compromise.
>>
>> If this sounds good, I'll start working to implement that.
>> Note that this is only a rough description, and I'm open to reconsidering
>> any and all aspects.
>>
>> Best
>> -P.
>> --
>> Got feedback? go/pabloem-feedback
>> 
>>
>


Re: [ANNOUCEMENT] New Foundation members!

2018-03-30 Thread Aviem Zur
Congrats!

On Sat, Mar 31, 2018 at 2:30 AM Ahmet Altay  wrote:

> Congratulations to all of you!
>
>
> On Fri, Mar 30, 2018, 4:29 PM Pablo Estrada  wrote:
>
>> Congratulations y'all! Very cool.
>> Best
>> -P.
>>
>> On Fri, Mar 30, 2018 at 4:09 PM Davor Bonaci  wrote:
>>
>>> Now that this is public... please join me in welcoming three newly
>>> elected members of the Apache Software Foundation with ties to this
>>> community, who were elected during the most recent Members' Meeting.
>>>
>>> * Ismaël Mejía (Beam PMC)
>>>
>>> * Josh Wills (Crunch Chair; Beam, DataFu PMC)
>>>
>>> * Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer; Beam
>>> contributor)
>>>
>>> These individuals demonstrated merit in Foundation's growth, evolution,
>>> and progress. They were recognized, nominated, and elected by existing
>>> membership for their significant impact to the Foundation as a whole, such
>>> as the roots of project-related and cross-project activities.
>>>
>>> As members, they now become legal owners and shareholders of the
>>> Foundation. They can vote for the Board, incubate new projects, nominate
>>> new members, participate in any PMC-private discussions, and contribute to
>>> any project.
>>>
>>> (For the Beam community, this election nearly doubles the number of
>>> Foundation members. The new members are joining Jean-Baptiste Onofré,
>>> Stephan Ewen, Romain Manni-Bucau and myself in this role.)
>>>
>>> I'm happy to be able to call all three of you my fellow members.
>>> Congratulations!
>>>
>>>
>>> Davor
>>>
>>
>


Re: [ANNOUCEMENT] New Foundation members!

2018-03-30 Thread Jean-Baptiste Onofré
Congrats !

Regards
JB

On 31 Mar 2018, at 01:09, Davor Bonaci  wrote:
>Now that this is public... please join me in welcoming three newly
>elected members of the Apache Software Foundation with ties to this
>community, who were elected during the most recent Members' Meeting.
>
>* Ismaël Mejía (Beam PMC)
>
>* Josh Wills (Crunch Chair; Beam, DataFu PMC)
>
>* Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer; Beam
>contributor)
>
>These individuals demonstrated merit in Foundation's growth, evolution,
>and progress. They were recognized, nominated, and elected by existing
>membership for their significant impact to the Foundation as a whole,
>such as the roots of project-related and cross-project activities.
>
>As members, they now become legal owners and shareholders of the
>Foundation. They can vote for the Board, incubate new projects,
>nominate new members, participate in any PMC-private discussions, and
>contribute to any project.
>
>(For the Beam community, this election nearly doubles the number of
>Foundation members. The new members are joining Jean-Baptiste Onofré,
>Stephan Ewen, Romain Manni-Bucau and myself in this role.)
>
>I'm happy to be able to call all three of you my fellow members.
>Congratulations!
>
>Davor


Re: Golang Beam SDK GroupByKey not working when running locally

2018-03-30 Thread Henning Rohde
Hi 8,

 This is a bug in the Go SDK regarding direct output after GBK. As a
workaround, if you change this signature

func(word string, values func(*int) bool) (string, int)

to

func(word string, values func(*int) bool, emit func(string, int))

and emit the result instead of returning it, it works. Opened
https://issues.apache.org/jira/browse/BEAM-3978.
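For illustration, here is a minimal, stdlib-only sketch of the workaround. It mimics the shapes the Go SDK hands a DoFn after GroupByKey (an iterator `func(*int) bool` and an emitter `func(string, int)`) without depending on Beam itself. `makeIter` and `sumPerKey` are hypothetical names; `makeIter` only stands in for the runner, which supplies the real iterator. In an actual pipeline the function would presumably be wired in the same way as the original snippet, e.g. `counted := beam.ParDo(s, sumPerKey, grouped)`.

```go
package main

import "fmt"

// makeIter is a hypothetical stand-in for the iterator a runner passes to a
// DoFn after GroupByKey: each call stores the next value in *v and reports
// whether a value was available.
func makeIter(vals []int) func(*int) bool {
	idx := 0
	return func(v *int) bool {
		if idx >= len(vals) {
			return false
		}
		*v = vals[idx]
		idx++
		return true
	}
}

// sumPerKey uses the workaround signature: rather than returning
// (string, int), it reports its result through the emit function.
func sumPerKey(word string, values func(*int) bool, emit func(string, int)) {
	sum := 0
	var i int
	for values(&i) {
		sum += i
	}
	emit(word, sum)
}

func main() {
	results := map[string]int{}
	emit := func(w string, c int) { results[w] = c }
	sumPerKey("king", makeIter([]int{1, 1, 1}), emit)
	sumPerKey("henry", makeIter([]int{1, 1}), emit)
	fmt.Println(results["king"], results["henry"]) // prints "3 2"
}
```

Note that the iterator must be called with a pointer (`values(&i)`) so that it can fill in each grouped value.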

Thanks,
 Henning

PS: Btw, the minimal_wordcount doesn't log the direct.Execute error (among
other things) and is there mainly to mimic the progression in Java. It's
not a good model for real pipelines.



On Fri, Mar 30, 2018 at 5:21 PM 8 Gianfortoni <8...@tokentransit.com> wrote:

> Oh, I forgot to mention that I pulled from master with this commit as
> latest:
> https://github.com/apache/beam/commit/95a524e52606de1467b5d8b2cc99263b8a111a8d
>
>
>
> On Fri, Mar 30, 2018, 5:09 PM 8 Gianfortoni <8...@tokentransit.com> wrote:
>
>> Fix cc to correct Holden.
>>
>> On Fri, Mar 30, 2018 at 5:05 PM, 8 Gianfortoni <8...@tokentransit.com>
>> wrote:
>>
>>> Hi dev team,
>>>
>>> I'm having a lot of trouble running any pipeline that calls GroupByKey.
>>> Maybe I'm doing something wrong, but for some reason I cannot get
>>> GroupByKey not to crash the program.
>>>
>>> I have edited wordcount.go and minimal_wordcount.go to work similarly
>>> to my own program, and it crashes for those as well.
>>>
>>> Here is the snippet of code I added to minimal_wordcount (full source
>>> attached):
>>>
>>> // Concept #3: Invoke the stats.Count transform on our PCollection of
>>> // individual words. The Count transform returns a new PCollection of
>>> // key/value pairs, where each key represents a unique word in the text.
>>> // The associated value is the occurrence count for that word.
>>> singles := beam.ParDo(s, func(word string) (string, int) {
>>> 	return word, 1
>>> }, words)
>>>
>>> grouped := beam.GroupByKey(s, singles)
>>>
>>> counted := beam.ParDo(s, func(word string, values func(*int) bool) (string, int) {
>>> 	sum := 0
>>> 	for {
>>> 		var i int
>>> 		if values(&i) {
>>> 			sum = sum + i
>>> 		} else {
>>> 			break
>>> 		}
>>> 	}
>>> 	return word, sum
>>> }, grouped)
>>>
>>> // Use a ParDo to format our PCollection of word counts into a printable
>>> // string, suitable for writing to an output file. When each element
>>> // produces exactly one element, the DoFn can simply return it.
>>> formatted := beam.ParDo(s, func(w string, c int) string {
>>> 	return fmt.Sprintf("%s: %v", w, c)
>>> }, counted)
>>>
>>>
>>>
>>> I also attached the full source code and output that happens when I run
>>> both wordcount and minimal_wordcount.
>>>
>>> Am I just doing something wrong here? In any case, it seems
>>> inappropriate to panic during runtime without any debugging information
>>> (save a stack trace, but only if you call beamx.Run() as opposed to
>>> direct.Execute(), which just dies without any info).
>>>
>>> Thank you so much,
>>> 8
>>>
>>
>>


Re: [ANNOUCEMENT] New Foundation members!

2018-03-30 Thread Rafael Fernandez
Congratulations!!!

On Fri, Mar 30, 2018 at 8:29 PM Aviem Zur  wrote:

> Congrats!
>
> On Sat, Mar 31, 2018 at 2:30 AM Ahmet Altay  wrote:
>
>> Congratulations to all of you!
>>
>>
>> On Fri, Mar 30, 2018, 4:29 PM Pablo Estrada  wrote:
>>
>>> Congratulations y'all! Very cool.
>>> Best
>>> -P.
>>>
>>> On Fri, Mar 30, 2018 at 4:09 PM Davor Bonaci  wrote:
>>>
 Now that this is public... please join me in welcoming three newly
 elected members of the Apache Software Foundation with ties to this
 community, who were elected during the most recent Members' Meeting.

 * Ismaël Mejía (Beam PMC)

 * Josh Wills (Crunch Chair; Beam, DataFu PMC)

 * Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer; Beam
 contributor)

 These individuals demonstrated merit in Foundation's growth, evolution,
 and progress. They were recognized, nominated, and elected by existing
 membership for their significant impact to the Foundation as a whole, such
 as the roots of project-related and cross-project activities.

 As members, they now become legal owners and shareholders of the
 Foundation. They can vote for the Board, incubate new projects, nominate
 new members, participate in any PMC-private discussions, and contribute to
 any project.

 (For the Beam community, this election nearly doubles the number of
 Foundation members. The new members are joining Jean-Baptiste Onofré,
 Stephan Ewen, Romain Manni-Bucau and myself in this role.)

 I'm happy to be able to call all three of you my fellow members.
 Congratulations!


 Davor

>>>
>>




Re: [ANNOUCEMENT] New Foundation members!

2018-03-30 Thread Xin Wang
Congrats!

2018-03-31 12:31 GMT+08:00 Rafael Fernandez :

> Congratulations!!!
>
> On Fri, Mar 30, 2018 at 8:29 PM Aviem Zur  wrote:
>
>> Congrats!
>>
>> On Sat, Mar 31, 2018 at 2:30 AM Ahmet Altay  wrote:
>>
>>> Congratulations to all of you!
>>>
>>>
>>> On Fri, Mar 30, 2018, 4:29 PM Pablo Estrada  wrote:
>>>
 Congratulations y'all! Very cool.
 Best
 -P.

 On Fri, Mar 30, 2018 at 4:09 PM Davor Bonaci  wrote:

> Now that this is public... please join me in welcoming three newly
> elected members of the Apache Software Foundation with ties to this
> community, who were elected during the most recent Members' Meeting.
>
> * Ismaël Mejía (Beam PMC)
>
> * Josh Wills (Crunch Chair; Beam, DataFu PMC)
>
> * Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer;
> Beam contributor)
>
> These individuals demonstrated merit in Foundation's growth,
> evolution, and progress. They were recognized, nominated, and elected by
> existing membership for their significant impact to the Foundation as a
> whole, such as the roots of project-related and cross-project activities.
>
> As members, they now become legal owners and shareholders of the
> Foundation. They can vote for the Board, incubate new projects, nominate
> new members, participate in any PMC-private discussions, and contribute to
> any project.
>
> (For the Beam community, this election nearly doubles the number of
> Foundation members. The new members are joining Jean-Baptiste Onofré,
> Stephan Ewen, Romain Manni-Bucau and myself in this role.)
>
> I'm happy to be able to call all three of you my fellow members.
> Congratulations!
>
>
> Davor
>

>>>


-- 
Thanks,
Xin


Re: [ANNOUCEMENT] New Foundation members!

2018-03-30 Thread Reuven Lax
Congratulations!

On Fri, Mar 30, 2018, 4:09 PM Davor Bonaci  wrote:

> Now that this is public... please join me in welcoming three newly elected
> members of the Apache Software Foundation with ties to this community, who
> were elected during the most recent Members' Meeting.
>
> * Ismaël Mejía (Beam PMC)
>
> * Josh Wills (Crunch Chair; Beam, DataFu PMC)
>
> * Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer; Beam
> contributor)
>
> These individuals demonstrated merit in Foundation's growth, evolution,
> and progress. They were recognized, nominated, and elected by existing
> membership for their significant impact to the Foundation as a whole, such
> as the roots of project-related and cross-project activities.
>
> As members, they now become legal owners and shareholders of the
> Foundation. They can vote for the Board, incubate new projects, nominate
> new members, participate in any PMC-private discussions, and contribute to
> any project.
>
> (For the Beam community, this election nearly doubles the number of
> Foundation members. The new members are joining Jean-Baptiste Onofré,
> Stephan Ewen, Romain Manni-Bucau and myself in this role.)
>
> I'm happy to be able to call all three of you my fellow members.
> Congratulations!
>
> Davor
>


Re: [ANNOUCEMENT] New Foundation members!

2018-03-30 Thread Raghunandana Jayarama Reddy
Congratulations!

Best,
Raghu

On Sat, Mar 31, 2018 at 1:17 AM Jean-Baptiste Onofré 
wrote:

> Congrats !
>
> Regards
> JB
> On 31 Mar 2018, at 01:09, Davor Bonaci  wrote:
>>
>> Now that this is public... please join me in welcoming three newly
>> elected members of the Apache Software Foundation with ties to this
>> community, who were elected during the most recent Members' Meeting.
>>
>> * Ismaël Mejía (Beam PMC)
>>
>> * Josh Wills (Crunch Chair; Beam, DataFu PMC)
>>
>> * Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer; Beam
>> contributor)
>>
>> These individuals demonstrated merit in Foundation's growth, evolution,
>> and progress. They were recognized, nominated, and elected by existing
>> membership for their significant impact to the Foundation as a whole, such
>> as the roots of project-related and cross-project activities.
>>
>> As members, they now become legal owners and shareholders of the
>> Foundation. They can vote for the Board, incubate new projects, nominate
>> new members, participate in any PMC-private discussions, and contribute to
>> any project.
>>
>> (For the Beam community, this election nearly doubles the number of
>> Foundation members. The new members are joining Jean-Baptiste Onofré,
>> Stephan Ewen, Romain Manni-Bucau and myself in this role.)
>>
>> I'm happy to be able to call all three of you my fellow members.
>> Congratulations!
>>
>> Davor
>>
>