Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Jean-Baptiste Onofré
Yeah, especially, I think it would have been great to have a vote before merging 
on master.


Not a big deal, however, I'm really community focus ;)

Regards
JB

On 11/28/2017 07:36 AM, Reuven Lax wrote:
Agreed. I thinking having a formal vote before Luke had numbers and results 
would have been too early. However now that we have such numbers, we should 
think about having a vote.


Also, while I disagree with Romain that Gradle is not "enterprise ready" (it's 
heavily used by Netflix, LinkedIn, Siemens, and is the default build framework 
for Android apps), it would be interesting to see if any other ASF projects are 
using it. I don't think that should not make or break the decision - we should 
do what's best for the Beam project, and "everyone else is doing something" is 
rarely a good argument - it will provide good data points for us to evaluate.


Reuven

On Mon, Nov 27, 2017 at 10:23 PM, Jean-Baptiste Onofré > wrote:


Hi Luke,

just curious (and maybe I missed it): did we do a formal vote to merge the
gradle build ?
Gradle is now on master, we have some Jira to update the release guide with
gradle. It's fine, but I remember only a discussion, not a vote.

In order to embrace the community and avoid to have some contributors
"frustrated" (meaning that "this project doesn't care about contributor,
they just do whatever they want"), I would have love to see a formal vote
about Gradle more than just a discussion.

My $0.01

Regards
JB

On 11/27/2017 07:46 PM, Lukasz Cwik wrote:

I have collected data by running several builds against master using 
Gradle
and Maven without using Gradle's support for incremental builds.

Gradle (mins)
min: 25.04
max: 160.14
median: 45.78
average: 52.19
stdev: 30.80

Maven (mins)
min: 56.86
max: 216.55 (actually > 240 mins because this data does not include
timeouts)
median: 87.93
average: 109.10
stdev: 48.01

I excluded a few timeouts (240 mins) that happened during the Maven 
build
from its numbers but we can see conclusively that Gradle is twice as 
fast
for the build when compared to Maven when run using Jenkins.
On my desktop, I have enabled incremental builds and have seen a major
improvement on the above numbers but it doesn't yet work correctly 
because
of incorrectly specified inputs/outputs for certain tasks.

The data is available here

https://docs.google.com/spreadsheets/d/1MHVjF-xoI49_NJqEQakUgnNIQ7Qbjzu8Y1q_h3dbF1M/edit?usp=sharing



With this data, I feel confident that we should swap and have opened the
following issue https://issues.apache.org/jira/browse/BEAM-3249
 and related
sub-tasks.

On Sun, Nov 19, 2017 at 11:23 AM, Jean-Baptiste Onofré 
mailto:j...@nanthrax.net>>
wrote:

Thanks for the update Luke.

I'm updating my local working copy to do new tests.

Regards
JB

On 11/19/2017 08:21 PM, Lukasz Cwik wrote:

The gradle build rules have been merged, I'm adding a
precommit[1] to
start
collecting data about the build times. It currently only mirrors
the Java
mvn install precommit. I'll gather data over the next two weeks 
and
provide
a summary here.

You can rerun the precommit by issuing "Run Java Gradle 
PreCommit"

1: https://github.com/apache/beam/pull/4146



On Mon, Nov 13, 2017 at 9:08 AM, Lukasz Cwik mailto:lc...@google.com>> wrote:

There has been plenty of time for comments on the PR and the
approach.


So far Ken Knowles has provided the most feedback on the PR,
Ken would
you
like to finish the review?



On Fri, Nov 10, 2017 at 1:22 PM, Romain Manni-Bucau <
rmannibu...@gmail.com 

wrote:


This is only a setup thing and better to not break the
master history for

poc/tests, in particular when no very localized.
Alternative can be to
ask
another temp repo to infra and have a synchro between
both but dont
think
it does worth it personally.



Le 10 nov. 2017 1

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Reuven Lax
Agreed. I thinking having a formal vote before Luke had numbers and results
would have been too early. However now that we have such numbers, we should
think about having a vote.

Also, while I disagree with Romain that Gradle is not "enterprise ready"
(it's heavily used by Netflix, LinkedIn, Siemens, and is the default build
framework for Android apps), it would be interesting to see if any other
ASF projects are using it. I don't think that should not make or break the
decision - we should do what's best for the Beam project, and "everyone
else is doing something" is rarely a good argument - it will provide good
data points for us to evaluate.

Reuven

On Mon, Nov 27, 2017 at 10:23 PM, Jean-Baptiste Onofré 
wrote:

> Hi Luke,
>
> just curious (and maybe I missed it): did we do a formal vote to merge the
> gradle build ?
> Gradle is now on master, we have some Jira to update the release guide
> with gradle. It's fine, but I remember only a discussion, not a vote.
>
> In order to embrace the community and avoid to have some contributors
> "frustrated" (meaning that "this project doesn't care about contributor,
> they just do whatever they want"), I would have love to see a formal vote
> about Gradle more than just a discussion.
>
> My $0.01
>
> Regards
> JB
>
> On 11/27/2017 07:46 PM, Lukasz Cwik wrote:
>
>> I have collected data by running several builds against master using
>> Gradle
>> and Maven without using Gradle's support for incremental builds.
>>
>> Gradle (mins)
>> min: 25.04
>> max: 160.14
>> median: 45.78
>> average: 52.19
>> stdev: 30.80
>>
>> Maven (mins)
>> min: 56.86
>> max: 216.55 (actually > 240 mins because this data does not include
>> timeouts)
>> median: 87.93
>> average: 109.10
>> stdev: 48.01
>>
>> I excluded a few timeouts (240 mins) that happened during the Maven build
>> from its numbers but we can see conclusively that Gradle is twice as fast
>> for the build when compared to Maven when run using Jenkins.
>> On my desktop, I have enabled incremental builds and have seen a major
>> improvement on the above numbers but it doesn't yet work correctly because
>> of incorrectly specified inputs/outputs for certain tasks.
>>
>> The data is available here
>> https://docs.google.com/spreadsheets/d/1MHVjF-xoI49_NJqEQakU
>> gnNIQ7Qbjzu8Y1q_h3dbF1M/edit?usp=sharing
>>
>> With this data, I feel confident that we should swap and have opened the
>> following issue https://issues.apache.org/jira/browse/BEAM-3249 and
>> related
>> sub-tasks.
>>
>> On Sun, Nov 19, 2017 at 11:23 AM, Jean-Baptiste Onofré 
>> wrote:
>>
>> Thanks for the update Luke.
>>>
>>> I'm updating my local working copy to do new tests.
>>>
>>> Regards
>>> JB
>>>
>>> On 11/19/2017 08:21 PM, Lukasz Cwik wrote:
>>>
>>> The gradle build rules have been merged, I'm adding a precommit[1] to
 start
 collecting data about the build times. It currently only mirrors the
 Java
 mvn install precommit. I'll gather data over the next two weeks and
 provide
 a summary here.

 You can rerun the precommit by issuing "Run Java Gradle PreCommit"

 1: https://github.com/apache/beam/pull/4146


 On Mon, Nov 13, 2017 at 9:08 AM, Lukasz Cwik  wrote:

 There has been plenty of time for comments on the PR and the approach.

>
> So far Ken Knowles has provided the most feedback on the PR, Ken would
> you
> like to finish the review?
>
>
>
> On Fri, Nov 10, 2017 at 1:22 PM, Romain Manni-Bucau <
> rmannibu...@gmail.com
>
> wrote:
>>
>>
> This is only a setup thing and better to not break the master history
> for
>
>> poc/tests, in particular when no very localized. Alternative can be to
>> ask
>> another temp repo to infra and have a synchro between both but dont
>> think
>> it does worth it personally.
>>
>>
>>
>> Le 10 nov. 2017 18:57, "Lukasz Cwik"  a
>> écrit :
>>
>> The reason to get it on master is because that is where all the PRs
>>
>>>
>>> are. An
>>
>> upstream branch without any development means no data.
>>> Also, our Jenkins setup via job-dsl doesn't honor using the Jenkins
>>> configuration on the branch because the seed job always runs against
>>> master.
>>>
>>> On Thu, Nov 9, 2017 at 9:59 PM, Romain Manni-Bucau <
>>>
>>> rmannibu...@gmail.com>
>>
>> wrote:
>>>
>>> What about pushing it on a "upstream" branch and testing it for 1
>>>

 week in
>>>
>>
>> parallel of the maven reference build? If gradle is always 50% faster
>>>

 on
>>>
>>
>> jenkins then it could become master setup without much discussion I
>>>

 guess.
>>>
>>> We can even have 2 jenkins jobs: one with the daemon etc and one

 without.
>>>
>>
>>
>>> Also noticed yesterday that gradle build is killing my machine (al

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Jean-Baptiste Onofré

Hi Luke,

just curious (and maybe I missed it): did we do a formal vote to merge the 
gradle build ?
Gradle is now on master, we have some Jira to update the release guide with 
gradle. It's fine, but I remember only a discussion, not a vote.


In order to embrace the community and avoid to have some contributors 
"frustrated" (meaning that "this project doesn't care about contributor, they 
just do whatever they want"), I would have love to see a formal vote about 
Gradle more than just a discussion.


My $0.01

Regards
JB

On 11/27/2017 07:46 PM, Lukasz Cwik wrote:

I have collected data by running several builds against master using Gradle
and Maven without using Gradle's support for incremental builds.

Gradle (mins)
min: 25.04
max: 160.14
median: 45.78
average: 52.19
stdev: 30.80

Maven (mins)
min: 56.86
max: 216.55 (actually > 240 mins because this data does not include
timeouts)
median: 87.93
average: 109.10
stdev: 48.01

I excluded a few timeouts (240 mins) that happened during the Maven build
from its numbers but we can see conclusively that Gradle is twice as fast
for the build when compared to Maven when run using Jenkins.
On my desktop, I have enabled incremental builds and have seen a major
improvement on the above numbers but it doesn't yet work correctly because
of incorrectly specified inputs/outputs for certain tasks.

The data is available here
https://docs.google.com/spreadsheets/d/1MHVjF-xoI49_NJqEQakUgnNIQ7Qbjzu8Y1q_h3dbF1M/edit?usp=sharing

With this data, I feel confident that we should swap and have opened the
following issue https://issues.apache.org/jira/browse/BEAM-3249 and related
sub-tasks.

On Sun, Nov 19, 2017 at 11:23 AM, Jean-Baptiste Onofré 
wrote:


Thanks for the update Luke.

I'm updating my local working copy to do new tests.

Regards
JB

On 11/19/2017 08:21 PM, Lukasz Cwik wrote:


The gradle build rules have been merged, I'm adding a precommit[1] to
start
collecting data about the build times. It currently only mirrors the Java
mvn install precommit. I'll gather data over the next two weeks and
provide
a summary here.

You can rerun the precommit by issuing "Run Java Gradle PreCommit"

1: https://github.com/apache/beam/pull/4146


On Mon, Nov 13, 2017 at 9:08 AM, Lukasz Cwik  wrote:

There has been plenty of time for comments on the PR and the approach.


So far Ken Knowles has provided the most feedback on the PR, Ken would
you
like to finish the review?



On Fri, Nov 10, 2017 at 1:22 PM, Romain Manni-Bucau <
rmannibu...@gmail.com


wrote:



This is only a setup thing and better to not break the master history for

poc/tests, in particular when no very localized. Alternative can be to
ask
another temp repo to infra and have a synchro between both but dont
think
it does worth it personally.



Le 10 nov. 2017 18:57, "Lukasz Cwik"  a
écrit :

The reason to get it on master is because that is where all the PRs



are. An


upstream branch without any development means no data.
Also, our Jenkins setup via job-dsl doesn't honor using the Jenkins
configuration on the branch because the seed job always runs against
master.

On Thu, Nov 9, 2017 at 9:59 PM, Romain Manni-Bucau <


rmannibu...@gmail.com>


wrote:

What about pushing it on a "upstream" branch and testing it for 1



week in



parallel of the maven reference build? If gradle is always 50% faster



on



jenkins then it could become master setup without much discussion I



guess.


We can even have 2 jenkins jobs: one with the daemon etc and one


without.





Also noticed yesterday that gradle build is killing my machine (all 8


cores


are 100%) during the first minutes vs maven build which let me do


something


else. Then all the consumed time which makes gradle not that fast is


about


python. Will try to send figures later today.

Le 10 nov. 2017 00:10, "Lukasz Cwik"  a


écrit



:



I wouldn't mind merging this change in so I could setup those Gradle

Jenkins precommits.

As per our contribution guidelines, any committer willing to sign


off



on


the PR?


On Thu, Nov 9, 2017 at 2:12 PM, Romain Manni-Bucau <


rmannibu...@gmail.com>


wrote:

Le 9 nov. 2017 21:31, "Kenneth Knowles" 



a



écrit :




Keep in mind that a clean build is unusual during development (it


is



common



for mvn use and that is a bug) and also not necessary for


precommits



if


the



build tool is correct enough that caching is safe. So while this


number



matters, it is not the most important.



Not sure, in dev you bypass the build tool most of the time


anyway -



thanks



to IDE or other shortcuts - but not on PR and CI. Keep in mind


that



not


doing a clean and killing gradle daemon makes the build not



reproducible



and therefore useful :(. Starting to build from a subpart of the



reactor



-


with the mentionned mvn plugin for instance - can be nice on some


CI



like



travis if the caching is well configured but still not a guarantee



the



build is "green".


My trade 

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Romain Manni-Bucau
Le 27 nov. 2017 23:07, "Eugene Kirpichov"  a
écrit :

On Mon, Nov 27, 2017 at 11:52 AM Romain Manni-Bucau 
wrote:

> 2017-11-27 20:26 GMT+01:00 Lukasz Cwik :
> > Romain, as mentioned earlier, I identified that Maven was slower because
> it
> > needed to finish building the entire module before dependent modules
> could
> > start which included running tests, performing checkstyle, etc...
> > Gradle is able to increase the parallelism of the build process since it
> > has task driven parallelism so as long as the files are compiled, the
> > dependent projects can start.
>
> This means we can implement a maven graph builder which is better than
> the default one - surely with a thread safe local repo - and
> contribute it back to solve it durably.
>

That would be indeed a wonderful contribution to the Java ecosystem as a
whole, but I'd like to reiterate that Maven has existed for very long, I'm
sure a lot of people have noticed its poor parallelization in the
meanwhile, and still nobody has even tried doing this AFAIK.

So it seems unlikely that this is doable in a reasonable timeframe, and I
don't think that the possibility of this is a strong argument against
switching to Gradle.


Agree. Just to clarify: my strong argument against gradle is that it is not
enterprise mainstream IMHO - dont forget github projects are a lot about
personal tests, small projects, so hard to use all the repos as a valid
stat - which leads to enhancing mvn rather than switching.

Again, as an ASF project, not even contacting our colleagues is not very
sane so I would go and check with them rather than just doing like a github
project maintained by a single company.

Side note on previous mail: mvn extensibility is "mature". Mojo let you do
what you want and you can customize the lifecycle as well. The main
difference is a standardized build (descriptive) vs a fully custom and "by
developer" styled script.



>
> >
> > Maven and Gradle are both heavily used since there are ~146k Maven
> projects
> > on Github while there are ~122k Gradle project on Github. Do you have
> data
> > which shows that Maven is significantly more "mainstream"?
>
> Yep, project i worked on in companies using gradle: 0, all were based
> on maven and maven was "tool-ed" versus gradle was "best effort" in
> term of plugins.
> Now  - with my EE background - I can guarantee you gradle is not able
> to handle properly its build since it flatten the classpath and
> plugins conflicts very quickly (their plugin dependency feature never
> worked and almost no plugin impl it correctly).
>
> Wonder if it is easy to have the ASF stats, anyone knows?
>
> >
> > I believe we want a rich multi-language SDK and community and feel as
> > though it would be unwise to treat non JDK based languages as second
> class.
>
> Hmm, not sure how it is related to the build tool since Maven and
> Gradle have the same level of support - actually surprsingly maven is
> better for js and surely as bad as gradle for others - or here again
> we can create plugin like the frontend-maven-plugin if needed for
> other languages.
>
> That said it can be an interesting other thread since people consuming
> these languages will probably want their mainstream build tool and a
> "standard" repository layout rather than a java one. But this is
> harder to measure.
>
> >
> > On Mon, Nov 27, 2017 at 11:00 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> Hi Lukasz,
> >>
> >> Did you manage to identify how maven was slower and test tesla stuff
> >> and potentially a few other fixes?
> >>
> >> Side note: figures without python can be interesting cause locally - =
> >> for me - python tends to flatten the figures whereas I get something
> >> close to your conclusions without python part.
> >>
> >> My point is mainly that switching now on gradle and being back on
> >> maven in a few months cause gradle ecosystem is far to support java 9
> >> - or any other volatile reason like this one - is probably not a good
> >> choice for a community. Maven is way more mainstream than gradle so
> >> helps to encourage people to contribute - vs gradle will increase the
> >> step to do it.
> >>
> >> I'd like to be sure before a switch that it is a one way decision and
> >> that the build tool was not just challenged by itself and its current
> >> state but also in the way it could be improved (= its community and
> >> potentially some local hacks).
> >>
> >> Romain Manni-Bucau
> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >>
> >>
> >> 2017-11-27 19:46 GMT+01:00 Lukasz Cwik :
> >> > I have collected data by running several builds against master using
> >> Gradle
> >> > and Maven without using Gradle's support for incremental builds.
> >> >
> >> > Gradle (mins)
> >> > min: 25.04
> >> > max: 160.14
> >> > median: 45.78
> >> > average: 52.19
> >> > stdev: 30.80
> >> >
> >> > Maven (mins)
> >> > min: 56.86
> >> > max: 216.55 (actually > 240 mins because this data does not include
> >> > time

Re: [discuss] java profile

2017-11-27 Thread Romain Manni-Bucau
Hmm, no.

Incremental build is never correctly implemented cause there is just no way
to detect some dependencies statically with java code - or any dynamic
language.

Side note: same applies for gradle daemon usage BTW.

After if the list is not maintained it is a bug at the same level than
coding a toString() with "null.toString()". This is not very hard to handle
the list of modules and worse case a mvnextension can make it coded if you
feel more comfortable with this kind of solution.

Le 27 nov. 2017 23:12, "Lukasz Cwik"  a écrit :

> Manually whitelisting/blacklisting sub-modules is error prone since it
> hides issues due to incorrectly maintaining that list is the same argument
> as if the build process doesn't correctly invoke an incremental build
> process.
>
> On Mon, Nov 27, 2017 at 1:45 PM, Romain Manni-Bucau  >
> wrote:
>
> > Well for validation builds- pre PR, incremental support is pointless
> since
> > it easily hides issues die to caching so a solution saving half of the
> > build without loosing anuyhing would still be good IMHO.
> >
> > Le 27 nov. 2017 21:12, "Lukasz Cwik"  a écrit
> :
> >
> > > Incremental builds aren't correctly setup right now so your likely to
> see
> > > Python/Go rebuild even if there were no changes. See
> > > https://issues.apache.org/jira/browse/BEAM-3253
> > >
> > > On Mon, Nov 27, 2017 at 11:46 AM, Romain Manni-Bucau <
> > > rmannibu...@gmail.com>
> > > wrote:
> > >
> > > > that was the goal: validate there was no side effect of the changes
> on
> > > > the whole project. Now the "not java" part of the build will not be
> > > > impacted by java changed so this is the part i want to skip since it
> > > > takes a lot of time and I have guarantees it is safe to skip them.
> > > >
> > > > Romain Manni-Bucau
> > > > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > >
> > > >
> > > > 2017-11-27 20:28 GMT+01:00 Lukasz Cwik :
> > > > > Romain, that will build the entire project. I think you want to
> > execute
> > > > > (from the root of the project):
> > > > > ./gradlew :beam-sdks-parent:beam-sdks-python:build
> > > > >
> > > > > On Mon, Nov 27, 2017 at 11:25 AM, Romain Manni-Bucau <
> > > > rmannibu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> gradle build --no-daemon
> > > > >>
> > > > >> (with gradle 4.2)
> > > > >>
> > > > >> Romain Manni-Bucau
> > > > >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > > >>
> > > > >>
> > > > >> 2017-11-27 20:21 GMT+01:00 Kenneth Knowles  > >:
> > > > >> > What is the gradle command you are using to build just the
> Python
> > > SDK?
> > > > >> >
> > > > >> > On Mon, Nov 27, 2017 at 11:19 AM, Romain Manni-Bucau <
> > > > >> rmannibu...@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> >> Hmm,
> > > > >> >>
> > > > >> >> issue is the same with gradle (locally python build takes 15mn
> > > alone
> > > > >> >> which is as much as the java build and it is not parallelized I
> > > > think)
> > > > >> >>
> > > > >> >> pl is not as smooth since it means doing it on each command
> > whereas
> > > > >> >> the proposal is automatically activated through settings.xml
> > > > >> >>
> > > > >> >> Romain Manni-Bucau
> > > > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > > >> >>
> > > > >> >>
> > > > >> >> 2017-11-27 20:07 GMT+01:00 Kenneth Knowles
> >  > > >:
> > > > >> >> > I think you can already mostly do this with mvn -pl sdks/XYZ
> > -am
> > > > >> -amd. I
> > > > >> >> > think that we have other work (gradle support) underway that
> > will
> > > > make
> > > > >> >> this
> > > > >> >> > a non-issue since gradle automatically does even better than
> > the
> > > > >> profile
> > > > >> >> or
> > > > >> >> > -am -amd.
> > > > >> >> >
> > > > >> >> > On Mon, Nov 27, 2017 at 11:01 AM, Romain Manni-Bucau <
> > > > >> >> rmannibu...@gmail.com>
> > > > >> >> > wrote:
> > > > >> >> >
> > > > >> >> >> Hi guys,
> > > > >> >> >>
> > > > >> >> >> java/python/go/xxx support is great but as a developer you
> > > rarely
> > > > >> hack
> > > > >> >> >> on them all.
> > > > >> >> >>
> > > > >> >> >> For that reason I opened https://github.com/apache/
> > > beam/pull/4173
> > > > .
> > > > >> >> >>
> > > > >> >> >> Goal is to give each developer a way to build the whole
> > project
> > > > and
> > > > >> >> >> all the code he can impact at once but without caring of the
> > > code
> > > > he
> > > > >> >> >> doesn't modify at all - other languages.
> > > > >> >> >>
> > > > >> >> >> Wdyt?
> > > > >> >> >>
> > > > >> >> >> Romain Manni-Bucau
> > > > >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
>


Re: GitBox down

2017-11-27 Thread Jean-Baptiste Onofré

Good point.

I will prepare a PR for that.

Thanks !
Regards
JB

On 11/27/2017 08:58 PM, Reuven Lax wrote:

We need to update the contribution guide as well.

On Nov 27, 2017 10:30 AM, "Jean-Baptiste Onofré"  wrote:


No I don't think so,

In your case, it's because you still refering git-wip-us (probably from
): you have to use gitbox instead.

I already updated master branch but I didn't update the release branch.

Regards
JB

On 11/27/2017 07:02 PM, Reuven Lax wrote:


Does this have anything to do with the fact that I'm starting to get:

fatal: repository 'https://git-wip-us.apache.org/repos/asf/beam.git/' not
found

On Sun, Nov 26, 2017 at 10:37 PM, Jean-Baptiste Onofré 
wrote:

Hi,


it seems GitBox is down (timing out).

I don't see anything on status.apache.org, I will ping INFRA.

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


[GitHub] kennknowles commented on issue #3538: Add a test for Avro write with RVP; fix code

2017-11-27 Thread GitBox
kennknowles commented on issue #3538: Add a test for Avro write with RVP; fix 
code
URL: https://github.com/apache/beam/pull/3538#issuecomment-347408240
 
 
   There are conflicts introduced - would you mind rebasing? That would also 
give more recent signal for our use of the GitHub merge button, for which the 
Jenkins result gets stale.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles commented on issue #4168: [BEAM-3238][SQL] Add BeamRecordSqlTypeBuilder

2017-11-27 Thread GitBox
kennknowles commented on issue #4168: [BEAM-3238][SQL] Add 
BeamRecordSqlTypeBuilder
URL: https://github.com/apache/beam/pull/4168#issuecomment-347407859
 
 
   @xumingming what do you think now?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles closed pull request #3677: Tez runner

2017-11-27 Thread GitBox
kennknowles closed pull request #3677: Tez runner
URL: https://github.com/apache/beam/pull/3677
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/runners/pom.xml b/runners/pom.xml
index 164d1b3a15b..1ad1347b809 100644
--- a/runners/pom.xml
+++ b/runners/pom.xml
@@ -42,6 +42,7 @@
 google-cloud-dataflow-java
 spark
 apex
+tez
 gcp
   
 
diff --git a/runners/tez/pom.xml b/runners/tez/pom.xml
new file mode 100644
index 000..b7d0d6d13db
--- /dev/null
+++ b/runners/tez/pom.xml
@@ -0,0 +1,135 @@
+
+
+http://maven.apache.org/POM/4.0.0"; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
+
+  4.0.0
+  
+
+  
+org.apache.maven.plugins
+maven-compiler-plugin
+
+  1.8
+  1.8
+
+  
+
+
+
+  
+src/main/resources
+  
+
+
+  
+
+  
+org.apache.beam
+beam-runners-parent
+2.0.0
+../pom.xml
+  
+
+  beam-runners-tez
+  2.0.0
+
+  Apache Beam :: Runners :: Tez
+
+  jar
+
+  
+
+
+  org.apache.tez
+  tez-api
+  0.8.4
+
+
+
+  org.apache.hadoop
+  hadoop-client
+  2.7.3
+
+
+  org.apache.hadoop
+  hadoop-common
+  2.7.3
+
+
+
+  org.apache.tez
+  tez-dag
+  0.8.4
+
+
+
+  org.apache.tez
+  tez-mapreduce
+  0.8.4
+
+
+
+  org.apache.tez
+  tez-runtime-library
+  0.8.4
+
+
+
+
+  org.apache.beam
+  beam-sdks-java-core
+  
+
+  org.slf4j
+  slf4j-jdk14
+
+  
+
+
+
+  org.apache.beam
+  beam-runners-core-construction-java
+
+
+
+  org.apache.beam
+  beam-runners-direct-java
+
+
+
+  org.apache.beam
+  beam-runners-core-java
+  
+
+  org.slf4j
+  slf4j-jdk14
+
+  
+
+
+  org.apache.beam
+  beam-sdks-java-harness
+
+
+  junit
+  junit
+  test
+
+  
+
+
\ No newline at end of file
diff --git 
a/runners/tez/src/main/java/org/apache/beam/runners/tez/TezPipelineOptions.java 
b/runners/tez/src/main/java/org/apache/beam/runners/tez/TezPipelineOptions.java
new file mode 100644
index 000..8b37b0941ad
--- /dev/null
+++ 
b/runners/tez/src/main/java/org/apache/beam/runners/tez/TezPipelineOptions.java
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.tez;
+
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/**
+ * Options that configure the Tez pipeline.
+ */
+public interface TezPipelineOptions extends PipelineOptions, 
java.io.Serializable {
+  //TODO: Add options to configure Tez
+}
diff --git 
a/runners/tez/src/main/java/org/apache/beam/runners/tez/TezRunner.java 
b/runners/tez/src/main/java/org/apache/beam/runners/tez/TezRunner.java
new file mode 100644
index 000..7d32b473e92
--- /dev/null
+++ b/runners/tez/src/main/java/org/apache/beam/runners/tez/TezRunner.java
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.tez;
+
+import java.

[GitHub] kennknowles commented on issue #3677: Tez runner

2017-11-27 Thread GitBox
kennknowles commented on issue #3677: Tez runner
URL: https://github.com/apache/beam/pull/3677#issuecomment-347407078
 
 
   Now that we have migrated to gitbox, I can close this. I am going to do so, 
since it is on the actual target branch of `tez-runner`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles closed pull request #4024: [BEAM-2304] Allow declared state to be accessed as a superclass.

2017-11-27 Thread GitBox
kennknowles closed pull request #4024: [BEAM-2304] Allow declared state to be 
accessed as a superclass.
URL: https://github.com/apache/beam/pull/4024
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
index c54c44f2d58..52607833f71 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
@@ -888,8 +888,8 @@ private static Parameter analyzeExtraParameter(
   id);
 
   paramErrors.checkArgument(
-  stateDecl.stateType().equals(stateType),
-  "reference to %s %s with different type %s",
+  stateDecl.stateType().isSubtypeOf(stateType),
+  "data type of reference to %s %s must be a supertype of %s",
   StateId.class.getSimpleName(),
   id,
   formatType(stateDecl.stateType()));
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
index 03e310463f1..b2cc7dca98a 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
@@ -68,6 +68,7 @@
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.state.BagState;
 import org.apache.beam.sdk.state.CombiningState;
+import org.apache.beam.sdk.state.GroupingState;
 import org.apache.beam.sdk.state.MapState;
 import org.apache.beam.sdk.state.ReadableState;
 import org.apache.beam.sdk.state.SetState;
@@ -2532,6 +2533,40 @@ public void processElement(
 pipeline.run();
   }
 
+  @Test
+  @Category({ValidatesRunner.class, UsesStatefulParDo.class})
+  public void testCombiningStateParameterSuperclass() {
+final String stateId = "foo";
+
+DoFn, String> fn =
+new DoFn, String>() {
+  private static final int EXPECTED_SUM = 8;
+
+  @StateId(stateId)
+  private final StateSpec> 
state =
+  StateSpecs.combining(Sum.ofIntegers());
+
+  @ProcessElement
+  public void processElement(ProcessContext c,
+  @StateId(stateId) GroupingState state) {
+state.add(c.element().getValue());
+Integer currentValue = state.read();
+if (currentValue == EXPECTED_SUM) {
+  c.output("right on");
+}
+  }
+};
+
+PCollection output =
+pipeline
+.apply(Create.of(KV.of(123, 4), KV.of(123, 7), KV.of(123, -3)))
+.apply(ParDo.of(fn));
+
+// There should only be one moment at which the sum is exactly 8
+PAssert.that(output).containsInAnyOrder("right on");
+pipeline.run();
+  }
+
   @Test
   @Category({ValidatesRunner.class, UsesStatefulParDo.class})
   public void testBagStateSideInput() {
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesTest.java
 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesTest.java
index 70c8dfdb312..a961203ffed 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesTest.java
+++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/reflect/DoFnSignaturesTest.java
@@ -30,6 +30,8 @@
 import org.apache.beam.sdk.coders.VarIntCoder;
 import org.apache.beam.sdk.coders.VarLongCoder;
 import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.state.CombiningState;
+import org.apache.beam.sdk.state.GroupingState;
 import org.apache.beam.sdk.state.StateSpec;
 import org.apache.beam.sdk.state.StateSpecs;
 import org.apache.beam.sdk.state.TimeDomain;
@@ -39,6 +41,7 @@
 import org.apache.beam.sdk.state.ValueState;
 import org.apache.beam.sdk.state.WatermarkHoldState;
 import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Sum;
 import org.apache.beam.sdk.transforms.reflect.DoFnSignature.Parameter;
 import 
org.apache.beam.sdk.transforms.reflect.DoFnSignature.Parameter.ProcessContextParameter;
 import 
org.apache.beam.sdk.transforms.reflect.DoFnSignature.Parameter.StateParameter;
@@ -649,7 +652,7 @@ public void testStateParameterWrongStateType() throws 
Exception {
 thrown.expect(IllegalArgumentException.class);
 thrown.expectMessage("WatermarkHoldState");
 thrown.expectMessage("reference to");
-thrown.expectMessage("different type");
+thrown.expectMessage("supertype");
 thrown.expectMessage("ValueState");
 thrown.expectMessage("m

[GitHub] kennknowles closed pull request #4153: [BEAM-3219] DataflowRunner: delegate @Setup and @Teardown in stateful ParDo

2017-11-27 Thread GitBox
kennknowles closed pull request #4153: [BEAM-3219] DataflowRunner: delegate 
@Setup and @Teardown in stateful ParDo
URL: https://github.com/apache/beam/pull/4153
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java
index d7e9d06ef4c..6330e79181c 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java
@@ -36,6 +36,7 @@
 import org.apache.beam.sdk.transforms.ParDo;
 import org.apache.beam.sdk.transforms.ParDo.MultiOutput;
 import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.transforms.reflect.DoFnInvokers;
 import org.apache.beam.sdk.transforms.reflect.DoFnSignature;
 import org.apache.beam.sdk.transforms.reflect.DoFnSignatures;
 import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
@@ -315,12 +316,22 @@ public void processElement(ProcessContext c) {
   return underlyingDoFn;
 }
 
+@Setup
+public void setup() {
+  DoFnInvokers.invokerFor(underlyingDoFn).invokeSetup();
+}
+
 @ProcessElement
 public void processElement(final ProcessContext c, final BoundedWindow 
window) {
   throw new UnsupportedOperationException(
   "BatchStatefulDoFn.ProcessElement should never be invoked");
 }
 
+@Teardown
+public void teardown() {
+  DoFnInvokers.invokerFor(underlyingDoFn).invokeTeardown();
+}
+
 @Override
 public TypeDescriptor getOutputTypeDescriptor() {
   return underlyingDoFn.getOutputTypeDescriptor();
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java
 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java
index 849b874852d..77fea826108 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java
+++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java
@@ -27,9 +27,14 @@
 
 import java.io.Serializable;
 import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.beam.sdk.state.StateSpec;
+import org.apache.beam.sdk.state.StateSpecs;
+import org.apache.beam.sdk.state.ValueState;
 import org.apache.beam.sdk.testing.NeedsRunner;
 import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.testing.UsesStatefulParDo;
 import org.apache.beam.sdk.testing.ValidatesRunner;
+import org.apache.beam.sdk.values.KV;
 import org.apache.beam.sdk.values.PCollectionList;
 import org.apache.beam.sdk.values.TupleTag;
 import org.apache.beam.sdk.values.TupleTagList;
@@ -151,6 +156,21 @@ public void testFnCallSequenceMulti() {
 p.run();
   }
 
+  @Test
+  @Category({ValidatesRunner.class, UsesStatefulParDo.class})
+  public void testFnCallSequenceStateful() {
+PCollectionList.of(p.apply("Impolite", Create.of(KV.of("a", 1), KV.of("b", 
2), KV.of("a", 4
+.and(
+p.apply(
+"Polite", Create.of(KV.of("b", 3), KV.of("a", 5), KV.of("c", 
6), KV.of("c", 7
+.apply(Flatten.>pCollections())
+.apply(
+ParDo.of(new CallSequenceEnforcingStatefulFn())
+.withOutputTags(new TupleTag>() {}, 
TupleTagList.empty()));
+
+p.run();
+  }
+
   private static class CallSequenceEnforcingFn extends DoFn {
 private boolean setupCalled = false;
 private int startBundleCalls = 0;
@@ -204,25 +224,13 @@ public void after() {
 }
   }
 
-  @Test
-  @Category(NeedsRunner.class)
-  public void testTeardownCalledAfterExceptionInSetup() {
-ExceptionThrowingOldFn fn = new 
ExceptionThrowingOldFn(MethodForException.SETUP);
-p
-.apply(Create.of(1, 2, 3))
-.apply(ParDo.of(fn));
-try {
-  p.run();
-  fail("Pipeline should have failed with an exception");
-} catch (Exception e) {
-  assertThat(
-  "Function should have been torn down after exception",
-  ExceptionThrowingOldFn.teardownCalled.get(),
-  is(true));
-}
-  }
-
+  private static class CallSequenceEnforcingStatefulFn
+  extends CallSequenceEnforcingDoFn> {
+private static final String STATE_ID = "foo";
 
+@StateId(STATE_ID)
+private final StateSpec> valueSpec = StateSpecs.value();
+  }
 
   @Test
   @Category(NeedsRunner.class)


 


This is an automated message from the Apache Git Service.
To respond to the message, pl

[GitHub] pabloem commented on issue #4180: Updating dataflow API version to newer release.

2017-11-27 Thread GitBox
pabloem commented on issue #4180: Updating dataflow API version to newer 
release.
URL: https://github.com/apache/beam/pull/4180#issuecomment-347406119
 
 
   Retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] pabloem commented on issue #4180: Updating dataflow API version to newer release.

2017-11-27 Thread GitBox
pabloem commented on issue #4180: Updating dataflow API version to newer 
release.
URL: https://github.com/apache/beam/pull/4180#issuecomment-347405675
 
 
   retest this


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] pabloem commented on issue #4180: Updating dataflow API version to newer release.

2017-11-27 Thread GitBox
pabloem commented on issue #4180: Updating dataflow API version to newer 
release.
URL: https://github.com/apache/beam/pull/4180#issuecomment-347405675
 
 
   retest this


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles commented on issue #4153: [BEAM-3219] DataflowRunner: delegate @Setup and @Teardown in stateful ParDo

2017-11-27 Thread GitBox
kennknowles commented on issue #4153: [BEAM-3219] DataflowRunner: delegate 
@Setup and @Teardown in stateful ParDo
URL: https://github.com/apache/beam/pull/4153#issuecomment-347399933
 
 
   run dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] luke-zhu commented on issue #4176: [BEAM-3143] Type Inference Compatibility with Python 3

2017-11-27 Thread GitBox
luke-zhu commented on issue #4176: [BEAM-3143] Type Inference Compatibility 
with Python 3
URL: https://github.com/apache/beam/pull/4176#issuecomment-347397998
 
 
   After another look, I think that the base of this solution won't improve 
Python 3 compatibility in the long run. I'll submit a more clean PR later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] luke-zhu closed pull request #4176: [BEAM-3143] Type Inference Compatibility with Python 3

2017-11-27 Thread GitBox
luke-zhu closed pull request #4176: [BEAM-3143] Type Inference Compatibility 
with Python 3
URL: https://github.com/apache/beam/pull/4176
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/typehints/__init__.py 
b/sdks/python/apache_beam/typehints/__init__.py
index e89afa1285a..8d89efb21c6 100644
--- a/sdks/python/apache_beam/typehints/__init__.py
+++ b/sdks/python/apache_beam/typehints/__init__.py
@@ -18,5 +18,6 @@
 """A package defining the syntax and decorator semantics for type-hints."""
 
 # pylint: disable=wildcard-import
+from __future__ import absolute_import
 from apache_beam.typehints.typehints import *
 from apache_beam.typehints.decorators import *
diff --git a/sdks/python/apache_beam/typehints/decorators.py 
b/sdks/python/apache_beam/typehints/decorators.py
index 89dc6afa34c..c6792747b99 100644
--- a/sdks/python/apache_beam/typehints/decorators.py
+++ b/sdks/python/apache_beam/typehints/decorators.py
@@ -83,6 +83,7 @@ def foo((a, b)):
 defined, or before importing a module containing type-hinted functions.
 """
 
+from __future__ import absolute_import
 import inspect
 import types
 
@@ -92,6 +93,7 @@ def foo((a, b)):
 from apache_beam.typehints.typehints import SimpleTypeHintError
 from apache_beam.typehints.typehints import check_constraint
 from apache_beam.typehints.typehints import validate_composite_type_param
+from six.moves import zip
 
 __all__ = [
 'with_input_types',
@@ -567,11 +569,13 @@ def __getattr__(self, attr):
   return self.__iter__()
 return getattr(self.internal_gen, attr)
 
-  def next(self):
+  def __next__(self):
 next_val = next(self.internal_gen)
 self.interleave_func(next_val)
 return next_val
 
+  next = __next__
+
   def __iter__(self):
 while True:
   x = next(self.internal_gen)
diff --git a/sdks/python/apache_beam/typehints/disassembly.py 
b/sdks/python/apache_beam/typehints/disassembly.py
new file mode 100644
index 000..506459b98ca
--- /dev/null
+++ b/sdks/python/apache_beam/typehints/disassembly.py
@@ -0,0 +1,230 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""The Python 3 disassembler source code backported to be compatible with
+with Python 2.7.
+"""
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+import collections
+import dis
+import sys
+
+
+def get_instructions(x, first_line=None):
+  """Iterator for the opcodes in methods, functions or code
+  Generates a series of Instruction named tuples giving the details of
+  each operations in the supplied code.
+  If *first_line* is not None, it indicates the line number that should
+  be reported for the first source line in the disassembled code.
+  Otherwise, the source line information (if any) is taken directly from
+  the disassembled code object.
+  """
+  co = _get_code_object(x)
+  cell_names = co.co_cellvars + co.co_freevars
+  linestarts = dict(dis.findlinestarts(co))
+  if first_line is not None:
+line_offset = first_line - co.co_firstlineno
+  else:
+line_offset = 0
+  return _get_instructions_bytes(co.co_code, co.co_varnames, co.co_names,
+ co.co_consts, cell_names, linestarts,
+ line_offset)
+
+
+def _get_code_object(x):
+  """Helper to handle methods, functions, generators, strings
+  and raw code objects"""
+  if hasattr(x, '__func__'):  # Method
+x = x.__func__
+  if hasattr(x, '__code__'):  # Function
+x = x.__code__
+  if hasattr(x, 'gi_code'):  # Generator
+x = x.gi_code
+  if isinstance(x, str):  # Source code
+x = _try_compile(x, "")
+  if hasattr(x, 'co_code'):  # Code object
+return x
+  raise TypeError("don't know how to disassemble %s objects" %
+  type(x).__name__)
+
+
+def _try_compile(source, name):
+  """Attempts to compile the given source, first as an expression and
+ then as a statement if the first approach fails.
+ Utility function to accept strings in functions that otherwise
+ exp

[GitHub] asfgit closed pull request #4182: Add license header to SideInputTranslationTest

2017-11-27 Thread GitBox
asfgit closed pull request #4182: Add license header to SideInputTranslationTest
URL: https://github.com/apache/beam/pull/4182
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/apex/src/test/java/org/apache/beam/runners/apex/translation/SideInputTranslationTest.java
 
b/runners/apex/src/test/java/org/apache/beam/runners/apex/translation/SideInputTranslationTest.java
index fd724554371..0de630ac3f6 100644
--- 
a/runners/apex/src/test/java/org/apache/beam/runners/apex/translation/SideInputTranslationTest.java
+++ 
b/runners/apex/src/test/java/org/apache/beam/runners/apex/translation/SideInputTranslationTest.java
@@ -1,3 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
 package org.apache.beam.runners.apex.translation;
 
 import static org.junit.Assert.assertEquals;


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles commented on issue #4182: Add license header to SideInputTranslationTest

2017-11-27 Thread GitBox
kennknowles commented on issue #4182: Add license header to 
SideInputTranslationTest
URL: https://github.com/apache/beam/pull/4182#issuecomment-347395117
 
 
   CC: @jkff @tweise


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles opened a new pull request #4182: Add license header to SideInputTranslationTest

2017-11-27 Thread GitBox
kennknowles opened a new pull request #4182: Add license header to 
SideInputTranslationTest
URL: https://github.com/apache/beam/pull/4182
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles commented on a change in pull request #4153: [BEAM-3219] DataflowRunner: delegate @Setup and @Teardown in stateful ParDo

2017-11-27 Thread GitBox
kennknowles commented on a change in pull request #4153: [BEAM-3219] 
DataflowRunner: delegate @Setup and @Teardown in stateful ParDo
URL: https://github.com/apache/beam/pull/4153#discussion_r153360051
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java
 ##
 @@ -151,6 +156,21 @@ public void testFnCallSequenceMulti() {
 p.run();
   }
 
+  @Test
+  @Category({ValidatesRunner.class, UsesStatefulParDo.class})
+  public void testFnCallSequenceStateful() {
+PCollectionList.of(p.apply("Impolite", Create.of(KV.of("a", 1), KV.of("b", 
2), KV.of("a", 4
+.and(
+p.apply(
+"Polite", Create.of(KV.of("b", 3), KV.of("a", 5), KV.of("c", 
6), KV.of("c", 7
+.apply(Flatten.>pCollections())
 
 Review comment:
   I believe this adds coverage for flatten unzipping. I just carried it over 
from the other test, since this is actually a somewhat disjoint codepath.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles commented on a change in pull request #4153: [BEAM-3219] DataflowRunner: delegate @Setup and @Teardown in stateful ParDo

2017-11-27 Thread GitBox
kennknowles commented on a change in pull request #4153: [BEAM-3219] 
DataflowRunner: delegate @Setup and @Teardown in stateful ParDo
URL: https://github.com/apache/beam/pull/4153#discussion_r153380128
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java
 ##
 @@ -204,25 +224,63 @@ public void after() {
 }
   }
 
-  @Test
-  @Category(NeedsRunner.class)
-  public void testTeardownCalledAfterExceptionInSetup() {
-ExceptionThrowingOldFn fn = new 
ExceptionThrowingOldFn(MethodForException.SETUP);
-p
-.apply(Create.of(1, 2, 3))
-.apply(ParDo.of(fn));
-try {
-  p.run();
-  fail("Pipeline should have failed with an exception");
-} catch (Exception e) {
-  assertThat(
-  "Function should have been torn down after exception",
-  ExceptionThrowingOldFn.teardownCalled.get(),
-  is(true));
+  private static class CallSequenceEnforcingStatefulFn extends 
DoFn, KV> {
 
 Review comment:
   Done. Confirmed that the test still fails without the fix and succeeds with 
the fix, just for good measure.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4074: [BEAM-3130] View.asMap() causes a ClassCastException in Apex runner

2017-11-27 Thread GitBox
jkff commented on issue #4074: [BEAM-3130] View.asMap() causes a 
ClassCastException in Apex runner
URL: https://github.com/apache/beam/pull/4074#issuecomment-347389329
 
 
   Fair enough, the current PR is an improvement in any case. Simplified the 
test a little bit and merged.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] asfgit closed pull request #4074: [BEAM-3130] View.asMap() causes a ClassCastException in Apex runner

2017-11-27 Thread GitBox
asfgit closed pull request #4074: [BEAM-3130] View.asMap() causes a 
ClassCastException in Apex runner
URL: https://github.com/apache/beam/pull/4074
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
index 57d259301b1..0483d761c26 100644
--- a/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
+++ b/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
@@ -100,7 +100,7 @@ public static ApexRunner fromOptions(PipelineOptions 
options) {
   }
 
   @SuppressWarnings({"rawtypes"})
-  private List getOverrides() {
+  protected List getOverrides() {
 return ImmutableList.builder()
 .add(
 PTransformOverride.of(
@@ -110,6 +110,18 @@ public static ApexRunner fromOptions(PipelineOptions 
options) {
 PTransformOverride.of(
 
PTransformMatchers.createViewWithViewFn(PCollectionViews.IterableViewFn.class),
 new StreamingViewAsIterable.Factory()))
+.add(
+PTransformOverride.of(
+
PTransformMatchers.createViewWithViewFn(PCollectionViews.ListViewFn.class),
+new StreamingViewAsIterable.Factory()))
+.add(
+PTransformOverride.of(
+
PTransformMatchers.createViewWithViewFn(PCollectionViews.MapViewFn.class),
+new StreamingViewAsIterable.Factory()))
+.add(
+PTransformOverride.of(
+
PTransformMatchers.createViewWithViewFn(PCollectionViews.MultimapViewFn.class),
+new StreamingViewAsIterable.Factory()))
 .add(
 PTransformOverride.of(
 
PTransformMatchers.createViewWithViewFn(PCollectionViews.SingletonViewFn.class),
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/utils/CoderAdapterStreamCodec.java
 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/utils/CoderAdapterStreamCodec.java
index f6ce1d075b2..196f73e4cb9 100644
--- 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/utils/CoderAdapterStreamCodec.java
+++ 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/utils/CoderAdapterStreamCodec.java
@@ -19,6 +19,7 @@
 
 import com.datatorrent.api.StreamCodec;
 import com.datatorrent.netlet.util.Slice;
+import com.google.common.annotations.VisibleForTesting;
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
@@ -37,6 +38,11 @@ public CoderAdapterStreamCodec(Coder coder) {
 this.coder = coder;
   }
 
+  @VisibleForTesting
+  public Coder getCoder() {
+return this.coder;
+  }
+
   @Override
   public Object fromByteArray(Slice fragment) {
 ByteArrayInputStream bis = new ByteArrayInputStream(fragment.buffer, 
fragment.offset,
diff --git 
a/runners/apex/src/test/java/org/apache/beam/runners/apex/translation/SideInputTranslationTest.java
 
b/runners/apex/src/test/java/org/apache/beam/runners/apex/translation/SideInputTranslationTest.java
new file mode 100644
index 000..8e050804f3c
--- /dev/null
+++ 
b/runners/apex/src/test/java/org/apache/beam/runners/apex/translation/SideInputTranslationTest.java
@@ -0,0 +1,155 @@
+package org.apache.beam.runners.apex.translation;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertTrue;
+
+import com.datatorrent.api.DAG;
+import com.datatorrent.api.DAG.OperatorMeta;
+import com.datatorrent.stram.engine.PortContext;
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+import org.apache.beam.runners.apex.ApexPipelineOptions;
+import org.apache.beam.runners.apex.TestApexRunner;
+import org.apache.beam.runners.apex.translation.utils.CoderAdapterStreamCodec;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.KvCoder;
+import org.apache.beam.sdk.coders.ListCoder;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.coders.StructuredCoder;
+import org.apache.beam.sdk.coders.VarIntCoder;
+import org.apache.beam.sdk.coders.VoidCoder;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.util.WindowedValue.FullWindowedValueCoder;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollectionView;
+import o

[GitHub] chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added 
support for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153376819
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -139,6 +139,24 @@
 
 
 
+
+
+google-cloud-storage
+
+
+filesystem
+GCS
 
 Review comment:
   Will one of the solutions mentioned here work - 
https://stackoverflow.com/questions/10521860/property-autocapitalization-in-maven
 ?
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added 
support for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153375847
 
 

 ##
 File path: 
sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
 ##
 @@ -100,4 +100,14 @@
   String getFilenamePrefix();
 
   void setFilenamePrefix(String prefix);
+
+  @Description("Google cloud storage - bucket_name/path")
+  String getGcsLocation();
 
 Review comment:
   I think it makes sense to simplify and make 'fileNamePrefix' the full 
prefix. Later if we hit a case where this is inadequate we can revisit.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added support for multiple filesystems in TextIO

2017-11-27 Thread GitBox
chamikaramj commented on a change in pull request #4169: [BEAM-3060] Added 
support for multiple filesystems in TextIO
URL: https://github.com/apache/beam/pull/4169#discussion_r153376493
 
 

 ##
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##
 @@ -81,13 +81,38 @@ public static void setup() throws ParseException {
 .as(IOTestPipelineOptions.class);
 
 numberOfTextLines = options.getNumberOfRecords();
-filenamePrefix = appendTimestamp(options.getFilenamePrefix());
+filenamePrefix = resolveProtocolAndPath(options);
   }
 
   private static String appendTimestamp(String filenamePrefix) {
 return String.format("%s_%s", filenamePrefix, new Date().getTime());
   }
 
+  private static String resolveProtocolAndPath(IOTestPipelineOptions options) {
 
 Review comment:
   Yeah, we can have validate that makes sure that fileSystem property and 
fileNamePrefix match.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: SerializableCoder Structured Value

2017-11-27 Thread Kenneth Knowles
What I said is not quite right - there are accidental collisions allowed.
The "all coders" spec for structural value only requires that encode(a) ==
encode(b) implies sv(a).equals(sv(b)). The converse is not required. For
example, the nondeterministic SetCoder can use the Set objects themselves
as structural values, but their encoding may differ. So for determinism it
is actually a.equals(b) implies encode(a) == encode(b) which in turn
implies sv(a).equals(sv(b)). Either way, for deterministic coders they all
coincide.

On Mon, Nov 27, 2017 at 5:23 PM, Kenneth Knowles  wrote:

> To add some flavor,
>
> *All coders:* structuralValue(a).equals(structuralValue(b)) if and only
> if encode(a) == encode(b)
>
> *"Consistent with equals" aka injective:* encode(a) == encode(b) implies
> a.equals(b)
>
> *Deterministic:* a.equals(b) implies 
> structuralValue(a).equals(structuralValue(b))
> (hence encode(a) == encode(b))
>
> The structural value must always be a legitimate substitute for encoding
> to allow in-memory GBK to be faster than encoding.
>
> IMO we should deprecate and retire "consistent with equals" since
> overriding it to return `true` is no simpler than overriding
> structuralValue itself, and it has no purpose other than governing
> structuralValue. The two obvious choices - encoding or return directly -
> are trivial, and getting fancy is optional. The check Luke suggests would
> then just be a test that structuralValue is correct. The mutation detector
> should perhaps just use the structural value and let the coder itself
> decide whether or not it needs to encode.
>
> Also worth considering the dual perspective that highlights portability:
> To a portable runner, the elements are (with a couple exceptions) just
> bytes, and the coders are a way for the SDK to interpret them in order to
> do its computation. The implied spec that the mutation detector relies on
> is that serialize(deserialize(x)) == x for these bytes, so if the
> re-serialized bytes have changed, it assumes the object was mutated. In a
> sense, if an SDK implements "the identity function" yet returns different
> bytes, that is a broken identity function because the bytes *are* the
> element. It is a bit of a strict interpretation, and maybe not so useful
> when the elements are only really interpreted by a single SDK, as in the
> case of SerializableCoder. But I'm not sure what other spec is available.
>
> Kenn
>
>
> On Mon, Nov 27, 2017 at 4:37 PM, Mairbek Khadikov 
> wrote:
>
>> I'm open to renaming *consistentWithEquals*.
>>
>> If I understand the code correctly, when consistentWithEquals returns
>> true, org.apache.beam.sdk.util.MutationDetectors expects
>> *a.equals(deserialize(serialize(a))* which I think is reasonable for
>> SerializableCoder (assuming objects implement equals)*. *Right now,
>> *serialize(a).equals(serialize(deserialize(serialize(a)))* is expected
>> and that contradicts *"does not guarantee a deterministic encoding"*.
>>
>> On Mon, Nov 27, 2017 at 4:07 PM, Lukasz Cwik  wrote:
>>
>>> I think the idea is that SerializableCoder should be updated to expect
>>> that all values it encodes do implement equals() since this seems to be the
>>> much more common case then classes that don't implement a useful equals. It
>>> would be possible to add a useful check to DirectRunner that any value that
>>> says its consistent with equals actually obeys its contract.
>>>
>>> On Mon, Nov 27, 2017 at 4:03 PM, Eugene Kirpichov 
>>> wrote:
>>>
 Not sure where you see the contradiction? consistentWithEquals says
 "Whenever the encoded bytes of two values are equal, then the original
 values are equal according to {@code Objects.equals()}." - which is clearly
 false for Serializable's in general: it's possible that serialized form of
 "a" and "b" is the same bytes, but !a.equals(b), e.g. if this class does
 not implement equals() or if it uses reference equality.

 On Mon, Nov 27, 2017 at 3:55 PM Mairbek Khadikov 
 wrote:

> Hi all,
>
> Currently SerializableCoder#consistentWithEquals returns false, which
> contradicts it's own documentation "{@link SerializableCoder} does not
> guarantee a deterministic encoding, as Java serialization may produce
> different binary encodings for two equivalent objects".
>
> In practice, it leads to failures in org.apache.beam.sdk.util.Mutat
> ionDetectors$CodedValueMutationDetector. For example, some libraries
> cache hash values https://github.com/goog
> le/protobuf/pull/3939/files#diff-24c442a23a43e041e4f50d324638fedbR142
>
> I'm thinking about changing SerializableCoder#consistentWithEquals to
> return true, so that the structural value would be the object itself
> instead of it the serialized bytes. Thoughts?
>
> WIP branch https://github.com/mairbek/beam/tree/ser
>
> Sincerely,
> Mairbek
>
> --
>
> Mairbek Khadikov |  Software Engineer |  mair...@google.co

Re: SerializableCoder Structured Value

2017-11-27 Thread Kenneth Knowles
To add some flavor,

*All coders:* structuralValue(a).equals(structuralValue(b)) if and only if
encode(a) == encode(b)

*"Consistent with equals" aka injective:* encode(a) == encode(b) implies
a.equals(b)

*Deterministic:* a.equals(b) implies
structuralValue(a).equals(structuralValue(b))
(hence encode(a) == encode(b))

The structural value must always be a legitimate substitute for encoding to
allow in-memory GBK to be faster than encoding.

IMO we should deprecate and retire "consistent with equals" since
overriding it to return `true` is no simpler than overriding
structuralValue itself, and it has no purpose other than governing
structuralValue. The two obvious choices - encoding or return directly -
are trivial, and getting fancy is optional. The check Luke suggests would
then just be a test that structuralValue is correct. The mutation detector
should perhaps just use the structural value and let the coder itself
decide whether or not it needs to encode.

Also worth considering the dual perspective that highlights portability: To
a portable runner, the elements are (with a couple exceptions) just bytes,
and the coders are a way for the SDK to interpret them in order to do its
computation. The implied spec that the mutation detector relies on is that
serialize(deserialize(x)) == x for these bytes, so if the re-serialized
bytes have changed, it assumes the object was mutated. In a sense, if an
SDK implements "the identity function" yet returns different bytes, that is
a broken identity function because the bytes *are* the element. It is a bit
of a strict interpretation, and maybe not so useful when the elements are
only really interpreted by a single SDK, as in the case of
SerializableCoder. But I'm not sure what other spec is available.

Kenn


On Mon, Nov 27, 2017 at 4:37 PM, Mairbek Khadikov 
wrote:

> I'm open to renaming *consistentWithEquals*.
>
> If I understand the code correctly, when consistentWithEquals returns
> true, org.apache.beam.sdk.util.MutationDetectors expects
> *a.equals(deserialize(serialize(a))* which I think is reasonable for
> SerializableCoder (assuming objects implement equals)*. *Right now,
> *serialize(a).equals(serialize(deserialize(serialize(a)))* is expected
> and that contradicts *"does not guarantee a deterministic encoding"*.
>
> On Mon, Nov 27, 2017 at 4:07 PM, Lukasz Cwik  wrote:
>
>> I think the idea is that SerializableCoder should be updated to expect
>> that all values it encodes do implement equals() since this seems to be the
>> much more common case then classes that don't implement a useful equals. It
>> would be possible to add a useful check to DirectRunner that any value that
>> says its consistent with equals actually obeys its contract.
>>
>> On Mon, Nov 27, 2017 at 4:03 PM, Eugene Kirpichov 
>> wrote:
>>
>>> Not sure where you see the contradiction? consistentWithEquals says
>>> "Whenever the encoded bytes of two values are equal, then the original
>>> values are equal according to {@code Objects.equals()}." - which is clearly
>>> false for Serializable's in general: it's possible that serialized form of
>>> "a" and "b" is the same bytes, but !a.equals(b), e.g. if this class does
>>> not implement equals() or if it uses reference equality.
>>>
>>> On Mon, Nov 27, 2017 at 3:55 PM Mairbek Khadikov 
>>> wrote:
>>>
 Hi all,

 Currently SerializableCoder#consistentWithEquals returns false, which
 contradicts it's own documentation "{@link SerializableCoder} does not
 guarantee a deterministic encoding, as Java serialization may produce
 different binary encodings for two equivalent objects".

 In practice, it leads to failures in org.apache.beam.sdk.util.Mutat
 ionDetectors$CodedValueMutationDetector. For example, some libraries
 cache hash values https://github.com/google/protobuf/pull/3939/files#di
 ff-24c442a23a43e041e4f50d324638fedbR142

 I'm thinking about changing SerializableCoder#consistentWithEquals to
 return true, so that the structural value would be the object itself
 instead of it the serialized bytes. Thoughts?

 WIP branch https://github.com/mairbek/beam/tree/ser

 Sincerely,
 Mairbek

 --

 Mairbek Khadikov |  Software Engineer |  mair...@google.com


>>
>
>
> --
>
> Mairbek Khadikov |  Software Engineer |  mair...@google.com
>
>


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347381845
 
 
   Run Python Performance Test


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347381795
 
 
   Run Python Performance Test
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347381328
 
 
   run seed job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] robertwb closed pull request #4181: Fix typo in proto files.

2017-11-27 Thread GitBox
robertwb closed pull request #4181: Fix typo in proto files.
URL: https://github.com/apache/beam/pull/4181
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347378789
 
 
   Run Python Performance Test


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347376775
 
 
   Run Python Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] herohde commented on issue #4181: Fix typo in proto files.

2017-11-27 Thread GitBox
herohde commented on issue #4181: Fix typo in proto files.
URL: https://github.com/apache/beam/pull/4181#issuecomment-347376582
 
 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any 
performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153365677
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +201,49 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link CombineFn} that combines into a {@link List} of up to limit 
elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyCombineFn extends CombineFn, 
Iterable> {
+private final long limit;
 
-public SampleAnyDoFn(long limit, PCollectionView> 
iterableView) {
+private SampleAnyCombineFn(long limit) {
   this.limit = limit;
-  this.iterableView = iterableView;
 }
 
-@ProcessElement
-public void processElement(ProcessContext c) {
-  for (T i : c.sideInput(iterableView)) {
-if (limit-- <= 0) {
-  break;
+@Override
+public List createAccumulator() {
+  return new ArrayList<>();
+}
+
+@Override
+public List addInput(List accumulator, T input) {
+  if (accumulator.size() < limit) {
+accumulator.add(input);
+  }
+  return accumulator;
+}
+
+@Override
+public List mergeAccumulators(Iterable> accumulators) {
+  List merged = new ArrayList<>();
+  for (List accumulator : accumulators) {
+for (T t : accumulator) {
+  merged.add(t);
+  if (merged.size() >= limit) {
+return merged;
+  }
 }
-c.output(i);
   }
+  return merged;
+}
+
+@Override
+public Iterable extractOutput(List accumulator) {
+  return accumulator;
 }
   }
 
-  /**
+/**
 
 Review comment:
   Accidental change


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: SerializableCoder Structured Value

2017-11-27 Thread Mairbek Khadikov
I'm open to renaming *consistentWithEquals*.

If I understand the code correctly, when consistentWithEquals returns
true, org.apache.beam.sdk.util.MutationDetectors
expects *a.equals(deserialize(serialize(a))* which I think is reasonable
for SerializableCoder (assuming objects implement equals)*. *Right now,
*serialize(a).equals(serialize(deserialize(serialize(a)))* is expected and
that contradicts *"does not guarantee a deterministic encoding"*.

On Mon, Nov 27, 2017 at 4:07 PM, Lukasz Cwik  wrote:

> I think the idea is that SerializableCoder should be updated to expect
> that all values it encodes do implement equals() since this seems to be the
> much more common case then classes that don't implement a useful equals. It
> would be possible to add a useful check to DirectRunner that any value that
> says its consistent with equals actually obeys its contract.
>
> On Mon, Nov 27, 2017 at 4:03 PM, Eugene Kirpichov 
> wrote:
>
>> Not sure where you see the contradiction? consistentWithEquals says
>> "Whenever the encoded bytes of two values are equal, then the original
>> values are equal according to {@code Objects.equals()}." - which is clearly
>> false for Serializable's in general: it's possible that serialized form of
>> "a" and "b" is the same bytes, but !a.equals(b), e.g. if this class does
>> not implement equals() or if it uses reference equality.
>>
>> On Mon, Nov 27, 2017 at 3:55 PM Mairbek Khadikov 
>> wrote:
>>
>>> Hi all,
>>>
>>> Currently SerializableCoder#consistentWithEquals returns false, which
>>> contradicts it's own documentation "{@link SerializableCoder} does not
>>> guarantee a deterministic encoding, as Java serialization may produce
>>> different binary encodings for two equivalent objects".
>>>
>>> In practice, it leads to failures in org.apache.beam.sdk.util.Mutat
>>> ionDetectors$CodedValueMutationDetector. For example, some libraries
>>> cache hash values https://github.com/google/protobuf/pull/3939/files#
>>> diff-24c442a23a43e041e4f50d324638fedbR142
>>>
>>> I'm thinking about changing SerializableCoder#consistentWithEquals to
>>> return true, so that the structural value would be the object itself
>>> instead of it the serialized bytes. Thoughts?
>>>
>>> WIP branch https://github.com/mairbek/beam/tree/ser
>>>
>>> Sincerely,
>>> Mairbek
>>>
>>> --
>>>
>>> Mairbek Khadikov |  Software Engineer |  mair...@google.com
>>>
>>>
>


-- 

Mairbek Khadikov |  Software Engineer |  mair...@google.com


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347374422
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347372780
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] robertwb commented on issue #4181: Fix typo in proto files.

2017-11-27 Thread GitBox
robertwb commented on issue #4181: Fix typo in proto files.
URL: https://github.com/apache/beam/pull/4181#issuecomment-347372394
 
 
   R: @herohde


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] robertwb opened a new pull request #4181: Fix typo in proto files.

2017-11-27 Thread GitBox
robertwb opened a new pull request #4181: Fix typo in proto files.
URL: https://github.com/apache/beam/pull/4181
 
 
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   ---
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [Build] Enforce failed on project beam-sdks-java-io-tika

2017-11-27 Thread Manu Zhang
Thanks, the debug information pointed me to the corrupted geoapi-3.0.0.jar
file.
Removing it solves the problem for me.

Manu


On Tue, Nov 28, 2017 at 4:41 AM Lukasz Cwik 
wrote:

> Try running with -X to get all the debug output when using Maven. It may
> give more details.
>
> Alternatively, try using the gradle build (from root of project directory):
> ./gradlew build
>
>
>
> On Fri, Nov 24, 2017 at 9:46 PM, Manu Zhang 
> wrote:
>
> > Hi all,
> >
> > Has anyone seen this issue when building latest master on Mac ?
> >
> > *[ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce (enforce)
> > on project beam-sdks-java-io-tika: Execution enforce of goal
> > org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce failed.
> > NullPointerException -> [Help 1]*
> >
> > Thanks,
> > Manu
> >
>


Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-27 Thread Lukasz Cwik
I have confirmed that I am no longer impacted by the .INVALID suffix.

On Mon, Nov 27, 2017 at 3:06 PM, Lukasz Cwik  wrote:

> Apache infra has mentioned that they have updated the mailing list to
> prevent the ".INVALID" from appearing on the end. dev@ now matches user@
> in this regard.
>
> On Mon, Nov 27, 2017 at 11:30 AM, Lukasz Cwik  wrote:
>
>> Vote Closed:
>> all +1s with at least 3 binding votes.
>>
>> On Fri, Nov 24, 2017 at 6:52 AM, Aljoscha Krettek 
>> wrote:
>>
>>> +1
>>>
>>> > On 23. Nov 2017, at 23:22, Manu Zhang  wrote:
>>> >
>>> > +1
>>> >
>>> > On Thu, Nov 23, 2017 at 11:32 PM Maximilian Michels 
>>> wrote:
>>> >
>>> >> +1
>>> >>
>>> >> Thanks for looking into it!
>>> >>
>>> >> On 23.11.17 00:25, Lukasz Cwik wrote:
>>> >>> I have noticed that some e-mail addresses (notably @google.com) get
>>> >>> .INVALID suffixed onto it so per...@yyy.com become
>>> >> per...@yyy.com.INVALID
>>> >>> in the From: header.
>>> >>>
>>> >>> I have figured out that this is an issue with the way that our mail
>>> >> server
>>> >>> is configured and opened
>>> >> https://issues.apache.org/jira/browse/INFRA-15529.
>>> >>>
>>> >>> For those of us that are impacted, it makes it more difficult for
>>> users
>>> >> to
>>> >>> reply directly to the originator.
>>> >>>
>>> >>> Infra has asked to get consensus from PMC members before making the
>>> >> change
>>> >>> which I figured it would be easiest with a vote.
>>> >>>
>>> >>> Please vote:
>>> >>> +1 Update mail server to stop suffixing .INVALID
>>> >>> -1 Don't change mail server settings.
>>> >>>
>>> >>
>>>
>>>
>>
>


Re: SerializableCoder Structured Value

2017-11-27 Thread Lukasz Cwik
I think the idea is that SerializableCoder should be updated to expect that
all values it encodes do implement equals() since this seems to be the much
more common case then classes that don't implement a useful equals. It
would be possible to add a useful check to DirectRunner that any value that
says its consistent with equals actually obeys its contract.

On Mon, Nov 27, 2017 at 4:03 PM, Eugene Kirpichov 
wrote:

> Not sure where you see the contradiction? consistentWithEquals says
> "Whenever the encoded bytes of two values are equal, then the original
> values are equal according to {@code Objects.equals()}." - which is clearly
> false for Serializable's in general: it's possible that serialized form of
> "a" and "b" is the same bytes, but !a.equals(b), e.g. if this class does
> not implement equals() or if it uses reference equality.
>
> On Mon, Nov 27, 2017 at 3:55 PM Mairbek Khadikov 
> wrote:
>
>> Hi all,
>>
>> Currently SerializableCoder#consistentWithEquals returns false, which
>> contradicts it's own documentation "{@link SerializableCoder} does not
>> guarantee a deterministic encoding, as Java serialization may produce
>> different binary encodings for two equivalent objects".
>>
>> In practice, it leads to failures in org.apache.beam.sdk.util.
>> MutationDetectors$CodedValueMutationDetector. For example, some
>> libraries cache hash values https://github.com/google/protobuf/pull/3939/
>> files#diff-24c442a23a43e041e4f50d324638fedbR142
>>
>> I'm thinking about changing SerializableCoder#consistentWithEquals to
>> return true, so that the structural value would be the object itself
>> instead of it the serialized bytes. Thoughts?
>>
>> WIP branch https://github.com/mairbek/beam/tree/ser
>>
>> Sincerely,
>> Mairbek
>>
>> --
>>
>> Mairbek Khadikov |  Software Engineer |  mair...@google.com
>>
>>


Re: SerializableCoder Structured Value

2017-11-27 Thread Eugene Kirpichov
Not sure where you see the contradiction? consistentWithEquals says
"Whenever the encoded bytes of two values are equal, then the original
values are equal according to {@code Objects.equals()}." - which is clearly
false for Serializable's in general: it's possible that serialized form of
"a" and "b" is the same bytes, but !a.equals(b), e.g. if this class does
not implement equals() or if it uses reference equality.

On Mon, Nov 27, 2017 at 3:55 PM Mairbek Khadikov  wrote:

> Hi all,
>
> Currently SerializableCoder#consistentWithEquals returns false, which
> contradicts it's own documentation "{@link SerializableCoder} does not
> guarantee a deterministic encoding, as Java serialization may produce
> different binary encodings for two equivalent objects".
>
> In practice, it leads to failures in
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector. For
> example, some libraries cache hash values
> https://github.com/google/protobuf/pull/3939/files#diff-24c442a23a43e041e4f50d324638fedbR142
>
> I'm thinking about changing SerializableCoder#consistentWithEquals to
> return true, so that the structural value would be the object itself
> instead of it the serialized bytes. Thoughts?
>
> WIP branch https://github.com/mairbek/beam/tree/ser
>
> Sincerely,
> Mairbek
>
> --
>
> Mairbek Khadikov |  Software Engineer |  mair...@google.com
>
>


SerializableCoder Structured Value

2017-11-27 Thread Mairbek Khadikov
Hi all,

Currently SerializableCoder#consistentWithEquals returns false, which
contradicts it's own documentation "{@link SerializableCoder} does not
guarantee a deterministic encoding, as Java serialization may produce
different binary encodings for two equivalent objects".

In practice, it leads to failures in org.apache.beam.sdk.util.
MutationDetectors$CodedValueMutationDetector. For example, some libraries
cache hash values https://github.com/google/protobuf/pull/3939/files#diff-
24c442a23a43e041e4f50d324638fedbR142

I'm thinking about changing SerializableCoder#consistentWithEquals to
return true, so that the structural value would be the object itself
instead of it the serialized bytes. Thoughts?

WIP branch https://github.com/mairbek/beam/tree/ser

Sincerely,
Mairbek

-- 

Mairbek Khadikov |  Software Engineer |  mair...@google.com


[GitHub] kennknowles commented on issue #4135: [BEAM-3194] Add @RequiresStableInput annotation

2017-11-27 Thread GitBox
kennknowles commented on issue #4135: [BEAM-3194] Add @RequiresStableInput 
annotation
URL: https://github.com/apache/beam/pull/4135#issuecomment-347367940
 
 
   Thanks! Squashed in the fixup and re-running tests since there looked like a 
failure that I can't imagine this PR caused.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347364157
 
 
   run seed job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kennknowles commented on issue #4105: [BEAM-2899] Fork FnDataService from runners-core

2017-11-27 Thread GitBox
kennknowles commented on issue #4105: [BEAM-2899]  Fork FnDataService from 
runners-core
URL: https://github.com/apache/beam/pull/4105#issuecomment-347361536
 
 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347359996
 
 
   Run Python Performance Test


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347359648
 
 
   Run Python Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tweise commented on issue #4074: [BEAM-3130] View.asMap() causes a ClassCastException in Apex runner

2017-11-27 Thread GitBox
tweise commented on issue #4074: [BEAM-3130] View.asMap() causes a 
ClassCastException in Apex runner
URL: https://github.com/apache/beam/pull/4074#issuecomment-347359334
 
 
   @jkff I'm aware of that issue as well. It is certainly important, but at the 
same time it's not new and I don't see why it needs to be resolved in order for 
this change to be merged. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-27 Thread Lukasz Cwik
Apache infra has mentioned that they have updated the mailing list to
prevent the ".INVALID" from appearing on the end. dev@ now matches user@ in
this regard.

On Mon, Nov 27, 2017 at 11:30 AM, Lukasz Cwik  wrote:

> Vote Closed:
> all +1s with at least 3 binding votes.
>
> On Fri, Nov 24, 2017 at 6:52 AM, Aljoscha Krettek 
> wrote:
>
>> +1
>>
>> > On 23. Nov 2017, at 23:22, Manu Zhang  wrote:
>> >
>> > +1
>> >
>> > On Thu, Nov 23, 2017 at 11:32 PM Maximilian Michels 
>> wrote:
>> >
>> >> +1
>> >>
>> >> Thanks for looking into it!
>> >>
>> >> On 23.11.17 00:25, Lukasz Cwik wrote:
>> >>> I have noticed that some e-mail addresses (notably @google.com) get
>> >>> .INVALID suffixed onto it so per...@yyy.com become
>> >> per...@yyy.com.INVALID
>> >>> in the From: header.
>> >>>
>> >>> I have figured out that this is an issue with the way that our mail
>> >> server
>> >>> is configured and opened
>> >> https://issues.apache.org/jira/browse/INFRA-15529.
>> >>>
>> >>> For those of us that are impacted, it makes it more difficult for
>> users
>> >> to
>> >>> reply directly to the originator.
>> >>>
>> >>> Infra has asked to get consensus from PMC members before making the
>> >> change
>> >>> which I figured it would be easiest with a vote.
>> >>>
>> >>> Please vote:
>> >>> +1 Update mail server to stop suffixing .INVALID
>> >>> -1 Don't change mail server settings.
>> >>>
>> >>
>>
>>
>


[GitHub] tgroh closed pull request #4127: [BEAM-2899] Add a Logging Service to FnExecution

2017-11-27 Thread GitBox
tgroh closed pull request #4127: [BEAM-2899] Add a Logging Service to 
FnExecution
URL: https://github.com/apache/beam/pull/4127
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/model/fn-execution/src/main/proto/beam_fn_api.proto 
b/model/fn-execution/src/main/proto/beam_fn_api.proto
index 132d366f707..a2d0eb47942 100644
--- a/model/fn-execution/src/main/proto/beam_fn_api.proto
+++ b/model/fn-execution/src/main/proto/beam_fn_api.proto
@@ -636,9 +636,9 @@ message LogEntry {
   // free not to use all severity levels in their log messages.
   message Severity {
 enum Enum {
+  // Unspecified level information. Will be logged at the TRACE level.
   UNSPECIFIED = 0;
-  // Trace level information, also the default log level unless
-  // another severity is specified.
+  // Trace level information.
   TRACE = 1;
   // Debugging information.
   DEBUG = 2;
diff --git a/runners/java-fn-execution/pom.xml 
b/runners/java-fn-execution/pom.xml
index 3ebcfd0c8d9..f275d69207e 100644
--- a/runners/java-fn-execution/pom.xml
+++ b/runners/java-fn-execution/pom.xml
@@ -79,6 +79,11 @@
   provided
 
 
+
+  org.slf4j
+  slf4j-api
+
+
 
 
   junit
@@ -98,9 +103,11 @@
   test-jar
   test
 
-  
-  org.mockito
-  mockito-all
-  
+
+
+  org.mockito
+  mockito-all
+  test
+
   
 
diff --git 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/FnService.java
 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/FnService.java
index 9ea0fce4636..547475c4be6 100644
--- 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/FnService.java
+++ 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/FnService.java
@@ -21,4 +21,15 @@
 import io.grpc.BindableService;
 
 /** An interface sharing common behavior with services used during execution 
of user Fns. */
-public interface FnService extends AutoCloseable, BindableService {}
+public interface FnService extends AutoCloseable, BindableService {
+  /**
+   * {@inheritDoc}.
+   *
+   * There should be no more calls to any service method by the time a call 
to {@link #close()}
+   * begins. Specifically, this means that a {@link io.grpc.Server} that this 
service is bound to
+   * should have completed a call to the {@link io.grpc.Server#shutdown()} 
method, and all future
+   * incoming calls will be rejected.
+   */
+  @Override
+  void close() throws Exception;
+}
diff --git 
a/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/InProcessServerFactory.java
 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/InProcessServerFactory.java
new file mode 100644
index 000..9fe4a5fa07c
--- /dev/null
+++ 
b/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/InProcessServerFactory.java
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.fnexecution;
+
+import io.grpc.BindableService;
+import io.grpc.Server;
+import io.grpc.inprocess.InProcessServerBuilder;
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.beam.model.pipeline.v1.Endpoints.ApiServiceDescriptor;
+
+/**
+ * A {@link ServerFactory} which creates {@link Server servers} with the {@link
+ * InProcessServerBuilder}.
+ */
+public class InProcessServerFactory extends ServerFactory {
+  private static final AtomicInteger serviceNameUniqifier = new 
AtomicInteger();
+
+  public static InProcessServerFactory create() {
+return new InProcessServerFactory();
+  }
+
+  private InProcessServerFactory() {}
+
+  @Override
+  public Server allocatePortAndCreate(BindableService service, 
ApiServiceDescriptor.Builder builder)
+  throws IOException {
+String name = String.format("InProcessServer_%s", 
serviceNameUniqifier.getAndIncrement());
+   

[GitHub] nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix 
Sample.any performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153350918
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tgroh closed pull request #4151: [BEAM-2899] Decompose Direct Execution Components

2017-11-27 Thread GitBox
tgroh closed pull request #4151: [BEAM-2899] Decompose Direct Execution 
Components
URL: https://github.com/apache/beam/pull/4151
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BundleProcessor.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BundleProcessor.java
new file mode 100644
index 000..59d3043450a
--- /dev/null
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BundleProcessor.java
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.runners.direct;
+
+import org.apache.beam.runners.local.Bundle;
+
+/**
+ * An executor that is capable of processing some bundle of input over some 
executable stage or
+ * step.
+ */
+interface BundleProcessor, ExecutableT> {
+  /**
+   * Execute the provided bundle using the provided Executable, calling back 
to the {@link
+   * CompletionCallback} when execution completes.
+   */
+  void process(BundleT bundle, ExecutableT consumer, CompletionCallback 
onComplete);
+}
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java
index d041a5a84ae..89f75a5e12e 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java
@@ -188,16 +188,14 @@ public DirectPipelineResult run(Pipeline 
originalPipeline) {
 graph,
 keyedPValueVisitor.getKeyedPValues());
 
-RootProviderRegistry rootInputProvider = 
RootProviderRegistry.defaultRegistry(context);
 TransformEvaluatorRegistry registry = 
TransformEvaluatorRegistry.defaultRegistry(context);
 PipelineExecutor executor =
 ExecutorServiceParallelExecutor.create(
-options.getTargetParallelism(), graph,
-rootInputProvider,
+options.getTargetParallelism(),
 registry,
 Enforcement.defaultModelEnforcements(enabledEnforcements),
 context);
-executor.start(graph.getRootTransforms());
+executor.start(graph, RootProviderRegistry.defaultRegistry(context));
 
 DirectPipelineResult result = new DirectPipelineResult(executor, context);
 if (options.isBlockOnRun()) {
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutor.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutor.java
index fe3765b8d2b..1650bb67ecd 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutor.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutor.java
@@ -17,45 +17,32 @@
  */
 package org.apache.beam.runners.direct;
 
-import static com.google.common.base.Preconditions.checkState;
-
-import com.google.auto.value.AutoValue;
 import com.google.common.base.Optional;
 import com.google.common.cache.CacheBuilder;
 import com.google.common.cache.CacheLoader;
 import com.google.common.cache.LoadingCache;
 import com.google.common.cache.RemovalListener;
 import com.google.common.cache.RemovalNotification;
-import com.google.common.collect.Iterables;
+import com.google.common.collect.ImmutableMap;
 import com.google.common.util.concurrent.MoreExecutors;
 import com.google.common.util.concurrent.ThreadFactoryBuilder;
-import java.util.ArrayList;
 import java.util.Collection;
-import java.util.Collections;
 import java.util.Map;
-import java.util.Queue;
 import java.util.concurrent.BlockingQueue;
-import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentLinkedQueue;
-import java.util.concurrent.ConcurrentMap;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.con

[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347357063
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347356163
 
 
   run seed job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any 
performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153348845
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   OK, so then I'd like again to raise the suggestion to remove this DoFn :) 
Combining each window into n or fewer elements is the right semantics.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any 
performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153348529
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   Oh sorry, I looked in the wrong place in the code. Yeah, this implementation 
is wrong: the main element (null in this case) is in the global window, so of 
course it's not allowed to access a side input that is windowed non-globally, 
because it's ambiguous which window should be accessed. It plays even more 
poorly with triggering. See 
https://beam.apache.org/documentation/programming-guide/#side-inputs-and-windowing
 . All the more reason to fix this!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347354797
 
 
   run seed job
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4074: [BEAM-3130] View.asMap() causes a ClassCastException in Apex runner

2017-11-27 Thread GitBox
jkff commented on issue #4074: [BEAM-3130] View.asMap() causes a 
ClassCastException in Apex runner
URL: https://github.com/apache/beam/pull/4074#issuecomment-347351587
 
 
   Thank you! While trying to merge this, I noticed a much more serious issue 
that I believe should be resolved before this one: 
https://issues.apache.org/jira/browse/BEAM-3261 - could you take a look?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix 
Sample.any performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153343674
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   There's no combine in the current implementation though. It looks like this:
   
   ```java
   @Override
   public PCollection expand(PCollection in) {
 PCollectionView> iterableView = 
in.apply(View.asIterable());
 return in.getPipeline()
 .apply(Create.of((Void) null).withCoder(VoidCoder.of()))
 .apply(ParDo.of(new SampleAnyDoFn<>(limit, 
iterableView)).withSideInputs(iterableView))
 .setCoder(in.getCoder());
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [discuss] java profile

2017-11-27 Thread Lukasz Cwik
Manually whitelisting/blacklisting sub-modules is error prone since it
hides issues due to incorrectly maintaining that list is the same argument
as if the build process doesn't correctly invoke an incremental build
process.

On Mon, Nov 27, 2017 at 1:45 PM, Romain Manni-Bucau 
wrote:

> Well for validation builds- pre PR, incremental support is pointless since
> it easily hides issues die to caching so a solution saving half of the
> build without loosing anuyhing would still be good IMHO.
>
> Le 27 nov. 2017 21:12, "Lukasz Cwik"  a écrit :
>
> > Incremental builds aren't correctly setup right now so your likely to see
> > Python/Go rebuild even if there were no changes. See
> > https://issues.apache.org/jira/browse/BEAM-3253
> >
> > On Mon, Nov 27, 2017 at 11:46 AM, Romain Manni-Bucau <
> > rmannibu...@gmail.com>
> > wrote:
> >
> > > that was the goal: validate there was no side effect of the changes on
> > > the whole project. Now the "not java" part of the build will not be
> > > impacted by java changed so this is the part i want to skip since it
> > > takes a lot of time and I have guarantees it is safe to skip them.
> > >
> > > Romain Manni-Bucau
> > > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > >
> > >
> > > 2017-11-27 20:28 GMT+01:00 Lukasz Cwik :
> > > > Romain, that will build the entire project. I think you want to
> execute
> > > > (from the root of the project):
> > > > ./gradlew :beam-sdks-parent:beam-sdks-python:build
> > > >
> > > > On Mon, Nov 27, 2017 at 11:25 AM, Romain Manni-Bucau <
> > > rmannibu...@gmail.com>
> > > > wrote:
> > > >
> > > >> gradle build --no-daemon
> > > >>
> > > >> (with gradle 4.2)
> > > >>
> > > >> Romain Manni-Bucau
> > > >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > >>
> > > >>
> > > >> 2017-11-27 20:21 GMT+01:00 Kenneth Knowles  >:
> > > >> > What is the gradle command you are using to build just the Python
> > SDK?
> > > >> >
> > > >> > On Mon, Nov 27, 2017 at 11:19 AM, Romain Manni-Bucau <
> > > >> rmannibu...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> >> Hmm,
> > > >> >>
> > > >> >> issue is the same with gradle (locally python build takes 15mn
> > alone
> > > >> >> which is as much as the java build and it is not parallelized I
> > > think)
> > > >> >>
> > > >> >> pl is not as smooth since it means doing it on each command
> whereas
> > > >> >> the proposal is automatically activated through settings.xml
> > > >> >>
> > > >> >> Romain Manni-Bucau
> > > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > >> >>
> > > >> >>
> > > >> >> 2017-11-27 20:07 GMT+01:00 Kenneth Knowles
>  > >:
> > > >> >> > I think you can already mostly do this with mvn -pl sdks/XYZ
> -am
> > > >> -amd. I
> > > >> >> > think that we have other work (gradle support) underway that
> will
> > > make
> > > >> >> this
> > > >> >> > a non-issue since gradle automatically does even better than
> the
> > > >> profile
> > > >> >> or
> > > >> >> > -am -amd.
> > > >> >> >
> > > >> >> > On Mon, Nov 27, 2017 at 11:01 AM, Romain Manni-Bucau <
> > > >> >> rmannibu...@gmail.com>
> > > >> >> > wrote:
> > > >> >> >
> > > >> >> >> Hi guys,
> > > >> >> >>
> > > >> >> >> java/python/go/xxx support is great but as a developer you
> > rarely
> > > >> hack
> > > >> >> >> on them all.
> > > >> >> >>
> > > >> >> >> For that reason I opened https://github.com/apache/
> > beam/pull/4173
> > > .
> > > >> >> >>
> > > >> >> >> Goal is to give each developer a way to build the whole
> project
> > > and
> > > >> >> >> all the code he can impact at once but without caring of the
> > code
> > > he
> > > >> >> >> doesn't modify at all - other languages.
> > > >> >> >>
> > > >> >> >> Wdyt?
> > > >> >> >>
> > > >> >> >> Romain Manni-Bucau
> > > >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > >> >> >>
> > > >> >>
> > > >>
> > >
> >
>


[GitHub] chamikaramj closed pull request #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-27 Thread GitBox
chamikaramj closed pull request #4149: [BEAM-3060] Add Compressed TextIOIT
URL: https://github.com/apache/beam/pull/4149
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
 
b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
index 91b3aa6d344..5a29d4f8126 100644
--- 
a/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
+++ 
b/sdks/java/io/common/src/test/java/org/apache/beam/sdk/io/common/IOTestPipelineOptions.java
@@ -100,4 +100,10 @@
   String getFilenamePrefix();
 
   void setFilenamePrefix(String prefix);
+
+  @Description("File compression type for writing and reading test files")
+  @Default.String("UNCOMPRESSED")
+  String getCompressionType();
+
+  void setCompressionType(String compressionType);
 }
diff --git 
a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 
b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
index ecab1d86497..1b9b385a1ff 100644
--- 
a/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
+++ 
b/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
@@ -15,18 +15,24 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+
 package org.apache.beam.sdk.io.text;
 
+import static org.apache.beam.sdk.io.Compression.AUTO;
+
 import com.google.common.base.Function;
 import com.google.common.collect.FluentIterable;
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Iterables;
+
 import java.io.IOException;
 import java.text.ParseException;
 import java.util.Collection;
 import java.util.Collections;
 import java.util.Date;
 import java.util.Map;
+
+import org.apache.beam.sdk.io.Compression;
 import org.apache.beam.sdk.io.FileSystems;
 import org.apache.beam.sdk.io.GenerateSequence;
 import org.apache.beam.sdk.io.TextIO;
@@ -50,13 +56,16 @@
 import org.junit.runners.JUnit4;
 
 /**
- * An integration test for {@link org.apache.beam.sdk.io.TextIO}.
+ * Integration tests for {@link org.apache.beam.sdk.io.TextIO}.
  *
  * Run this test using the command below. Pass in connection information 
via PipelineOptions:
  * 
- *  mvn -e -Pio-it verify -pl sdks/java/io/text 
-DintegrationTestPipelineOptions='[
+ *  mvn -e -Pio-it verify -pl sdks/java/io/file-based-io-tests
+ *  -Dit.test=org.apache.beam.sdk.io.text.TextIOIT
+ *  -DintegrationTestPipelineOptions='[
  *  "--numberOfRecords=10",
  *  "--filenamePrefix=TEXTIOIT"
+ *  "--compressionType=GZIP"
  *  ]'
  * 
  * */
@@ -65,6 +74,7 @@
 
   private static String filenamePrefix;
   private static Long numberOfTextLines;
+  private static Compression compressionType;
 
   @Rule
   public TestPipeline pipeline = TestPipeline.create();
@@ -77,6 +87,16 @@ public static void setup() throws ParseException {
 
 numberOfTextLines = options.getNumberOfRecords();
 filenamePrefix = appendTimestamp(options.getFilenamePrefix());
+compressionType = parseCompressionType(options.getCompressionType());
+  }
+
+  private static Compression parseCompressionType(String compressionType) {
+try {
+  return Compression.valueOf(compressionType.toUpperCase());
+} catch (IllegalArgumentException ex) {
+  throw new IllegalArgumentException(
+  String.format("Unsupported compression type: %s", compressionType));
+}
   }
 
   private static String appendTimestamp(String filenamePrefix) {
@@ -85,14 +105,20 @@ private static String appendTimestamp(String 
filenamePrefix) {
 
   @Test
   public void writeThenReadAll() {
+TextIO.TypedWrite write = TextIO
+.write()
+.to(filenamePrefix)
+.withOutputFilenames()
+.withCompression(compressionType);
+
 PCollection testFilenames = pipeline
 .apply("Generate sequence", 
GenerateSequence.from(0).to(numberOfTextLines))
 .apply("Produce text lines", ParDo.of(new 
DeterministicallyConstructTestTextLineFn()))
-.apply("Write content to files", 
TextIO.write().to(filenamePrefix).withOutputFilenames())
+.apply("Write content to files", write)
 .getPerDestinationOutputFilenames().apply(Values.create());
 
 PCollection consolidatedHashcode = testFilenames
-.apply("Read all files", TextIO.readAll())
+.apply("Read all files", TextIO.readAll().withCompression(AUTO))
 .apply("Calculate hashcode", Combine.globally(new HashingFn()));
 
 String expectedHash = getExpectedHashForLineCount(numberOfTextLines);
@@ -107,7 +133,8 @@ public void writeThenReadAll() 

[GitHub] chamikaramj commented on issue #4149: [BEAM-3060] Add Compressed TextIOIT

2017-11-27 Thread GitBox
chamikaramj commented on issue #4149: [BEAM-3060] Add Compressed TextIOIT
URL: https://github.com/apache/beam/pull/4149#issuecomment-347345709
 
 
   Merged. Closing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Eugene Kirpichov
On Mon, Nov 27, 2017 at 11:52 AM Romain Manni-Bucau 
wrote:

> 2017-11-27 20:26 GMT+01:00 Lukasz Cwik :
> > Romain, as mentioned earlier, I identified that Maven was slower because
> it
> > needed to finish building the entire module before dependent modules
> could
> > start which included running tests, performing checkstyle, etc...
> > Gradle is able to increase the parallelism of the build process since it
> > has task driven parallelism so as long as the files are compiled, the
> > dependent projects can start.
>
> This means we can implement a maven graph builder which is better than
> the default one - surely with a thread safe local repo - and
> contribute it back to solve it durably.
>

That would be indeed a wonderful contribution to the Java ecosystem as a
whole, but I'd like to reiterate that Maven has existed for very long, I'm
sure a lot of people have noticed its poor parallelization in the
meanwhile, and still nobody has even tried doing this AFAIK.

So it seems unlikely that this is doable in a reasonable timeframe, and I
don't think that the possibility of this is a strong argument against
switching to Gradle.


>
> >
> > Maven and Gradle are both heavily used since there are ~146k Maven
> projects
> > on Github while there are ~122k Gradle project on Github. Do you have
> data
> > which shows that Maven is significantly more "mainstream"?
>
> Yep, project i worked on in companies using gradle: 0, all were based
> on maven and maven was "tool-ed" versus gradle was "best effort" in
> term of plugins.
> Now  - with my EE background - I can guarantee you gradle is not able
> to handle properly its build since it flatten the classpath and
> plugins conflicts very quickly (their plugin dependency feature never
> worked and almost no plugin impl it correctly).
>
> Wonder if it is easy to have the ASF stats, anyone knows?
>
> >
> > I believe we want a rich multi-language SDK and community and feel as
> > though it would be unwise to treat non JDK based languages as second
> class.
>
> Hmm, not sure how it is related to the build tool since Maven and
> Gradle have the same level of support - actually surprsingly maven is
> better for js and surely as bad as gradle for others - or here again
> we can create plugin like the frontend-maven-plugin if needed for
> other languages.
>
> That said it can be an interesting other thread since people consuming
> these languages will probably want their mainstream build tool and a
> "standard" repository layout rather than a java one. But this is
> harder to measure.
>
> >
> > On Mon, Nov 27, 2017 at 11:00 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> Hi Lukasz,
> >>
> >> Did you manage to identify how maven was slower and test tesla stuff
> >> and potentially a few other fixes?
> >>
> >> Side note: figures without python can be interesting cause locally - =
> >> for me - python tends to flatten the figures whereas I get something
> >> close to your conclusions without python part.
> >>
> >> My point is mainly that switching now on gradle and being back on
> >> maven in a few months cause gradle ecosystem is far to support java 9
> >> - or any other volatile reason like this one - is probably not a good
> >> choice for a community. Maven is way more mainstream than gradle so
> >> helps to encourage people to contribute - vs gradle will increase the
> >> step to do it.
> >>
> >> I'd like to be sure before a switch that it is a one way decision and
> >> that the build tool was not just challenged by itself and its current
> >> state but also in the way it could be improved (= its community and
> >> potentially some local hacks).
> >>
> >> Romain Manni-Bucau
> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >>
> >>
> >> 2017-11-27 19:46 GMT+01:00 Lukasz Cwik :
> >> > I have collected data by running several builds against master using
> >> Gradle
> >> > and Maven without using Gradle's support for incremental builds.
> >> >
> >> > Gradle (mins)
> >> > min: 25.04
> >> > max: 160.14
> >> > median: 45.78
> >> > average: 52.19
> >> > stdev: 30.80
> >> >
> >> > Maven (mins)
> >> > min: 56.86
> >> > max: 216.55 (actually > 240 mins because this data does not include
> >> > timeouts)
> >> > median: 87.93
> >> > average: 109.10
> >> > stdev: 48.01
> >> >
> >> > I excluded a few timeouts (240 mins) that happened during the Maven
> build
> >> > from its numbers but we can see conclusively that Gradle is twice as
> fast
> >> > for the build when compared to Maven when run using Jenkins.
> >> > On my desktop, I have enabled incremental builds and have seen a major
> >> > improvement on the above numbers but it doesn't yet work correctly
> >> because
> >> > of incorrectly specified inputs/outputs for certain tasks.
> >> >
> >> > The data is available here
> >> > https://docs.google.com/spreadsheets/d/1MHVjF-xoI49_
> >> NJqEQakUgnNIQ7Qbjzu8Y1q_h3dbF1M/edit?usp=sharing
> >> >
> >> > With this data, I feel confident that we should swap and have

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Lukasz Cwik
I don't believe that such a switch could happen immediately as the scope of
the change is to replace 15k lines of pom files so anything which isn't
gradual is unlikely to work.

Below is a more thorough list of points which compare the two build systems
beyond speed.

Maven
Java Support: Mature
Python Support: None (via mvn exec plugin)
Go Support: Rudimentary (via mvn plugin)
Protobuf Support: Rudimentary (via mvn plugin)
Docker Support: Rudimentary (via mvn plugin)
ASF Release Automation: Mature
Jenkins Support: Mature
Configuration Language: XML
Multiple Java Versions: Yes
Static Analysis Tools: Some
ASF Release Audit Tool (RAT): Rudimentary (plugin complete and longstanding
but poor)
IntelliJ Integration: Mature
Eclipse Integration: Mature
Extensibility: Rudimentary
Number of GitHub Projects Using It: 146k
Continuous build daemon: None
Incremental build support: None (note that this is not the same as
incremental compile support offered by the compiler plugin)
Intra-module dependencies: Rudimentary (requires the use of many profiles
to get per runner dependencies)

Gradle
Java Support: Mature
Python Support: Rudimentary (pygradle, lacks pypi support)
Go Support: Rudimentary (gogradle plugin)
Protobuf Support: Rudimentary (via protobuf plugin)
Docker Support: Rudimentary (via docker plugin)
ASF Release Automation: ?
Jenkins Support: Mature
Configuration Language: Groovy
Multiple Java Versions: Yes
Static Analysis Tools: Some
ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache Maven
ANT plugin)
IntelliJ Integration: Mature
Eclipse Integration: Mature
Extensibility: Mature
Number of GitHub Projects Using It: 122k
Continuous build daemon: Mature
Incremental build support: Mature
Intra-module dependencies: Mature (via configurations)

Most of the above points show that they are very similar except for the
faster builds, better intra-module dependencies, and incremental build
support.


On Mon, Nov 27, 2017 at 1:06 PM, Ismaël Mejía  wrote:

> I have been a little bit out of the discussion on maven vs gradle
> because I was expecting the technical proof of concepts to evaluate
> the best approach. I deeply appreciate all the effort that Lukasz has
> put into the gradle version, and I also think that during the
> discussion Romain and others have bring some serious and important
> points that make the decision less simple than I expected (in the end
> sadly is not as simple as the fastest wins). In any case I don’t think
> that it is wise just to switch immediately to gradle, at least if
> switching means removing the maven files, we have to consider that the
> ‘full’ build/tests were introduced in the CI around one week ago, and
> I am not sure that this is sufficient time to evaluate any possible
> regression. Also I am particularly curious to know if the artifacts
> are correct and complete. Has somebody already simulated a release
> with the gradle build for example, this for me is a prerrequisite
> before we even start discussing about the switch.
>
>
> On Mon, Nov 27, 2017 at 9:34 PM, Lukasz Cwik 
> wrote:
> > On Mon, Nov 27, 2017 at 11:51 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> 2017-11-27 20:26 GMT+01:00 Lukasz Cwik :
> >> > Romain, as mentioned earlier, I identified that Maven was slower
> because
> >> it
> >> > needed to finish building the entire module before dependent modules
> >> could
> >> > start which included running tests, performing checkstyle, etc...
> >> > Gradle is able to increase the parallelism of the build process since
> it
> >> > has task driven parallelism so as long as the files are compiled, the
> >> > dependent projects can start.
> >>
> >> This means we can implement a maven graph builder which is better than
> >> the default one - surely with a thread safe local repo - and
> >> contribute it back to solve it durably.
> >>
> >> If speed for a clean build was the only problem then maybe but lack of
> > incremental builds across tasks is a goal we can actually achieve using
> > Gradle and won't require rewriting almost all of the Maven plugins to
> > support incremental builds.
> >
> >
> >> >
> >> > Maven and Gradle are both heavily used since there are ~146k Maven
> >> projects
> >> > on Github while there are ~122k Gradle project on Github. Do you have
> >> data
> >> > which shows that Maven is significantly more "mainstream"?
> >>
> >> Yep, project i worked on in companies using gradle: 0, all were based
> >> on maven and maven was "tool-ed" versus gradle was "best effort" in
> >> term of plugins.
> >> Now  - with my EE background - I can guarantee you gradle is not able
> >> to handle properly its build since it flatten the classpath and
> >> plugins conflicts very quickly (their plugin dependency feature never
> >> worked and almost no plugin impl it correctly).
> >>
> >> Wonder if it is easy to have the ASF stats, anyone knows?
> >
> >>
> >> > I believe we want a rich multi-language SDK and community and feel as
> >> > though it

[GitHub] jkff commented on issue #4139: Update cloud spanner library to 0.29.0

2017-11-27 Thread GitBox
jkff commented on issue #4139: Update cloud spanner library to 0.29.0
URL: https://github.com/apache/beam/pull/4139#issuecomment-347342624
 
 
   (bumping grpc version by +5 seems welcome but quite risky, hence the 
ValidatesRunner tests)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4139: Update cloud spanner library to 0.29.0

2017-11-27 Thread GitBox
jkff commented on issue #4139: Update cloud spanner library to 0.29.0
URL: https://github.com/apache/beam/pull/4139#issuecomment-347342562
 
 
   Run Dataflow ValidatesRunner


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4139: Update cloud spanner library to 0.29.0

2017-11-27 Thread GitBox
jkff commented on issue #4139: Update cloud spanner library to 0.29.0
URL: https://github.com/apache/beam/pull/4139#issuecomment-347342299
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any 
performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153335970
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   This seems like (another) bug in Sample.any. It should use 
Combine.globally().withoutDefaults() rather than simply Combine.globally().


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-27 Thread GitBox
jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes 
from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347341595
 
 
   That would be optimal, and in general I really like the idea of capturing 
the stack trace of transform application and using it in all error messages. We 
already do this in PAssert, but it can be applied much more widely - e.g. 
errors during coder inference, validation errors, some runtime errors etc.
   
   I suspect that implementing that can have some unexpected stumbling blocks, 
so I would recommend to start with a simpler improvement - just make the error 
message clearly say that the fix is to specify name in apply().
   
   However, if you're willing to drive an effort to use application stack 
traces in more places in Beam, starting with a discussion on the mailing list - 
that would be a wonderful contribution and I'd be happy to review it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Romain Manni-Bucau
Le 27 nov. 2017 21:34, "Lukasz Cwik"  a écrit :

On Mon, Nov 27, 2017 at 11:51 AM, Romain Manni-Bucau 
wrote:

> 2017-11-27 20:26 GMT+01:00 Lukasz Cwik :
> > Romain, as mentioned earlier, I identified that Maven was slower because
> it
> > needed to finish building the entire module before dependent modules
> could
> > start which included running tests, performing checkstyle, etc...
> > Gradle is able to increase the parallelism of the build process since it
> > has task driven parallelism so as long as the files are compiled, the
> > dependent projects can start.
>
> This means we can implement a maven graph builder which is better than
> the default one - surely with a thread safe local repo - and
> contribute it back to solve it durably.
>
> If speed for a clean build was the only problem then maybe but lack of
incremental builds across tasks is a goal we can actually achieve using
Gradle and won't require rewriting almost all of the Maven plugins to
support incremental builds.



Isnt the IDE enough here? What is lacking?



> >
> > Maven and Gradle are both heavily used since there are ~146k Maven
> projects
> > on Github while there are ~122k Gradle project on Github. Do you have
> data
> > which shows that Maven is significantly more "mainstream"?
>
> Yep, project i worked on in companies using gradle: 0, all were based
> on maven and maven was "tool-ed" versus gradle was "best effort" in
> term of plugins.
> Now  - with my EE background - I can guarantee you gradle is not able
> to handle properly its build since it flatten the classpath and
> plugins conflicts very quickly (their plugin dependency feature never
> worked and almost no plugin impl it correctly).
>
> Wonder if it is easy to have the ASF stats, anyone knows?

>
> > I believe we want a rich multi-language SDK and community and feel as
> > though it would be unwise to treat non JDK based languages as second
> class.
>
> Hmm, not sure how it is related to the build tool since Maven and
> Gradle have the same level of support - actually surprsingly maven is
> better for js and surely as bad as gradle for others - or here again
> we can create plugin like the frontend-maven-plugin if needed for
> other languages.
>
> That said it can be an interesting other thread since people consuming
> these languages will probably want their mainstream build tool and a
> "standard" repository layout rather than a java one. But this is
> harder to measure.
>
>
>
> > On Mon, Nov 27, 2017 at 11:00 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> Hi Lukasz,
> >>
> >> Did you manage to identify how maven was slower and test tesla stuff
> >> and potentially a few other fixes?
> >>
> >> Side note: figures without python can be interesting cause locally - =
> >> for me - python tends to flatten the figures whereas I get something
> >> close to your conclusions without python part.
> >>
> >> My point is mainly that switching now on gradle and being back on
> >> maven in a few months cause gradle ecosystem is far to support java 9
> >> - or any other volatile reason like this one - is probably not a good
> >> choice for a community. Maven is way more mainstream than gradle so
> >> helps to encourage people to contribute - vs gradle will increase the
> >> step to do it.
> >>
> >> I'd like to be sure before a switch that it is a one way decision and
> >> that the build tool was not just challenged by itself and its current
> >> state but also in the way it could be improved (= its community and
> >> potentially some local hacks).
> >>
> >> Romain Manni-Bucau
> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >>
> >>
> >> 2017-11-27 19:46 GMT+01:00 Lukasz Cwik :
> >> > I have collected data by running several builds against master using
> >> Gradle
> >> > and Maven without using Gradle's support for incremental builds.
> >> >
> >> > Gradle (mins)
> >> > min: 25.04
> >> > max: 160.14
> >> > median: 45.78
> >> > average: 52.19
> >> > stdev: 30.80
> >> >
> >> > Maven (mins)
> >> > min: 56.86
> >> > max: 216.55 (actually > 240 mins because this data does not include
> >> > timeouts)
> >> > median: 87.93
> >> > average: 109.10
> >> > stdev: 48.01
> >> >
> >> > I excluded a few timeouts (240 mins) that happened during the Maven
> build
> >> > from its numbers but we can see conclusively that Gradle is twice as
> fast
> >> > for the build when compared to Maven when run using Jenkins.
> >> > On my desktop, I have enabled incremental builds and have seen a
major
> >> > improvement on the above numbers but it doesn't yet work correctly
> >> because
> >> > of incorrectly specified inputs/outputs for certain tasks.
> >> >
> >> > The data is available here
> >> > https://docs.google.com/spreadsheets/d/1MHVjF-xoI49_
> >> NJqEQakUgnNIQ7Qbjzu8Y1q_h3dbF1M/edit?usp=sharing
> >> >
> >> > With this data, I feel confident that we should swap and have opened
> the
> >> > following issue https://issues.apache.org/jira/browse/BEAM-3249 and
> >> related
> >> 

[GitHub] nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix 
Sample.any performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153334683
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   With the current implementation, I got `java.lang.IllegalArgumentException: 
Attempted to get side input window for GlobalWindow from non-global WindowFn` 
with the following snippet. Am I doing something wrong?
   
   ```java
   public class Test {
 public static void main(String[] args) {
   PipelineOptions options = PipelineOptionsFactory.create();
   Pipeline p = Pipeline.create(options);
   
   p.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
   .apply(ParDo.of(new DoFn() {
 @ProcessElement
 public void processElement(ProcessContext c) {
   Integer x = c.element();
   c.outputWithTimestamp(x, new Instant(x * 1));
 }
   }))
   .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1
   .apply(Sample.any(3));
   p.run();
 }
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153293639
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -527,13 +521,8 @@ public void processElement(ProcessContext c, 
BoundedWindow window) throws Except
   writer.cleanup();
   throw e;
 }
-int shardNumber =
-shardNumberAssignment == ShardAssignment.ASSIGN_WHEN_WRITING
-? c.element().getKey().getShardNumber()
-: UNKNOWN_SHARDNUM;
-c.output(
-new FileResult<>(
-writer.getOutputFile(), shardNumber, window, c.pane(), 
entry.getKey()));
+int shard = c.element().getKey().getShardNumber();
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153291649
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -672,8 +661,11 @@ public void processElement(ProcessContext context) throws 
IOException {
 // PCollection. There is a dependency between this ParDo and the first (the
 // WriteOperation PCollection as a side input), so this will happen after 
the
 // initial ParDo.
-PCollection> results;
-final PCollectionView numShardsView;
+PCollectionView numShardsView =
+(computeNumShards == null) ? null : input.apply(computeNumShards);
+List> shardingSideInputs = numShardsView == null
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153291927
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -824,177 +826,78 @@ public void startBundle() {
 public void processElement(ProcessContext c) {
   fileResults.add(c.element());
   if (fixedNumShards == null) {
-if (numShardsView != null) {
-  fixedNumShards = c.sideInput(numShardsView);
-} else if (numShardsProvider != null) {
-  fixedNumShards = numShardsProvider.get();
-} else {
-  throw new IllegalStateException(
-  "When finalizing a windowed write, should have set fixed 
sharding");
-}
+fixedNumShards = getFixedNumShards.apply(c);
+checkState(fixedNumShards != null, "Windowed write should have set 
fixed sharding");
 
 Review comment:
   After https://github.com/apache/beam/pull/4124 they do - see 
https://github.com/apache/beam/pull/4137 for explanation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153291513
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
 ##
 @@ -686,40 +756,38 @@ public int compare(
  */
 @VisibleForTesting
 @Experimental(Kind.FILESYSTEM)
-final void copyToOutputFiles(
+final void moveToOutputFiles(
 List, ResourceId>> 
resultsToFinalFilenames) throws IOException {
   int numFiles = resultsToFinalFilenames.size();
-  if (numFiles > 0) {
-LOG.debug("Copying {} files.", numFiles);
-List srcFiles = new 
ArrayList<>(resultsToFinalFilenames.size());
-List dstFiles = new 
ArrayList<>(resultsToFinalFilenames.size());
-for (KV, ResourceId> entry : 
resultsToFinalFilenames) {
-  srcFiles.add(entry.getKey().getTempFilename());
-  dstFiles.add(entry.getValue());
-  LOG.info(
-  "Will copy temporary file {} to final location {}",
-  entry.getKey().getTempFilename(),
-  entry.getValue());
-}
-// During a failure case, files may have been deleted in an earlier 
step. Thus
-// we ignore missing files here.
-FileSystems.copy(srcFiles, dstFiles, 
StandardMoveOptions.IGNORE_MISSING_FILES);
-  } else {
-LOG.info("No output files to write.");
+  LOG.debug("Copying {} files.", numFiles);
+  List srcFiles = new 
ArrayList<>(resultsToFinalFilenames.size());
+  List dstFiles = new 
ArrayList<>(resultsToFinalFilenames.size());
 
 Review comment:
   Seems overkill - they are created right next to each other in code, and 
FileSystems.copy() already does that verification. I removed the size hints to 
make it a little simpler (preallocation probably doesn't matter here).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jkff commented on a change in pull request #4145: Many simplifications to WriteFiles

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4145: Many simplifications to 
WriteFiles
URL: https://github.com/apache/beam/pull/4145#discussion_r153294218
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -1011,14 +1002,19 @@ public void processElement(ProcessContext c) throws 
Exception {
   writer.open(uuid, destination);
   writer.close();
   completeResults.add(
-  new FileResult<>(writer.getOutputFile(), shard, null, null, 
destination));
+  new FileResult<>(
+  writer.getOutputFile(),
+  shard,
+  GlobalWindow.INSTANCE,
+  PaneInfo.ON_TIME_AND_ONLY_FIRING,
 
 Review comment:
   Previously the code was structured differently, and the values passed in 
this particular codepath ended up being ignored. I consolidated things somewhat 
to handle much of windowed and unwindowed case the same way, and made the 
requirements more strict, in particular that window and pane have to be always 
set.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [discuss] java profile

2017-11-27 Thread Romain Manni-Bucau
Well for validation builds- pre PR, incremental support is pointless since
it easily hides issues die to caching so a solution saving half of the
build without loosing anuyhing would still be good IMHO.

Le 27 nov. 2017 21:12, "Lukasz Cwik"  a écrit :

> Incremental builds aren't correctly setup right now so your likely to see
> Python/Go rebuild even if there were no changes. See
> https://issues.apache.org/jira/browse/BEAM-3253
>
> On Mon, Nov 27, 2017 at 11:46 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> wrote:
>
> > that was the goal: validate there was no side effect of the changes on
> > the whole project. Now the "not java" part of the build will not be
> > impacted by java changed so this is the part i want to skip since it
> > takes a lot of time and I have guarantees it is safe to skip them.
> >
> > Romain Manni-Bucau
> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >
> >
> > 2017-11-27 20:28 GMT+01:00 Lukasz Cwik :
> > > Romain, that will build the entire project. I think you want to execute
> > > (from the root of the project):
> > > ./gradlew :beam-sdks-parent:beam-sdks-python:build
> > >
> > > On Mon, Nov 27, 2017 at 11:25 AM, Romain Manni-Bucau <
> > rmannibu...@gmail.com>
> > > wrote:
> > >
> > >> gradle build --no-daemon
> > >>
> > >> (with gradle 4.2)
> > >>
> > >> Romain Manni-Bucau
> > >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > >>
> > >>
> > >> 2017-11-27 20:21 GMT+01:00 Kenneth Knowles :
> > >> > What is the gradle command you are using to build just the Python
> SDK?
> > >> >
> > >> > On Mon, Nov 27, 2017 at 11:19 AM, Romain Manni-Bucau <
> > >> rmannibu...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Hmm,
> > >> >>
> > >> >> issue is the same with gradle (locally python build takes 15mn
> alone
> > >> >> which is as much as the java build and it is not parallelized I
> > think)
> > >> >>
> > >> >> pl is not as smooth since it means doing it on each command whereas
> > >> >> the proposal is automatically activated through settings.xml
> > >> >>
> > >> >> Romain Manni-Bucau
> > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > >> >>
> > >> >>
> > >> >> 2017-11-27 20:07 GMT+01:00 Kenneth Knowles  >:
> > >> >> > I think you can already mostly do this with mvn -pl sdks/XYZ -am
> > >> -amd. I
> > >> >> > think that we have other work (gradle support) underway that will
> > make
> > >> >> this
> > >> >> > a non-issue since gradle automatically does even better than the
> > >> profile
> > >> >> or
> > >> >> > -am -amd.
> > >> >> >
> > >> >> > On Mon, Nov 27, 2017 at 11:01 AM, Romain Manni-Bucau <
> > >> >> rmannibu...@gmail.com>
> > >> >> > wrote:
> > >> >> >
> > >> >> >> Hi guys,
> > >> >> >>
> > >> >> >> java/python/go/xxx support is great but as a developer you
> rarely
> > >> hack
> > >> >> >> on them all.
> > >> >> >>
> > >> >> >> For that reason I opened https://github.com/apache/
> beam/pull/4173
> > .
> > >> >> >>
> > >> >> >> Goal is to give each developer a way to build the whole project
> > and
> > >> >> >> all the code he can impact at once but without caring of the
> code
> > he
> > >> >> >> doesn't modify at all - other languages.
> > >> >> >>
> > >> >> >> Wdyt?
> > >> >> >>
> > >> >> >> Romain Manni-Bucau
> > >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > >> >> >>
> > >> >>
> > >>
> >
>


[GitHub] jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
jkff commented on a change in pull request #4175: [BEAM-3247] fix Sample.any 
performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153332162
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   Not sure why you say that: views are also per-window, so I think it 
shouldn't matter whether the collection is bounded or unbounded. (though, of 
course, it'll be behaving weirdly in case of multiple trigger firings - see 
also https://issues.apache.org/jira/browse/BEAM-2305, maybe similar issues 
apply here too)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix 
Sample.any performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153330138
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   OTOH the iterable view approeach won't even work with unbounded collection 
so guess this new semantic is better.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix Sample.any performance

2017-11-27 Thread GitBox
nevillelyh commented on a change in pull request #4175: [BEAM-3247] fix 
Sample.any performance
URL: https://github.com/apache/beam/pull/4175#discussion_r153329942
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Sample.java
 ##
 @@ -209,29 +202,67 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   }
 
   /**
-   * A {@link DoFn} that returns up to limit elements from the side input 
PCollection.
+   * A {@link DoFn} that outputs up to limit elements.
*/
-  private static class SampleAnyDoFn extends DoFn {
-long limit;
-final PCollectionView> iterableView;
+  private static class SampleAnyDoFn extends DoFn {
 
 Review comment:
   Probably not, but that might change the semantics a bit. Without this, we'll 
combine each window into into `n` or fewer elements. But I guess that's more 
natural?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347333086
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347332806
 
 
   Run Python PostCommit
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347331790
 
 
   Thanks, that's helpful.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347331807
 
 
   run seed job


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Ismaël Mejía
I have been a little bit out of the discussion on maven vs gradle
because I was expecting the technical proof of concepts to evaluate
the best approach. I deeply appreciate all the effort that Lukasz has
put into the gradle version, and I also think that during the
discussion Romain and others have bring some serious and important
points that make the decision less simple than I expected (in the end
sadly is not as simple as the fastest wins). In any case I don’t think
that it is wise just to switch immediately to gradle, at least if
switching means removing the maven files, we have to consider that the
‘full’ build/tests were introduced in the CI around one week ago, and
I am not sure that this is sufficient time to evaluate any possible
regression. Also I am particularly curious to know if the artifacts
are correct and complete. Has somebody already simulated a release
with the gradle build for example, this for me is a prerrequisite
before we even start discussing about the switch.


On Mon, Nov 27, 2017 at 9:34 PM, Lukasz Cwik  wrote:
> On Mon, Nov 27, 2017 at 11:51 AM, Romain Manni-Bucau 
> wrote:
>
>> 2017-11-27 20:26 GMT+01:00 Lukasz Cwik :
>> > Romain, as mentioned earlier, I identified that Maven was slower because
>> it
>> > needed to finish building the entire module before dependent modules
>> could
>> > start which included running tests, performing checkstyle, etc...
>> > Gradle is able to increase the parallelism of the build process since it
>> > has task driven parallelism so as long as the files are compiled, the
>> > dependent projects can start.
>>
>> This means we can implement a maven graph builder which is better than
>> the default one - surely with a thread safe local repo - and
>> contribute it back to solve it durably.
>>
>> If speed for a clean build was the only problem then maybe but lack of
> incremental builds across tasks is a goal we can actually achieve using
> Gradle and won't require rewriting almost all of the Maven plugins to
> support incremental builds.
>
>
>> >
>> > Maven and Gradle are both heavily used since there are ~146k Maven
>> projects
>> > on Github while there are ~122k Gradle project on Github. Do you have
>> data
>> > which shows that Maven is significantly more "mainstream"?
>>
>> Yep, project i worked on in companies using gradle: 0, all were based
>> on maven and maven was "tool-ed" versus gradle was "best effort" in
>> term of plugins.
>> Now  - with my EE background - I can guarantee you gradle is not able
>> to handle properly its build since it flatten the classpath and
>> plugins conflicts very quickly (their plugin dependency feature never
>> worked and almost no plugin impl it correctly).
>>
>> Wonder if it is easy to have the ASF stats, anyone knows?
>
>>
>> > I believe we want a rich multi-language SDK and community and feel as
>> > though it would be unwise to treat non JDK based languages as second
>> class.
>>
>> Hmm, not sure how it is related to the build tool since Maven and
>> Gradle have the same level of support - actually surprsingly maven is
>> better for js and surely as bad as gradle for others - or here again
>> we can create plugin like the frontend-maven-plugin if needed for
>> other languages.
>>
>> That said it can be an interesting other thread since people consuming
>> these languages will probably want their mainstream build tool and a
>> "standard" repository layout rather than a java one. But this is
>> harder to measure.
>>
>>
>>
>> > On Mon, Nov 27, 2017 at 11:00 AM, Romain Manni-Bucau <
>> rmannibu...@gmail.com>
>> > wrote:
>> >
>> >> Hi Lukasz,
>> >>
>> >> Did you manage to identify how maven was slower and test tesla stuff
>> >> and potentially a few other fixes?
>> >>
>> >> Side note: figures without python can be interesting cause locally - =
>> >> for me - python tends to flatten the figures whereas I get something
>> >> close to your conclusions without python part.
>> >>
>> >> My point is mainly that switching now on gradle and being back on
>> >> maven in a few months cause gradle ecosystem is far to support java 9
>> >> - or any other volatile reason like this one - is probably not a good
>> >> choice for a community. Maven is way more mainstream than gradle so
>> >> helps to encourage people to contribute - vs gradle will increase the
>> >> step to do it.
>> >>
>> >> I'd like to be sure before a switch that it is a one way decision and
>> >> that the build tool was not just challenged by itself and its current
>> >> state but also in the way it could be improved (= its community and
>> >> potentially some local hacks).
>> >>
>> >> Romain Manni-Bucau
>> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> >>
>> >>
>> >> 2017-11-27 19:46 GMT+01:00 Lukasz Cwik :
>> >> > I have collected data by running several builds against master using
>> >> Gradle
>> >> > and Maven without using Gradle's support for incremental builds.
>> >> >
>> >> > Gradle (mins)
>> >> > min: 25.04
>> >> > max: 160.14
>> >> > median

[GitHub] pabloem opened a new pull request #4180: Updating dataflow API version to newer release.

2017-11-27 Thread GitBox
pabloem opened a new pull request #4180: Updating dataflow API version to newer 
release.
URL: https://github.com/apache/beam/pull/4180
 
 
   r: @bjchambers 
   This new release contains the changes to the Dataflow API protos for side 
input inter transform IO.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: Azure(ADLS) compatibility on Beam with Spark runner

2017-11-27 Thread Lukasz Cwik
Out of curiosity, does using the DirectRunner with ADL work for you?
If not, then you'll be able to debug locally why its failing.

On Fri, Nov 24, 2017 at 8:09 PM, Milan Chandna <
milan.chan...@microsoft.com.invalid> wrote:

> Hi JB,
>
> Thanks for the updates.
> BTW I am myself in Microsoft but I am trying this out of my interest.
> And it's good to know that someone else is also working on this.
>
> -Milan.
>
> -Original Message-
> From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
> Sent: Thursday, November 23, 2017 1:47 PM
> To: dev@beam.apache.org
> Subject: Re: Azure(ADLS) compatibility on Beam with Spark runner
>
> The Azure guys tried to use ADLS via Beam HDFS filesystem, but it seems
> they didn't succeed.
> The new approach we plan is to directly use the ADLS API.
>
> I keep you posted.
>
> Regards
> JB
>
> On 11/23/2017 07:42 AM, Milan Chandna wrote:
> > I tried both the ways.
> > Passed ADL specific configuration in --hdfsConfiguration as well and
> have setup the core-site.xml/hdfs-site.xml as well.
> > As I mentioned it's a HDI + Spark cluster, those things are already
> setup.
> > Spark job(without Beam) is also able to read and write to ADLS on same
> machine.
> >
> > BTW if the authentication or understanding ADL was a problem, it would
> have thrown error like ADLFileSystem missing or probably access failed or
> something. Thoughts?
> >
> > -Milan.
> >
> > -Original Message-
> > From: Lukasz Cwik [mailto:lc...@google.com.INVALID]
> > Sent: Thursday, November 23, 2017 5:05 AM
> > To: dev@beam.apache.org
> > Subject: Re: Azure(ADLS) compatibility on Beam with Spark runner
> >
> > In your example it seems as though your HDFS configuration doesn't
> contain any ADL specific configuration:  "--hdfsConfiguration='[{\"fs.
> defaultFS\":
> > \"hdfs://home/sample.txt\"]'"
> > Do you have a core-site.xml or hdfs-site.xml configured as per:
> > https://na01.safelinks.protection.outlook.com/?url=
> https%3A%2F%2Fhadoop.apache.org%2Fdocs%2Fcurrent%2Fhadoop-
> azure-datalake%2Findex.html&data=02%7C01%7CMilan.Chandna%40microsoft.com%
> 7Cb7dffcc26bfe44df589a08d53201aeab%7C72f988bf86f141af91ab2d7cd011
> db47%7C1%7C0%7C636469905161638292&sdata=Z%2FNJPDOZf5Xn6g9mVDfYdGiQKBPLJ1
> Gft8eka5W7Yts%3D&reserved=0?
> >
> >  From the documentation for --hdfsConfiguration:
> > A list of Hadoop configurations used to configure zero or more Hadoop
> filesystems. By default, Hadoop configuration is loaded from
> 'core-site.xml' and 'hdfs-site.xml based upon the HADOOP_CONF_DIR and
> YARN_CONF_DIR environment variables. To specify configuration on the
> command-line, represent the value as a JSON list of JSON maps, where each
> map represents the entire configuration for a single Hadoop filesystem. For
> example --hdfsConfiguration='[{\"fs.default.name\":
> > \"hdfs://localhost:9998\", ...},{\"fs.default.name\": \"s3a://\",
> ...},...]'
> > From:
> > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> > b.com%2Fapache%2Fbeam%2Fblob%2F9f81fd299bd32e0d6056a7da9fa994cf74db0ed
> > 9%2Fsdks%2Fjava%2Fio%2Fhadoop-file-system%2Fsrc%2Fmain%2Fjava%2Forg%2F
> > apache%2Fbeam%2Fsdk%2Fio%2Fhdfs%2FHadoopFileSystemOptions.java%23L45&d
> > ata=02%7C01%7CMilan.Chandna%40microsoft.com%7Cb7dffcc26bfe44df589a08d5
> > 3201aeab%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6364699051616382
> > 92&sdata=tL3UzNW4OBuFa1LMIzZsyR8eSqBoZ7hWVJipnznrQ5Q%3D&reserved=0
> >
> > On Wed, Nov 22, 2017 at 1:12 AM, Jean-Baptiste Onofré
> > 
> > wrote:
> >
> >> Hi,
> >>
> >> FYI, I'm in touch with Microsoft Azure team about that.
> >>
> >> We are testing the ADLS support via HDFS.
> >>
> >> I keep you posted.
> >>
> >> Regards
> >> JB
> >>
> >> On 11/22/2017 09:12 AM, Milan Chandna wrote:
> >>
> >>> Hi,
> >>>
> >>> Has anyone tried IO from(to) ADLS account on Beam with Spark runner?
> >>> I was trying recently to do this but was unable to make it work.
> >>>
> >>> Steps that I tried:
> >>>
> >>> 1.  Took HDI + Spark 1.6 cluster with default storage as ADLS
> account.
> >>> 2.  Built Apache Beam on that. Built to include Beam-2790<
> >>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fiss
> >>> u
> >>> es.apache.org%2Fjira%2Fbrowse%2FBEAM-2790&data=02%7C01%
> 7CMilan.Chandna%40microsoft.com%7Cb7dffcc26bfe44df589a08d53201aeab%
> 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636469905161638292&sdata=aj%
> 2FlaXlhlOQtnlRqHh8yLs2KfOZuRwDUUFvTpLB3Atg%3D&reserved=0> fix which
> earlier I was facing for ADL as well.
> >>> 3.  Modified WordCount.java example to use HadoopFileSystemOptions
> >>> 4.  Since HDI + Spark cluster has ADLS as defaultFS, tried 2 things
> >>>*   Just gave the input path and output path as
> >>> adl://home/sample.txt and adl://home/output
> >>>*   In addition to adl input and output path, also gave required
> >>> HDFS configuration with adl required configs as well.
> >>>
> >>> Both didn't worked btw.
> >>> s
> >>> 1.  Have checked ACL's and permissions

[GitHub] reuvenlax commented on issue #4116: [BEAM-2953] Part 1 of Multipart advanced timeseries examples

2017-11-27 Thread GitBox
reuvenlax commented on issue #4116: [BEAM-2953] Part 1 of Multipart advanced 
timeseries examples
URL: https://github.com/apache/beam/pull/4116#issuecomment-347321463
 
 
   R: @kennknowles 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [Build] Enforce failed on project beam-sdks-java-io-tika

2017-11-27 Thread Lukasz Cwik
Try running with -X to get all the debug output when using Maven. It may
give more details.

Alternatively, try using the gradle build (from root of project directory):
./gradlew build



On Fri, Nov 24, 2017 at 9:46 PM, Manu Zhang  wrote:

> Hi all,
>
> Has anyone seen this issue when building latest master on Mac ?
>
> *[ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce (enforce)
> on project beam-sdks-java-io-tika: Execution enforce of goal
> org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce failed.
> NullPointerException -> [Help 1]*
>
> Thanks,
> Manu
>


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-27 Thread Lukasz Cwik
On Mon, Nov 27, 2017 at 11:51 AM, Romain Manni-Bucau 
wrote:

> 2017-11-27 20:26 GMT+01:00 Lukasz Cwik :
> > Romain, as mentioned earlier, I identified that Maven was slower because
> it
> > needed to finish building the entire module before dependent modules
> could
> > start which included running tests, performing checkstyle, etc...
> > Gradle is able to increase the parallelism of the build process since it
> > has task driven parallelism so as long as the files are compiled, the
> > dependent projects can start.
>
> This means we can implement a maven graph builder which is better than
> the default one - surely with a thread safe local repo - and
> contribute it back to solve it durably.
>
> If speed for a clean build was the only problem then maybe but lack of
incremental builds across tasks is a goal we can actually achieve using
Gradle and won't require rewriting almost all of the Maven plugins to
support incremental builds.


> >
> > Maven and Gradle are both heavily used since there are ~146k Maven
> projects
> > on Github while there are ~122k Gradle project on Github. Do you have
> data
> > which shows that Maven is significantly more "mainstream"?
>
> Yep, project i worked on in companies using gradle: 0, all were based
> on maven and maven was "tool-ed" versus gradle was "best effort" in
> term of plugins.
> Now  - with my EE background - I can guarantee you gradle is not able
> to handle properly its build since it flatten the classpath and
> plugins conflicts very quickly (their plugin dependency feature never
> worked and almost no plugin impl it correctly).
>
> Wonder if it is easy to have the ASF stats, anyone knows?

>
> > I believe we want a rich multi-language SDK and community and feel as
> > though it would be unwise to treat non JDK based languages as second
> class.
>
> Hmm, not sure how it is related to the build tool since Maven and
> Gradle have the same level of support - actually surprsingly maven is
> better for js and surely as bad as gradle for others - or here again
> we can create plugin like the frontend-maven-plugin if needed for
> other languages.
>
> That said it can be an interesting other thread since people consuming
> these languages will probably want their mainstream build tool and a
> "standard" repository layout rather than a java one. But this is
> harder to measure.
>
>
>
> > On Mon, Nov 27, 2017 at 11:00 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> Hi Lukasz,
> >>
> >> Did you manage to identify how maven was slower and test tesla stuff
> >> and potentially a few other fixes?
> >>
> >> Side note: figures without python can be interesting cause locally - =
> >> for me - python tends to flatten the figures whereas I get something
> >> close to your conclusions without python part.
> >>
> >> My point is mainly that switching now on gradle and being back on
> >> maven in a few months cause gradle ecosystem is far to support java 9
> >> - or any other volatile reason like this one - is probably not a good
> >> choice for a community. Maven is way more mainstream than gradle so
> >> helps to encourage people to contribute - vs gradle will increase the
> >> step to do it.
> >>
> >> I'd like to be sure before a switch that it is a one way decision and
> >> that the build tool was not just challenged by itself and its current
> >> state but also in the way it could be improved (= its community and
> >> potentially some local hacks).
> >>
> >> Romain Manni-Bucau
> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >>
> >>
> >> 2017-11-27 19:46 GMT+01:00 Lukasz Cwik :
> >> > I have collected data by running several builds against master using
> >> Gradle
> >> > and Maven without using Gradle's support for incremental builds.
> >> >
> >> > Gradle (mins)
> >> > min: 25.04
> >> > max: 160.14
> >> > median: 45.78
> >> > average: 52.19
> >> > stdev: 30.80
> >> >
> >> > Maven (mins)
> >> > min: 56.86
> >> > max: 216.55 (actually > 240 mins because this data does not include
> >> > timeouts)
> >> > median: 87.93
> >> > average: 109.10
> >> > stdev: 48.01
> >> >
> >> > I excluded a few timeouts (240 mins) that happened during the Maven
> build
> >> > from its numbers but we can see conclusively that Gradle is twice as
> fast
> >> > for the build when compared to Maven when run using Jenkins.
> >> > On my desktop, I have enabled incremental builds and have seen a major
> >> > improvement on the above numbers but it doesn't yet work correctly
> >> because
> >> > of incorrectly specified inputs/outputs for certain tasks.
> >> >
> >> > The data is available here
> >> > https://docs.google.com/spreadsheets/d/1MHVjF-xoI49_
> >> NJqEQakUgnNIQ7Qbjzu8Y1q_h3dbF1M/edit?usp=sharing
> >> >
> >> > With this data, I feel confident that we should swap and have opened
> the
> >> > following issue https://issues.apache.org/jira/browse/BEAM-3249 and
> >> related
> >> > sub-tasks.
> >> >
> >> > On Sun, Nov 19, 2017 at 11:23 AM, Jean-Baptiste Onofré <
> j...@nanth

Re: [discuss] java profile

2017-11-27 Thread Lukasz Cwik
Incremental builds aren't correctly setup right now so your likely to see
Python/Go rebuild even if there were no changes. See
https://issues.apache.org/jira/browse/BEAM-3253

On Mon, Nov 27, 2017 at 11:46 AM, Romain Manni-Bucau 
wrote:

> that was the goal: validate there was no side effect of the changes on
> the whole project. Now the "not java" part of the build will not be
> impacted by java changed so this is the part i want to skip since it
> takes a lot of time and I have guarantees it is safe to skip them.
>
> Romain Manni-Bucau
> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>
>
> 2017-11-27 20:28 GMT+01:00 Lukasz Cwik :
> > Romain, that will build the entire project. I think you want to execute
> > (from the root of the project):
> > ./gradlew :beam-sdks-parent:beam-sdks-python:build
> >
> > On Mon, Nov 27, 2017 at 11:25 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> gradle build --no-daemon
> >>
> >> (with gradle 4.2)
> >>
> >> Romain Manni-Bucau
> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >>
> >>
> >> 2017-11-27 20:21 GMT+01:00 Kenneth Knowles :
> >> > What is the gradle command you are using to build just the Python SDK?
> >> >
> >> > On Mon, Nov 27, 2017 at 11:19 AM, Romain Manni-Bucau <
> >> rmannibu...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hmm,
> >> >>
> >> >> issue is the same with gradle (locally python build takes 15mn alone
> >> >> which is as much as the java build and it is not parallelized I
> think)
> >> >>
> >> >> pl is not as smooth since it means doing it on each command whereas
> >> >> the proposal is automatically activated through settings.xml
> >> >>
> >> >> Romain Manni-Bucau
> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >> >>
> >> >>
> >> >> 2017-11-27 20:07 GMT+01:00 Kenneth Knowles :
> >> >> > I think you can already mostly do this with mvn -pl sdks/XYZ -am
> >> -amd. I
> >> >> > think that we have other work (gradle support) underway that will
> make
> >> >> this
> >> >> > a non-issue since gradle automatically does even better than the
> >> profile
> >> >> or
> >> >> > -am -amd.
> >> >> >
> >> >> > On Mon, Nov 27, 2017 at 11:01 AM, Romain Manni-Bucau <
> >> >> rmannibu...@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> Hi guys,
> >> >> >>
> >> >> >> java/python/go/xxx support is great but as a developer you rarely
> >> hack
> >> >> >> on them all.
> >> >> >>
> >> >> >> For that reason I opened https://github.com/apache/beam/pull/4173
> .
> >> >> >>
> >> >> >> Goal is to give each developer a way to build the whole project
> and
> >> >> >> all the code he can impact at once but without caring of the code
> he
> >> >> >> doesn't modify at all - other languages.
> >> >> >>
> >> >> >> Wdyt?
> >> >> >>
> >> >> >> Romain Manni-Bucau
> >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >> >> >>
> >> >>
> >>
>


[GitHub] kennknowles commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
kennknowles commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347310170
 
 
   You can `run seed job` and it will actually be installed right away. It can 
then be rolled back if broken. Other than that, you need to have a test Jenkins 
cluster, which is a bit of work.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] tvalentyn commented on issue #4179: Fix repository path for Python Jenkins builds. Source directory was c?

2017-11-27 Thread GitBox
tvalentyn commented on issue #4179: Fix repository path for Python Jenkins 
builds. Source directory was c?
URL: https://github.com/apache/beam/pull/4179#issuecomment-347309489
 
 
   @kennknowles @jasonkuster @lukecwik - Do we have a process to test groovy 
changes before the commit?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: GitBox down

2017-11-27 Thread Reuven Lax
We need to update the contribution guide as well.

On Nov 27, 2017 10:30 AM, "Jean-Baptiste Onofré"  wrote:

> No I don't think so,
>
> In your case, it's because you still refering git-wip-us (probably from
> ): you have to use gitbox instead.
>
> I already updated master branch but I didn't update the release branch.
>
> Regards
> JB
>
> On 11/27/2017 07:02 PM, Reuven Lax wrote:
>
>> Does this have anything to do with the fact that I'm starting to get:
>>
>> fatal: repository 'https://git-wip-us.apache.org/repos/asf/beam.git/' not
>> found
>>
>> On Sun, Nov 26, 2017 at 10:37 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>> Hi,
>>>
>>> it seems GitBox is down (timing out).
>>>
>>> I don't see anything on status.apache.org, I will ping INFRA.
>>>
>>> Regards
>>> JB
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


  1   2   >