Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Kenneth Knowles
Oh, yikes. It seems like https://github.com/gradle/gradle/issues/847 indicates
that the feature to use the default names in Gradle is practically
nonfunctional. If that bug is as severe as it looks, I have to retract my
position. Like we could never have sdks/java/core and sdks/py/core, right?

Kenn
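
For readers following the thread, the clash Kenn describes can be sketched in a settings file. This is a minimal illustration, not Beam's actual configuration: `sdks/py/core` is the hypothetical second project from the message above, the assigned names are placeholders, and whether unique names fully sidestep gradle/gradle#847 in Beam's build is an assumption that has not been checked here.

```groovy
// settings.gradle (illustrative sketch only)
rootProject.name = 'beam'

// Gradle's default convention: the project path mirrors the directory layout.
include ':sdks:java:core'
include ':sdks:py:core'   // hypothetical second "core", as in the message above

// With default naming both projects are called "core" and differ only by path.
// Assumption: giving each project a unique name avoids the clash.
// Note that the project path changes along with the name
// (for example to ':sdks:java:beam-sdks-java-core').
project(':sdks:java:core').name = 'beam-sdks-java-core'
project(':sdks:py:core').name = 'beam-sdks-py-core'
```

The trade-off is that renaming gives up part of the "just use the defaults" simplicity that BEAM-4046 is after.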

On Mon, Apr 1, 2019 at 6:27 PM Michael Luckey  wrote:

> FWIW, hacked something as showcase for BEAM-4046 [1]
>
> This is miserably broken, but a
>
> ./gradlew projects
>
> or
>
> ./gradlew -p sdks/java build
>
> should work. Anything else is likely to cause issues. If you hit a stack
> overflow exception, it's likely caused by
> https://github.com/gradle/gradle/issues/847
>
> To continue here, lots of cleanup has to be done. We might also need to
> rename folders etc. to better reflect semantic intentions.
>
> [1] https://github.com/apache/beam/pull/8194
>
> On Mon, Apr 1, 2019 at 11:56 PM Kenneth Knowles  wrote:
>
>>
>>
>> On Mon, Apr 1, 2019 at 2:20 PM Lukasz Cwik  wrote:
>>
>>>
>>>
>>> On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles  wrote:
>>>

 As to building an aggregated "Java" project, I think the blocker will
 be supporting conflicting deps. For IOs like ElasticSearch and runners like
 Flink the conflict is essential and deliberate, to support multiple
 versions of other services. And that is not even talking about transitive
 dep conflicts. I think Python and Go don't have this issue simply because
 they haven't tackled those problems.

 Are you talking about just a shortcut for building (super easy to just
 add since we are using Gradle) or a new artifact that you want to
 distribute?

 On Mon, Apr 1, 2019 at 10:01 AM Lukasz Cwik  wrote:

> During the gradle migration, we used to have something like:
>
> include(":sdks:java:core")
> include(":sdks:java:extensions:sql")
> include(":sdks:python")
>
> Just to be super clear, this is Gradle default and is equivalent to
 just leaving it blank.


> but we discovered the Maven module names that were used during
> publishing were "core" / "sql" / ... (effectively the directory name)
> instead of "beam-sdks-java-core".
>

 Isn't this managed by the publication plugin?
 https://docs.gradle.org/current/userguide/publishing_maven.html#sec:identity_values_in_the_generated_pom
  "overriding
 the default identity values is easy: simply specify the groupId, artifactId
 or version attributes when configuring the MavenPublication."

>>>
>>> During the gradle migration this wasn't that easy. The new maven publish
>>> plugin improved a lot since then.
>>>
>>>
 Using the default at the time also broke the artifact names for intra
> project dependencies that we generate[1]. Finally, we also ran into an
> issue because we had more than one Gradle project with the same directory
> name even though they were under a different parent folder (I think it was
> "core") and that was leading to some strange build time behavior.
>

 Weird. But I think the Jira should still stand as a move towards
 simplifying our build and making it more discoverable for new contributors.

>>>
>>> Agree on the JIRA makes sense, just calling out that there were other
>>> issues that this naming had caused in the past which should be checked
>>> before we call this done.
>>>
>>
>> Totally agree. It will be quite a large task with a lot of boilerplate
>> that might not be separable from technical blockers that come up as you go
>> through the boilerplate.
>>
>> Kenn
>>
>>
>>
>> Kenn


> We didn't migrate to a flat project structure where each project is a
> folder underneath the root project because of the existing Maven build
> rules that were being maintained in parallel and I'm not sure if people
> would want to have a flat project structure either.
>
> 1:
> https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1055
>
> On Mon, Apr 1, 2019 at 9:49 AM Michael Luckey 
> wrote:
>
>> Hi,
>>
>> although I did not yet manage to get deeper involved into actual
>> development, I think this ability would be a useful addition.
>>
>> But I would also like to point out, that this is kind of implicit, as
>> soon we get https://issues.apache.org/jira/browse/BEAM-4046 included.
>>
>> For instance, we would change the current setup from
>>
>> include "beam-sdks-java-core"
>> project(":beam-sdks-java-core").dir = file("sdks/java/core")
>>
>> to something like
>>
>> include(":sdks:java:core")
>> include(":sdks:java:extensions:sql")
>> include(":sdks:python")
>>
>>
>> With this in place a plain
>>
>> $ ./gradlew -p sdks/java build
>>
>> would exactly do what you want. And, of course, this will also 

Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Michael Luckey
FWIW, hacked something as showcase for BEAM-4046 [1]

This is miserably broken, but a

./gradlew projects

or

./gradlew -p sdks/java build

should work. Anything else is likely to cause issues. If you hit a stack
overflow exception, it's likely caused by
https://github.com/gradle/gradle/issues/847

To continue here, lots of cleanup has to be done. We might also need to
rename folders etc. to better reflect semantic intentions.

[1] https://github.com/apache/beam/pull/8194

On Mon, Apr 1, 2019 at 11:56 PM Kenneth Knowles  wrote:

>
>
> On Mon, Apr 1, 2019 at 2:20 PM Lukasz Cwik  wrote:
>
>>
>>
>> On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles  wrote:
>>
>>>
>>> As to building an aggregated "Java" project, I think the blocker will be
>>> supporting conflicting deps. For IOs like ElasticSearch and runners like
>>> Flink the conflict is essential and deliberate, to support multiple
>>> versions of other services. And that is not even talking about transitive
>>> dep conflicts. I think Python and Go don't have this issue simply because
>>> they haven't tackled those problems.
>>>
>>> Are you talking about just a shortcut for building (super easy to just
>>> add since we are using Gradle) or a new artifact that you want to
>>> distribute?
>>>
>>> On Mon, Apr 1, 2019 at 10:01 AM Lukasz Cwik  wrote:
>>>
 During the gradle migration, we used to have something like:

 include(":sdks:java:core")
 include(":sdks:java:extensions:sql")
 include(":sdks:python")

 Just to be super clear, this is Gradle default and is equivalent to
>>> just leaving it blank.
>>>
>>>
 but we discovered the Maven module names that were used during
 publishing were "core" / "sql" / ... (effectively the directory name)
 instead of "beam-sdks-java-core".

>>>
>>> Isn't this managed by the publication plugin?
>>> https://docs.gradle.org/current/userguide/publishing_maven.html#sec:identity_values_in_the_generated_pom
>>>  "overriding
>>> the default identity values is easy: simply specify the groupId, artifactId
>>> or version attributes when configuring the MavenPublication."
>>>
>>
>> During the gradle migration this wasn't that easy. The new maven publish
>> plugin improved a lot since then.
>>
>>
>>> Using the default at the time also broke the artifact names for intra
 project dependencies that we generate[1]. Finally, we also ran into an
 issue because we had more than one Gradle project with the same directory
 name even though they were under a different parent folder (I think it was
 "core") and that was leading to some strange build time behavior.

>>>
>>> Weird. But I think the Jira should still stand as a move towards
>>> simplifying our build and making it more discoverable for new contributors.
>>>
>>
>> Agree on the JIRA makes sense, just calling out that there were other
>> issues that this naming had caused in the past which should be checked
>> before we call this done.
>>
>
> Totally agree. It will be quite a large task with a lot of boilerplate
> that might not be separable from technical blockers that come up as you go
> through the boilerplate.
>
> Kenn
>
>
>
> Kenn
>>>
>>>
 We didn't migrate to a flat project structure where each project is a
 folder underneath the root project because of the existing Maven build
 rules that were being maintained in parallel and I'm not sure if people
 would want to have a flat project structure either.

 1:
 https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1055

 On Mon, Apr 1, 2019 at 9:49 AM Michael Luckey 
 wrote:

> Hi,
>
> although I did not yet manage to get deeper involved into actual
> development, I think this ability would be a useful addition.
>
> But I would also like to point out, that this is kind of implicit, as
> soon we get https://issues.apache.org/jira/browse/BEAM-4046 included.
>
> For instance, we would change the current setup from
>
> include "beam-sdks-java-core"
> project(":beam-sdks-java-core").dir = file("sdks/java/core")
>
> to something like
>
> include(":sdks:java:core")
> include(":sdks:java:extensions:sql")
> include(":sdks:python")
>
>
> With this in place a plain
>
> $ ./gradlew -p sdks/java build
>
> would exactly do what you want. And, of course, this will also work
> for 'sdks/java/io', 'runners/' etc. Hope, you get the point.
>
> Currently, we deviate from gradle default convention and therefore
> have to implement some quirks to restore default behaviour. And I somehow
> dislike the structure introduced by parent/child folders, which will be
> destroyed by our current project definitions.
>
> But, to be honest, although I have some clear understanding on how to
> proceed here - especially regarding the requirement 

Re: kafka 0.9 support

2019-04-01 Thread Austin Bennett
FWIW --

On my (desired, not explicitly job-function) roadmap is to tap into a bunch
of our corporate Kafka queues to ingest that data to places I can use.
Those are 'stuck' on 0.9, with no upgrade in sight (I am told the upgrade path
isn't trivial, these are very critical flows, and they are scared of breaking
them, so it just sits behind firewalls, etc.). But I wouldn't begin that for
probably at least another quarter.

I neither contribute to nor understand the burden of maintaining support
for the older version, so I can't reasonably lobby for that continued pain.

Anecdotally, this could be a place many enterprises are at (though I also
wonder whether many of the people that would be 'stuck' on such versions
would also have Beam on their current radar).


On Mon, Apr 1, 2019 at 2:29 PM Kenneth Knowles  wrote:

> This could be a backward-incompatible change, though that notion has many
> interpretations. What matters is user pain. Technically if we don't break
> the core SDK, users should be able to use Java SDK >=2.11.0 with KafkaIO
> 2.11.0 forever.
>
> How are multiple versions of Kafka supported? Are they all in one client,
> or is there a case for forks like ElasticSearchIO?
>
> Kenn
>
> On Mon, Apr 1, 2019 at 10:37 AM Jean-Baptiste Onofré 
> wrote:
>
>> +1 to remove 0.9 support.
>>
>> I think it's more interesting to test and verify Kafka 2.2.0 than 0.9 ;)
>>
>> Regards
>> JB
>>
>> On 01/04/2019 19:36, David Morávek wrote:
>> > Hello,
>> >
>> > is there still a reason to keep Kafka 0.9 support? This unfortunately
>> > adds lot of complexity to KafkaIO implementation.
>> >
>> > Kafka 0.9 was released on Nov 2015.
>> >
>> > My first shot on removing Kafka 0.9 support would remove second
>> > consumer, which is used for fetching offsets.
>> >
>> > WDYT? Is this support worth keeping?
>> >
>> > https://github.com/apache/beam/pull/8186
>> >
>> > D.
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


Re: [Announcement] New Website for Beam Summits

2019-04-01 Thread Joana Filipa Bernardo Carrasqueira
Hi Alexey,

We will use this website as a platform moving forward, and you will find all
the information related to future Beam Summits there. At the moment, we don't
have plans to include previous events (such as last year's summit), but as we
move forward, the Beam Summits will all be featured on that website.


On Thu, Mar 21, 2019 at 8:42 AM Alexey Romanenko 
wrote:

> Great initiative, thanks for creating this!
>
> Btw, are any plans to add there information about previous Beam-related
> events, especially London Beam summit last year?
>
> On 20 Mar 2019, at 19:30, David Morávek  wrote:
>
> This is great! Thanks for all of the hard work you're putting into this.
>
> D.
>
> On Wed, Mar 20, 2019 at 1:38 PM Maximilian Michels  wrote:
>
>> Not a bug, it's a feature ;)
>>
>> On 20.03.19 07:23, Kenneth Knowles wrote:
>> > Very nice. I appreciate the emphasis on coffee [1] [2] [3] though I
>> > suspect there may be a rendering bug.
>> >
>> > Kenn
>> >
>> > [1] https://beamsummit.org/schedule/2019-06-19?sessionId=1
>> > [2] https://beamsummit.org/schedule/2019-06-19?sessionId=3
>> > [3] https://beamsummit.org/schedule/2019-06-19?sessionId=4
>> >
>> > On Tue, Mar 19, 2019 at 4:43 AM Łukasz Gajowy wrote:
>> >
>> > Looks great! Thanks for doing this! :)
>> >
>> > Łukasz
>> >
>> > On Tue, Mar 19, 2019 at 12:30 Maximilian Michels wrote:
>> >
>> > Great stuff! Looking forward to seeing many Beam folks in Berlin.
>> >
>> > In case you want to speak at Beam Summit Europe, the Call for
>> > Papers is open until April 1:
>> > https://sessionize.com/beam-summit-europe-2019
>> >
>> > -Max
>> >
>> > On 19.03.19 09:49, Matthias Baetens wrote:
>> >  > Awesome Aizhamal! Great work and thanks for your continued
>> >  > efforts on this :) Looking forward to the summit.
>> >  >
>> >  > On Mon, 18 Mar 2019 at 23:17, Aizhamal Nurmamat kyzy
>> >  > <aizha...@google.com> wrote:
>> >  >
>> >  > Hello everybody!
>> >  >
>> >  > We are thrilled to announce the launch of beamsummit.org
>> >  > dedicated to Beam Summits!
>> >  >
>> >  > The current version of the website provides information about the
>> >  > upcoming Beam Summit in Europe on June 19-20th, 2019. We will update
>> >  > it for the upcoming summits in Asia and North America accordingly.
>> >  > You can access all necessary information about the conference theme,
>> >  > speakers and sessions, the abstract submission timeline and the
>> >  > registration process, the conference venues and much more that you
>> >  > will find useful until and during the Beam Summits 2019.
>> >  >
>> >  > We are working to make the website easy to use, so that anyone who
>> >  > is organizing a Beam event can rely on it. You can find the code for
>> >  > it in GitHub.
>> >  >
>> >  > The pages will be updated on a regular basis, but we also love
>> >  > hearing thoughts from our community! Let us know if you have any
>> >  > questions, comments or suggestions, and help us improve. Also, if
>> >  > you are thinking of organizing a Beam event, please feel free to
>> >  > reach out for support, and to use the code in GitHub as well.
>> >  >
>> >  > We sincerely hope that you like the new Beam Summit website and will
>> >  > find it useful for accessing information. Enjoy browsing around!
>> >  >
>> >  > Thanks,
>> >  >
>> >  > Aizhamal
>> >  >
>> >
>>
>
>

-- 

*Joana Carrasqueira*

Cloud Developer Relations Events Manager

+1 415-602-2507

1160 N Mathilda Ave, Sunnyvale, CA 94089


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Kenneth Knowles
On Mon, Apr 1, 2019 at 2:20 PM Lukasz Cwik  wrote:

>
>
> On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles  wrote:
>
>>
>> As to building an aggregated "Java" project, I think the blocker will be
>> supporting conflicting deps. For IOs like ElasticSearch and runners like
>> Flink the conflict is essential and deliberate, to support multiple
>> versions of other services. And that is not even talking about transitive
>> dep conflicts. I think Python and Go don't have this issue simply because
>> they haven't tackled those problems.
>>
>> Are you talking about just a shortcut for building (super easy to just
>> add since we are using Gradle) or a new artifact that you want to
>> distribute?
>>
>> On Mon, Apr 1, 2019 at 10:01 AM Lukasz Cwik  wrote:
>>
>>> During the gradle migration, we used to have something like:
>>>
>>> include(":sdks:java:core")
>>> include(":sdks:java:extensions:sql")
>>> include(":sdks:python")
>>>
>>> Just to be super clear, this is Gradle default and is equivalent to just
>> leaving it blank.
>>
>>
>>> but we discovered the Maven module names that were used during
>>> publishing were "core" / "sql" / ... (effectively the directory name)
>>> instead of "beam-sdks-java-core".
>>>
>>
>> Isn't this managed by the publication plugin?
>> https://docs.gradle.org/current/userguide/publishing_maven.html#sec:identity_values_in_the_generated_pom
>>  "overriding
>> the default identity values is easy: simply specify the groupId, artifactId
>> or version attributes when configuring the MavenPublication."
>>
>
> During the gradle migration this wasn't that easy. The new maven publish
> plugin improved a lot since then.
>
>
>> Using the default at the time also broke the artifact names for intra
>>> project dependencies that we generate[1]. Finally, we also ran into an
>>> issue because we had more than one Gradle project with the same directory
>>> name even though they were under a different parent folder (I think it was
>>> "core") and that was leading to some strange build time behavior.
>>>
>>
>> Weird. But I think the Jira should still stand as a move towards
>> simplifying our build and making it more discoverable for new contributors.
>>
>
> Agree on the JIRA makes sense, just calling out that there were other
> issues that this naming had caused in the past which should be checked
> before we call this done.
>

Totally agree. It will be quite a large task with a lot of boilerplate that
might not be separable from technical blockers that come up as you go
through the boilerplate.

Kenn



Kenn
>>
>>
>>> We didn't migrate to a flat project structure where each project is a
>>> folder underneath the root project because of the existing Maven build
>>> rules that were being maintained in parallel and I'm not sure if people
>>> would want to have a flat project structure either.
>>>
>>> 1:
>>> https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1055
>>>
>>> On Mon, Apr 1, 2019 at 9:49 AM Michael Luckey 
>>> wrote:
>>>
 Hi,

 although I did not yet manage to get deeper involved into actual
 development, I think this ability would be a useful addition.

 But I would also like to point out, that this is kind of implicit, as
 soon we get https://issues.apache.org/jira/browse/BEAM-4046 included.

 For instance, we would change the current setup from

 include "beam-sdks-java-core"
 project(":beam-sdks-java-core").dir = file("sdks/java/core")

 to something like

 include(":sdks:java:core")
 include(":sdks:java:extensions:sql")
 include(":sdks:python")


 With this in place a plain

 $ ./gradlew -p sdks/java build

 would exactly do what you want. And, of course, this will also work for
 'sdks/java/io', 'runners/' etc. Hope, you get the point.

 Currently, we deviate from gradle default convention and therefore have
 to implement some quirks to restore default behaviour. And I somehow
 dislike the structure introduced by parent/child folders, which will be
 destroyed by our current project definitions.

 But, to be honest, although I have some clear understanding on how to
 proceed here - especially regarding the requirement to keep the change
 backwards compatible - we might decide not to switch. Because deeper
 investigation might reveal issues, which I am currently not aware of.

 Best,

 michel

 On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré 
 wrote:

> Hi guys,
>
> I would like to introduce a Gradle "meta" project for the build:
> beam-sdks-java.
>
> The idea is to simply build all Java SDK related resources (core, IO,
> ...).
>
> The purpose is also to be aligned with the other SDKs which provide
> beam-sdks-go and beam-sdks-python.
>
> Thoughts ?
>
> Regards
> 

Re: kafka 0.9 support

2019-04-01 Thread Kenneth Knowles
This could be a backward-incompatible change, though that notion has many
interpretations. What matters is user pain. Technically if we don't break
the core SDK, users should be able to use Java SDK >=2.11.0 with KafkaIO
2.11.0 forever.
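
To make the "pin KafkaIO" idea concrete, here is a minimal sketch of a user's build file under that assumption. The SDK version `2.12.0` is only an example of "a newer SDK", the `kafka-clients` version is an example of a 0.9 client, and whether this combination actually keeps working is exactly the open question in this thread.

```groovy
// build.gradle of a hypothetical user pipeline (illustrative sketch only)
plugins {
    id 'java'
}

repositories {
    mavenCentral()
}

dependencies {
    // Newer core SDK (example version).
    implementation 'org.apache.beam:beam-sdks-java-core:2.12.0'
    // KafkaIO pinned to the last release assumed to support Kafka 0.9.
    implementation 'org.apache.beam:beam-sdks-java-io-kafka:2.11.0'
    // The old Kafka client the user is stuck on (example version).
    implementation 'org.apache.kafka:kafka-clients:0.9.0.1'
}
```

This only works as long as the core SDK stays compatible with the older KafkaIO artifact, which is the condition stated above.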

How are multiple versions of Kafka supported? Are they all in one client,
or is there a case for forks like ElasticSearchIO?

Kenn

On Mon, Apr 1, 2019 at 10:37 AM Jean-Baptiste Onofré 
wrote:

> +1 to remove 0.9 support.
>
> I think it's more interesting to test and verify Kafka 2.2.0 than 0.9 ;)
>
> Regards
> JB
>
> On 01/04/2019 19:36, David Morávek wrote:
> > Hello,
> >
> > is there still a reason to keep Kafka 0.9 support? This unfortunately
> > adds lot of complexity to KafkaIO implementation.
> >
> > Kafka 0.9 was released on Nov 2015.
> >
> > My first shot on removing Kafka 0.9 support would remove second
> > consumer, which is used for fetching offsets.
> >
> > WDYT? Is this support worth keeping?
> >
> > https://github.com/apache/beam/pull/8186
> >
> > D.
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Lukasz Cwik
On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles  wrote:

>
> As to building an aggregated "Java" project, I think the blocker will be
> supporting conflicting deps. For IOs like ElasticSearch and runners like
> Flink the conflict is essential and deliberate, to support multiple
> versions of other services. And that is not even talking about transitive
> dep conflicts. I think Python and Go don't have this issue simply because
> they haven't tackled those problems.
>
> Are you talking about just a shortcut for building (super easy to just add
> since we are using Gradle) or a new artifact that you want to distribute?
>
> On Mon, Apr 1, 2019 at 10:01 AM Lukasz Cwik  wrote:
>
>> During the gradle migration, we used to have something like:
>>
>> include(":sdks:java:core")
>> include(":sdks:java:extensions:sql")
>> include(":sdks:python")
>>
>> Just to be super clear, this is Gradle default and is equivalent to just
> leaving it blank.
>
>
>> but we discovered the Maven module names that were used during publishing
>> were "core" / "sql" / ... (effectively the directory name) instead of
>> "beam-sdks-java-core".
>>
>
> Isn't this managed by the publication plugin?
> https://docs.gradle.org/current/userguide/publishing_maven.html#sec:identity_values_in_the_generated_pom
>  "overriding
> the default identity values is easy: simply specify the groupId, artifactId
> or version attributes when configuring the MavenPublication."
>

During the gradle migration this wasn't that easy. The new maven publish
plugin has improved a lot since then.


> Using the default at the time also broke the artifact names for intra
>> project dependencies that we generate[1]. Finally, we also ran into an
>> issue because we had more than one Gradle project with the same directory
>> name even though they were under a different parent folder (I think it was
>> "core") and that was leading to some strange build time behavior.
>>
>
> Weird. But I think the Jira should still stand as a move towards
> simplifying our build and making it more discoverable for new contributors.
>

Agreed that the JIRA makes sense; just calling out that this naming caused
other issues in the past, which should be checked before we call this
done.


> Kenn
>
>
>> We didn't migrate to a flat project structure where each project is a
>> folder underneath the root project because of the existing Maven build
>> rules that were being maintained in parallel and I'm not sure if people
>> would want to have a flat project structure either.
>>
>> 1:
>> https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1055
>>
>> On Mon, Apr 1, 2019 at 9:49 AM Michael Luckey 
>> wrote:
>>
>>> Hi,
>>>
>>> although I did not yet manage to get deeper involved into actual
>>> development, I think this ability would be a useful addition.
>>>
>>> But I would also like to point out, that this is kind of implicit, as
>>> soon we get https://issues.apache.org/jira/browse/BEAM-4046 included.
>>>
>>> For instance, we would change the current setup from
>>>
>>> include "beam-sdks-java-core"
>>> project(":beam-sdks-java-core").dir = file("sdks/java/core")
>>>
>>> to something like
>>>
>>> include(":sdks:java:core")
>>> include(":sdks:java:extensions:sql")
>>> include(":sdks:python")
>>>
>>>
>>> With this in place a plain
>>>
>>> $ ./gradlew -p sdks/java build
>>>
>>> would exactly do what you want. And, of course, this will also work for
>>> 'sdks/java/io', 'runners/' etc. Hope, you get the point.
>>>
>>> Currently, we deviate from gradle default convention and therefore have
>>> to implement some quirks to restore default behaviour. And I somehow
>>> dislike the structure introduced by parent/child folders, which will be
>>> destroyed by our current project definitions.
>>>
>>> But, to be honest, although I have some clear understanding on how to
>>> proceed here - especially regarding the requirement to keep the change
>>> backwards compatible - we might decide not to switch. Because deeper
>>> investigation might reveal issues, which I am currently not aware of.
>>>
>>> Best,
>>>
>>> michel
>>>
>>> On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré 
>>> wrote:
>>>
 Hi guys,

 I would like to introduce a Gradle "meta" project for the build:
 beam-sdks-java.

 The idea is to simply build all Java SDK related resources (core, IO,
 ...).

 The purpose is also to be aligned with the other SDKs which provide
 beam-sdks-go and beam-sdks-python.

 Thoughts ?

 Regards
 JB
 --
 Jean-Baptiste Onofré
 jbono...@apache.org
 http://blog.nanthrax.net
 Talend - http://www.talend.com

>>>


Re: Increase Portable SDK Harness share of memory?

2019-04-01 Thread Lukasz Cwik
Yes, we need to use the new fields everywhere and then deprecate the old
fields.
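
As a rough illustration of "use the new fields, then deprecate the old ones", the sketch below prefers a URN + payload pair and only falls back to the legacy docker URL. The `Environment` class is a hypothetical stand-in for the protobuf message (payloads are simplified to strings), and the URN strings are assumptions rather than values quoted from `beam_runner_api.proto`.

```groovy
// Hypothetical stand-in for `message Environment`; not generated protobuf code.
class Environment {
    static final String DOCKER_URN = 'beam:env:docker:v1'   // assumed URN
    String url = ''       // legacy field: docker image only
    String urn = ''       // new field: identifies the environment type
    String payload = ''   // new field: type-specific configuration (simplified)
}

// Prefer the new fields; fall back to the legacy field while it still exists.
String describe(Environment env) {
    if (env.urn) {
        return env.urn + ' -> ' + env.payload
    }
    if (env.url) {
        // A legacy URL can only ever mean a docker environment.
        return Environment.DOCKER_URN + ' -> ' + env.url
    }
    throw new IllegalArgumentException('environment has neither urn nor url')
}

println describe(new Environment(urn: 'beam:env:process:v1', payload: 'cmd'))
println describe(new Environment(url: 'example/image:latest'))
```

Once every runner reads environments through a function like this, the fallback branch is the only place the old field is touched, which makes it easier to remove later.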

On Mon, Apr 1, 2019 at 1:33 PM Kenneth Knowles  wrote:

>
>
> On Mon, Apr 1, 2019 at 8:59 AM Lukasz Cwik  wrote:
>
>> To clarify, docker isn't the only environment type we are using. We have
>> process-based and "existing" environment modes that don't fit the current
>> protobuf and are being worked around.
>>
>
> Ah, understood.
>
>
>> The idea would be to move to a URN + payload model like our PTransforms
>> and coders with a docker specific one. Using the URN + payload would allow
>> us to have a versioned way to update the environment specifications and
>> deprecate/remove things that are ill defined.
>>
>
> Makes sense to me. It looks like this migration path is already in place
> in `message Environment` in beam_runner_api.proto, with `message
> StandardEnvironments` enumerating some URNs and corresponding payload
> messages just below. So is the gap just getting the two portable runners to
> look at the new fields?
>
> Kenn
>
>
>> On Fri, Mar 29, 2019 at 6:41 PM Kenneth Knowles  wrote:
>>
>>>
>>>
>>> On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik  wrote:
>>>
 The intention is that these kinds of hints such as CPU and/or memory
 should be embedded in the environment specification that is associated with
 the transforms that need resource hints.

 The environment spec is woefully ill prepared as it only has a docker
 URL right now.

>>>
>>> FWIW I think this is actually "extremely well prepared" :-)
>>>
>>> Protobuf is great for adding fields when you need more but removing is
>>> nearly impossible once deployed, so it is best to do the absolute minimum
>>> until you need to expand.
>>>
>>> Kenn
>>>
>>>

 On Thu, Mar 28, 2019 at 8:45 AM Robert Burke 
 wrote:

> A question came over the beam-go slack that I wasn't able to answer,
> in particular for Dataflow*, is there a way to increase how much of a
> Portable FnAPI worker is dedicated for the SDK side, vs the Runner side?
>
> My assumption is that runners should manage it, and have the Runner
> Harness side be as lightweight as possible, to operate under reasonable
> memory bounds, allowing the user-code more room to spread, since it's
> largely unknown.
>
> I saw there's the Provisioning API, which communicates resource limits
> to the SDK side, but is there a way to make the request (probably on job
> start up) in the other direction?
>
> I imagine it has to do with the container boot code, but I have only
> vague knowledge of how that works at present.
>
> If there's a portable way to do it, that's ideal, but I suspect this
> will require a Dataflow-specific answer.
>
> Thanks!
> Robert B
>
> *Dataflow doesn't support the Go SDK, but the Go SDK supports Dataflow.
>



Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Kenneth Knowles
As to building an aggregated "Java" project, I think the blocker will be
supporting conflicting deps. For IOs like ElasticSearch and runners like
Flink the conflict is essential and deliberate, to support multiple
versions of other services. And that is not even talking about transitive
dep conflicts. I think Python and Go don't have this issue simply because
they haven't tackled those problems.

Are you talking about just a shortcut for building (super easy to just add
since we are using Gradle) or a new artifact that you want to distribute?
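
If only a build shortcut is meant, a minimal sketch of such an aggregate task could look like this. The task name `buildJavaSdk` is hypothetical, and the sketch assumes the flat project names currently in use (`beam-sdks-java-...`) and that every matching project has a `build` task.

```groovy
// Root build.gradle (illustrative sketch, not Beam's actual build file).

// Collect every subproject whose name marks it as part of the Java SDK.
def javaSdkProjects = subprojects.findAll { it.name.startsWith('beam-sdks-java') }

// `buildJavaSdk` is a hypothetical task name.
task buildJavaSdk {
    description = 'Builds every project whose name starts with beam-sdks-java.'
    group = 'build'
    // Task paths are given as strings, so the referenced projects do not
    // need to be configured yet when this line is evaluated.
    dependsOn(javaSdkProjects.collect { "${it.path}:build" })
}
```

A shortcut like this produces no new artifact, so it does not by itself run into the conflicting-dependency problem described above.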

On Mon, Apr 1, 2019 at 10:01 AM Lukasz Cwik  wrote:

> During the gradle migration, we used to have something like:
>
> include(":sdks:java:core")
> include(":sdks:java:extensions:sql")
> include(":sdks:python")
>
> Just to be super clear, this is Gradle default and is equivalent to just
leaving it blank.


> but we discovered the Maven module names that were used during publishing
> were "core" / "sql" / ... (effectively the directory name) instead of
> "beam-sdks-java-core".
>

Isn't this managed by the publication plugin?
https://docs.gradle.org/current/userguide/publishing_maven.html#sec:identity_values_in_the_generated_pom
"overriding
the default identity values is easy: simply specify the groupId, artifactId
or version attributes when configuring the MavenPublication."
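
A minimal sketch of what that documentation describes, assuming the `maven-publish` plugin is applied to the project at `sdks/java/core`. The publication name `mavenJava` and the version are placeholders, not values taken from Beam's build.

```groovy
// sdks/java/core/build.gradle (illustrative sketch only)
plugins {
    id 'java-library'
    id 'maven-publish'
}

publishing {
    publications {
        mavenJava(MavenPublication) {
            from components.java
            // Override the defaults, which would otherwise be derived from
            // the project (artifactId defaults to the project name, "core").
            groupId = 'org.apache.beam'
            artifactId = 'beam-sdks-java-core'
            version = '0.0.0-EXAMPLE'   // placeholder version
        }
    }
}
```

With this in place the Gradle project can keep its default name while the published POM still carries the long artifact name.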

Using the default at the time also broke the artifact names for intra
> project dependencies that we generate[1]. Finally, we also ran into an
> issue because we had more than one Gradle project with the same directory
> name even though they were under a different parent folder (I think it was
> "core") and that was leading to some strange build time behavior.
>

Weird. But I think the Jira should still stand as a move towards
simplifying our build and making it more discoverable for new contributors.

Kenn


> We didn't migrate to a flat project structure where each project is a
> folder underneath the root project because of the existing Maven build
> rules that were being maintained in parallel and I'm not sure if people
> would want to have a flat project structure either.
>
> 1:
> https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1055
>
> On Mon, Apr 1, 2019 at 9:49 AM Michael Luckey  wrote:
>
>> Hi,
>>
>> although I did not yet manage to get deeper involved into actual
>> development, I think this ability would be a useful addition.
>>
>> But I would also like to point out, that this is kind of implicit, as
>> soon we get https://issues.apache.org/jira/browse/BEAM-4046 included.
>>
>> For instance, we would change the current setup from
>>
>> include "beam-sdks-java-core"
>> project(":beam-sdks-java-core").dir = file("sdks/java/core")
>>
>> to something like
>>
>> include(":sdks:java:core")
>> include(":sdks:java:extensions:sql")
>> include(":sdks:python")
>>
>>
>> With this in place a plain
>>
>> $ ./gradlew -p sdks/java build
>>
>> would exactly do what you want. And, of course, this will also work for
>> 'sdks/java/io', 'runners/' etc. Hope, you get the point.
>>
>> Currently, we deviate from gradle default convention and therefore have
>> to implement some quirks to restore default behaviour. And I somehow
>> dislike the structure introduced by parent/child folders, which will be
>> destroyed by our current project definitions.
>>
>> But, to be honest, although I have some clear understanding on how to
>> proceed here - especially regarding the requirement to keep the change
>> backwards compatible - we might decide not to switch. Because deeper
>> investigation might reveal issues, which I am currently not aware of.
>>
>> Best,
>>
>> michel
>>
>> On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi guys,
>>>
>>> I would like to introduce a Gradle "meta" project for the build:
>>> beam-sdks-java.
>>>
>>> The idea is to simply build all Java SDK related resources (core, IO,
>>> ...).
>>>
>>> The purpose is also to be aligned with the other SDKs which provide
>>> beam-sdks-go and beam-sdks-python.
>>>
>>> Thoughts ?
>>>
>>> Regards
>>> JB
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>


Re: Increase Portable SDK Harness share of memory?

2019-04-01 Thread Kenneth Knowles
On Mon, Apr 1, 2019 at 8:59 AM Lukasz Cwik  wrote:

> To clarify, docker isn't the only environment type we are using. We also have
> process-based and "existing" environment modes that don't fit the current
> protobuf and are being worked around.
>

Ah, understood.


> The idea would be to move to a URN + payload model like our PTransforms
> and coders with a docker specific one. Using the URN + payload would allow
> us to have a versioned way to update the environment specifications and
> deprecate/remove things that are ill defined.
>

Makes sense to me. It looks like this migration path is already in place in
`message Environment` in beam_runner_api.proto, with `message
StandardEnvironments` enumerating some URNs and corresponding payload
messages just below. So is the gap just getting the two portable runners to
look at the new fields?

Kenn


> On Fri, Mar 29, 2019 at 6:41 PM Kenneth Knowles  wrote:
>
>>
>>
>> On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik  wrote:
>>
>>> The intention is that these kinds of hints such as CPU and/or memory
>>> should be embedded in the environment specification that is associated with
>>> the transforms that need resource hints.
>>>
>>> The environment spec is woefully ill prepared as it only has a docker
>>> URL right now.
>>>
>>
>> FWIW I think this is actually "extremely well prepared" :-)
>>
>> Protobuf is great for adding fields when you need more but removing is
>> nearly impossible once deployed, so it is best to do the absolute minimum
>> until you need to expand.
>>
>> Kenn
>>
>>
>>>
>>> On Thu, Mar 28, 2019 at 8:45 AM Robert Burke  wrote:
>>>
>>>> A question came over the beam-go slack that I wasn't able to answer, in
>>>> particular for Dataflow*, is there a way to increase how much of a Portable
>>>> FnAPI worker is dedicated for the SDK side, vs the Runner side?
>>>>
>>>> My assumption is that runners should manage it, and have the Runner
>>>> Harness side be as lightweight as possible, to operate under reasonable
>>>> memory bounds, allowing the user-code more room to spread, since it's
>>>> largely unknown.
>>>>
>>>> I saw there's the Provisioning API
>>>>
>>>> which communicates resource limits to the SDK side, but is there a way
>>>> to make the request (probably on job start up) in the other direction?
>>>>
>>>> I imagine it has to do with the container boot code, but I have only
>>>> vague knowledge of how that works at present.
>>>>
>>>> If there's a portable way for it, that's ideal, but I suspect this will
>>>> require a Dataflow specific answer.
>>>>
>>>> Thanks!
>>>> Robert B
>>>>
>>>> *Dataflow doesn't support the Go SDK, but the Go SDK supports Dataflow.

>>>


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Jean-Baptiste Onofré
Hi,

I mean that I made a change on a local branch to be able to do:

./gradlew :beam-sdks-java:build

and/or

./gradlew -p sdks/java build
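
One way such a change could look, as a sketch only: the file locations, the
name-prefix filter and the assumption that every matched project has a
"build" task are all hypothetical and not taken from the local branch.

```groovy
// settings.gradle (hypothetical addition): register the "meta" project and
// point it at the existing sdks/java directory.
include 'beam-sdks-java'
project(':beam-sdks-java').projectDir = file('sdks/java')

// sdks/java/build.gradle (hypothetical new file): collect every project
// whose name starts with "beam-sdks-java-" and build them together.
def javaSdkProjects = rootProject.subprojects.findAll {
    it.name.startsWith('beam-sdks-java-')
}

task build {
    description = 'Builds all Java SDK related projects (core, IO, ...).'
    // Assumes each matched project defines its own "build" task.
    dependsOn javaSdkProjects.collect { "${it.path}:build" }
}
```

With the flat project names the other projects are not children of
':beam-sdks-java', which is why the sketch wires them up explicitly.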

Regards
JB

On 01/04/2019 19:47, Michael Luckey wrote:
> Hmm... now you lost me :(
> 
> Currently I am not able to do a
> 
> $ ./gradlew -p sdks/java build
> 
> It fails with error
> 
> Project directory '/Users/michel/GitHub/adude3141/beam/sdks/java' is not
> part of the build defined by settings file
> 
> 
> on my machine, which - again - should be expected.
> 
> Regarding the display, it would look like this if we would be able to switch
> 
> \--- Project ':sdks'
>      +--- Project ':sdks:java'
>      |    +--- Project ':sdks:java:core' - Apache Beam :: SDKs :: Java :: Core
>      |    \--- Project ':sdks:java:extensions'
>      |         \--- Project ':sdks:java:extensions:sql' - Apache Beam :: SDKs :: Java :: Extensions :: SQL
>      \--- Project ':sdks:python'
> 
> 
> 
> On Mon, Apr 1, 2019 at 7:36 PM Jean-Baptiste Onofré wrote:
> 
> By the way, another reason is to have this clearly displayed in
> ./gradlew projects ;)
> 
> On 01/04/2019 18:49, Michael Luckey wrote:
> > Hi,
> >
> > although I did not yet manage to get deeper involved into actual
> > development, I think this ability would be a useful addition.
> >
> > But I would also like to point out, that this is kind of implicit, as
> > soon we get https://issues.apache.org/jira/browse/BEAM-4046 included.
> >
> > For instance, we would change the current setup from
> >
> > include "beam-sdks-java-core"
> > project(":beam-sdks-java-core").dir = file("sdks/java/core")
> >
> > to something like
> >
> > include(":sdks:java:core")
> > include(":sdks:java:extensions:sql")
> > include(":sdks:python")
> >
> >
> > With this in place a plain
> >
> > $ ./gradlew -p sdks/java build
> >
> >
> > would exactly do what you want. And, of course, this will also
> work for
> > 'sdks/java/io', 'runners/' etc. Hope, you get the point.
> >
> > Currently, we deviate from gradle default convention and therefore
> have
> > to implement some quirks to restore default behaviour. And I somehow
> > dislike the structure introduced by parent/child folders, which
> will be
> > destroyed by our current project definitions.
> >
> > But, to be honest, although I have some clear understanding on how to
> > proceed here - especially regarding the requirement to keep the change
> > backwards compatible - we might decide not to switch. Because deeper
> > investigation might reveal issues, which I am currently not aware of.
> >
> > Best,
> >
> > michel
> >
> > On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré wrote:
> >
> >     Hi guys,
> >
> >     I would like to introduce a Gradle "meta" project for the build:
> >     beam-sdks-java.
> >
> >     The idea is to simply build all Java SDK related resources (core,
> >     IO, ...).
> >
> >     The purpose is also to be aligned with the other SDKs which
> provide
> >     beam-sdks-go and beam-sdks-python.
> >
> >     Thoughts ?
> >
> >     Regards
> >     JB
> >     --
> >     Jean-Baptiste Onofré
> >     jbono...@apache.org 
> >
> >     http://blog.nanthrax.net
> >     Talend - http://www.talend.com
> >
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Removing :beam-website:testWebsite from gradle build target

2019-04-01 Thread Kenneth Knowles
+1 thanks for noticing and raising yet another source of non-hermeticity
(plus the docker constraint)

On Mon, Apr 1, 2019 at 9:09 AM Andrew Pilloud  wrote:

> +1 on this, particularly removing the dead link checker from default
> tests. It is effectively testing that ~20 random websites are up. I wonder
> if there is a way to limit it to locally testing links within the beam site?
>
> On Mon, Apr 1, 2019 at 3:54 AM Michael Luckey  wrote:
>
>> Hi,
>>
>> after playing around with Gradle build for a while, I would like to
>> suggest to remove ':beam-website:testWebsite' target from Gradle's check
>> task.
>>
>> Rationale:
>> - the task seems to be very flaky. In fact, I always need to add '-x
>> :beam-website:testWebsite' to my build [1]
>> - task uses docker, which imho adds a (unnecessary) severe constraint on
>> the build task. E.g. A part time user is unable to execute these tests in a
>> docker environment
>> - these tests are accessing production environment. So myself hitting the
>> build several times an hour could be considered a DOS attack.
>>
>> Of course, these tests add lots of value and should definitely be
>> executed, but wouldn't it be sufficient, to run this task only dedicated,
>> i.e. by an explicit call to ':beam-website:testWebsite' or
>> ':websitePreCommit'? Any thoughts?
>>
>> best,
>>
>> michel
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-6760
>>
>


Re: Removing :beam-website:testWebsite from gradle build target

2019-04-01 Thread Alan Myrvold
+1 if possible, removing link checks would be nice too, if they are
unreliable and there is a way to disable them.

On Mon, Apr 1, 2019 at 10:33 AM Mikhail Gryzykhin  wrote:

> +1 on this. I'd prefer to have this as pre-commit only.
>
> On Mon, Apr 1, 2019 at 9:09 AM Andrew Pilloud  wrote:
>
>> +1 on this, particularly removing the dead link checker from default
>> tests. It is effectively testing that ~20 random websites are up. I wonder
>> if there is a way to limit it to locally testing links within the beam site?
>>
>> On Mon, Apr 1, 2019 at 3:54 AM Michael Luckey 
>> wrote:
>>
>>> Hi,
>>>
>>> after playing around with Gradle build for a while, I would like to
>>> suggest to remove ':beam-website:testWebsite' target from Gradle's check
>>> task.
>>>
>>> Rationale:
>>> - the task seems to be very flaky. In fact, I always need to add '-x
>>> :beam-website:testWebsite' to my build [1]
>>> - task uses docker, which imho adds a (unnecessary) severe constraint on
>>> the build task. E.g. A part time user is unable to execute these tests in a
>>> docker environment
>>> - these tests are accessing production environment. So myself hitting
>>> the build several times an hour could be considered a DOS attack.
>>>
>>> Of course, these tests add lots of value and should definitely be
>>> executed, but wouldn't it be sufficient, to run this task only dedicated,
>>> i.e. by an explicit call to ':beam-website:testWebsite' or
>>> ':websitePreCommit'? Any thoughts?
>>>
>>> best,
>>>
>>> michel
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-6760
>>>
>>


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Michael Luckey
Hmm... now you lost me :(

Currently I am not able to do a

$ ./gradlew -p sdks/java build

It fails with error

Project directory '/Users/michel/GitHub/adude3141/beam/sdks/java' is not
part of the build defined by settings file

on my machine, which - again - should be expected.

Regarding the display, it would look like this if we would be able to switch

\--- Project ':sdks'
     +--- Project ':sdks:java'
     |    +--- Project ':sdks:java:core' - Apache Beam :: SDKs :: Java :: Core
     |    \--- Project ':sdks:java:extensions'
     |         \--- Project ':sdks:java:extensions:sql' - Apache Beam :: SDKs :: Java :: Extensions :: SQL
     \--- Project ':sdks:python'


On Mon, Apr 1, 2019 at 7:36 PM Jean-Baptiste Onofré  wrote:

> By the way, another reason is to have this clearly displayed in
> ./gradlew projects ;)
>
> On 01/04/2019 18:49, Michael Luckey wrote:
> > Hi,
> >
> > although I did not yet manage to get deeper involved into actual
> > development, I think this ability would be a useful addition.
> >
> > But I would also like to point out, that this is kind of implicit, as
> > soon we get https://issues.apache.org/jira/browse/BEAM-4046 included.
> >
> > For instance, we would change the current setup from
> >
> > include "beam-sdks-java-core"
> > project(":beam-sdks-java-core").dir = file("sdks/java/core")
> >
> > to something like
> >
> > include(":sdks:java:core")
> > include(":sdks:java:extensions:sql")
> > include(":sdks:python")
> >
> >
> > With this in place a plain
> >
> > $ ./gradlew -p sdks/java build
> >
> >
> > would exactly do what you want. And, of course, this will also work for
> > 'sdks/java/io', 'runners/' etc. Hope, you get the point.
> >
> > Currently, we deviate from gradle default convention and therefore have
> > to implement some quirks to restore default behaviour. And I somehow
> > dislike the structure introduced by parent/child folders, which will be
> > destroyed by our current project definitions.
> >
> > But, to be honest, although I have some clear understanding on how to
> > proceed here - especially regarding the requirement to keep the change
> > backwards compatible - we might decide not to switch. Because deeper
> > investigation might reveal issues, which I am currently not aware of.
> >
> > Best,
> >
> > michel
> >
> > On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré wrote:
> >
> > Hi guys,
> >
> > I would like to introduce a Gradle "meta" project for the build:
> > beam-sdks-java.
> >
> > The idea is to simply build all Java SDK related resources (core,
> > IO, ...).
> >
> > The purpose is also to be aligned with the other SDKs which provide
> > beam-sdks-go and beam-sdks-python.
> >
> > Thoughts ?
> >
> > Regards
> > JB
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: kafka 0.9 support

2019-04-01 Thread Jean-Baptiste Onofré
+1 to remove 0.9 support.

I think it's more interesting to test and verify Kafka 2.2.0 than 0.9 ;)

Regards
JB

On 01/04/2019 19:36, David Morávek wrote:
> Hello,
> 
> is there still a reason to keep Kafka 0.9 support? This unfortunately
> adds a lot of complexity to the KafkaIO implementation.
> 
> Kafka 0.9 was released in November 2015.
> 
> My first shot at removing Kafka 0.9 support would remove the second
> consumer, which is used for fetching offsets.
> 
> WDYT? Is this support worth keeping?
> 
> https://github.com/apache/beam/pull/8186
> 
> D.

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


kafka 0.9 support

2019-04-01 Thread David Morávek
Hello,

is there still a reason to keep Kafka 0.9 support? This unfortunately adds
a lot of complexity to the KafkaIO implementation.

Kafka 0.9 was released in November 2015.

My first shot at removing Kafka 0.9 support would remove the second consumer,
which is used for fetching offsets.

WDYT? Is this support worth keeping?

https://github.com/apache/beam/pull/8186

D.


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Jean-Baptiste Onofré
By the way, another reason is to have this clearly displayed in
./gradlew projects ;)

On 01/04/2019 18:49, Michael Luckey wrote:
> Hi,
> 
> although I did not yet manage to get deeper involved into actual
> development, I think this ability would be a useful addition.
> 
> But I would also like to point out, that this is kind of implicit, as
> soon we get https://issues.apache.org/jira/browse/BEAM-4046 included.
> 
> For instance, we would change the current setup from
> 
> include "beam-sdks-java-core"
> project(":beam-sdks-java-core").dir = file("sdks/java/core")
> 
> to something like
> 
> include(":sdks:java:core")
> include(":sdks:java:extensions:sql")
> include(":sdks:python")
> 
> 
> With this in place a plain
> 
> $ ./gradlew -p sdks/java build
> 
> 
> would exactly do what you want. And, of course, this will also work for
> 'sdks/java/io', 'runners/' etc. Hope, you get the point.
> 
> Currently, we deviate from gradle default convention and therefore have
> to implement some quirks to restore default behaviour. And I somehow
> dislike the structure introduced by parent/child folders, which will be
> destroyed by our current project definitions.
> 
> But, to be honest, although I have some clear understanding on how to
> proceed here - especially regarding the requirement to keep the change
> backwards compatible - we might decide not to switch. Because deeper
> investigation might reveal issues, which I am currently not aware of.
> 
> Best,
> 
> michel
> 
> On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré wrote:
> 
> Hi guys,
> 
> I would like to introduce a Gradle "meta" project for the build:
> beam-sdks-java.
> 
> The idea is to simply build all Java SDK related resources (core,
> IO, ...).
> 
> The purpose is also to be aligned with the other SDKs which provide
> beam-sdks-go and beam-sdks-python.
> 
> Thoughts ?
> 
> Regards
> JB
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Jean-Baptiste Onofré
Hi Michael,

Yes, I know the -p option and it's currently what I'm using.

However, the proposal is also meant to make the Java build more consistent
with the other modules.

Regards
JB

On 01/04/2019 18:49, Michael Luckey wrote:
> Hi,
> 
> although I did not yet manage to get deeper involved into actual
> development, I think this ability would be a useful addition.
> 
> But I would also like to point out, that this is kind of implicit, as
> soon we get https://issues.apache.org/jira/browse/BEAM-4046 included.
> 
> For instance, we would change the current setup from
> 
> include "beam-sdks-java-core"
> project(":beam-sdks-java-core").dir = file("sdks/java/core")
> 
> to something like
> 
> include(":sdks:java:core")
> include(":sdks:java:extensions:sql")
> include(":sdks:python")
> 
> 
> With this in place a plain
> 
> $ ./gradlew -p sdks/java build
> 
> 
> would exactly do what you want. And, of course, this will also work for
> 'sdks/java/io', 'runners/' etc. Hope, you get the point.
> 
> Currently, we deviate from gradle default convention and therefore have
> to implement some quirks to restore default behaviour. And I somehow
> dislike the structure introduced by parent/child folders, which will be
> destroyed by our current project definitions.
> 
> But, to be honest, although I have some clear understanding on how to
> proceed here - especially regarding the requirement to keep the change
> backwards compatible - we might decide not to switch. Because deeper
> investigation might reveal issues, which I am currently not aware of.
> 
> Best,
> 
> michel
> 
> On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré wrote:
> 
> Hi guys,
> 
> I would like to introduce a Gradle "meta" project for the build:
> beam-sdks-java.
> 
> The idea is to simply build all Java SDK related resources (core,
> IO, ...).
> 
> The purpose is also to be aligned with the other SDKs which provide
> beam-sdks-go and beam-sdks-python.
> 
> Thoughts ?
> 
> Regards
> JB
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Removing :beam-website:testWebsite from gradle build target

2019-04-01 Thread Mikhail Gryzykhin
+1 on this. I'd prefer to have this as pre-commit only.
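
A rough sketch of what "pre-commit only" could mean in the build scripts.
The file locations and the current wiring shown in the comments are
assumptions; only the task names come from this thread.

```groovy
// Website project build.gradle (hypothetical excerpt).
// Assumes testWebsite is currently attached to check roughly like this:
//   check.dependsOn testWebsite
// Dropping that line would keep `./gradlew check` independent of docker
// and of external sites.

// Root build.gradle (hypothetical excerpt): a dedicated entry point that
// still runs the website tests when asked for explicitly.
task websitePreCommit {
    dependsOn ':beam-website:testWebsite'
}
```

The tests would then run only via an explicit
`./gradlew :beam-website:testWebsite` or `./gradlew websitePreCommit`.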

On Mon, Apr 1, 2019 at 9:09 AM Andrew Pilloud  wrote:

> +1 on this, particularly removing the dead link checker from default
> tests. It is effectively testing that ~20 random websites are up. I wonder
> if there is a way to limit it to locally testing links within the beam site?
>
> On Mon, Apr 1, 2019 at 3:54 AM Michael Luckey  wrote:
>
>> Hi,
>>
>> after playing around with Gradle build for a while, I would like to
>> suggest to remove ':beam-website:testWebsite' target from Gradle's check
>> task.
>>
>> Rationale:
>> - the task seems to be very flaky. In fact, I always need to add '-x
>> :beam-website:testWebsite' to my build [1]
>> - task uses docker, which imho adds a (unnecessary) severe constraint on
>> the build task. E.g. A part time user is unable to execute these tests in a
>> docker environment
>> - these tests are accessing production environment. So myself hitting the
>> build several times an hour could be considered a DOS attack.
>>
>> Of course, these tests add lots of value and should definitely be
>> executed, but wouldn't it be sufficient, to run this task only dedicated,
>> i.e. by an explicit call to ':beam-website:testWebsite' or
>> ':websitePreCommit'? Any thoughts?
>>
>> best,
>>
>> michel
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-6760
>>
>


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Lukasz Cwik
During the gradle migration, we used to have something like:

include(":sdks:java:core")
include(":sdks:java:extensions:sql")
include(":sdks:python")

but we discovered the Maven module names that were used during publishing
were "core" / "sql" / ... (effectively the directory name) instead of
"beam-sdks-java-core". Using the default at the time also broke the
artifact names for intra project dependencies that we generate[1]. Finally,
we also ran into an issue because we had more than one Gradle project with
the same directory name even though they were under a different parent
folder (I think it was "core") and that was leading to some strange build
time behavior.
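
For illustration, a sketch of how the jar base name could be derived from a
hierarchical project path instead of the directory name. This is an
assumption about one possible workaround, not what the build does, and it
uses the `archivesBaseName` property as it existed in Gradle versions of
that time.

```groovy
// Hypothetical addition to the root build.gradle, assuming hierarchical
// project paths such as ':sdks:java:core'.
subprojects { sub ->
    sub.plugins.withId('java') {
        // ':sdks:java:core' -> 'beam-sdks-java-core'
        sub.archivesBaseName = 'beam' + sub.path.replace(':', '-')
    }
}
```

This only affects the archive file names; the Maven artifactId used for
publishing would still have to be set separately.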

We didn't migrate to a flat project structure where each project is a
folder underneath the root project because of the existing Maven build
rules that were being maintained in parallel and I'm not sure if people
would want to have a flat project structure either.

1:
https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1055

On Mon, Apr 1, 2019 at 9:49 AM Michael Luckey  wrote:

> Hi,
>
> although I did not yet manage to get deeper involved into actual
> development, I think this ability would be a useful addition.
>
> But I would also like to point out, that this is kind of implicit, as soon
> we get https://issues.apache.org/jira/browse/BEAM-4046 included.
>
> For instance, we would change the current setup from
>
> include "beam-sdks-java-core"
> project(":beam-sdks-java-core").dir = file("sdks/java/core")
>
> to something like
>
> include(":sdks:java:core")
> include(":sdks:java:extensions:sql")
> include(":sdks:python")
>
>
> With this in place a plain
>
> $ ./gradlew -p sdks/java build
>
> would exactly do what you want. And, of course, this will also work for
> 'sdks/java/io', 'runners/' etc. Hope, you get the point.
>
> Currently, we deviate from gradle default convention and therefore have to
> implement some quirks to restore default behaviour. And I somehow dislike
> the structure introduced by parent/child folders, which will be destroyed
> by our current project definitions.
>
> But, to be honest, although I have some clear understanding on how to
> proceed here - especially regarding the requirement to keep the change
> backwards compatible - we might decide not to switch. Because deeper
> investigation might reveal issues, which I am currently not aware of.
>
> Best,
>
> michel
>
> On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré 
> wrote:
>
>> Hi guys,
>>
>> I would like to introduce a Gradle "meta" project for the build:
>> beam-sdks-java.
>>
>> The idea is to simply build all Java SDK related resources (core, IO,
>> ...).
>>
>> The purpose is also to be aligned with the other SDKs which provide
>> beam-sdks-go and beam-sdks-python.
>>
>> Thoughts ?
>>
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Michael Luckey
Hi,

although I have not yet managed to get more deeply involved in actual
development, I think this ability would be a useful addition.

But I would also like to point out that this is kind of implicit as soon
as we get https://issues.apache.org/jira/browse/BEAM-4046 included.

For instance, we would change the current setup from

include "beam-sdks-java-core"
project(":beam-sdks-java-core").dir = file("sdks/java/core")

to something like

include(":sdks:java:core")
include(":sdks:java:extensions:sql")
include(":sdks:python")


With this in place a plain

$ ./gradlew -p sdks/java build

would exactly do what you want. And, of course, this will also work for
'sdks/java/io', 'runners/' etc. Hope, you get the point.

Currently, we deviate from the Gradle default convention and therefore have to
implement some quirks to restore default behaviour. And I somehow dislike
that the structure introduced by parent/child folders is destroyed
by our current project definitions.

But, to be honest, although I have some clear understanding on how to
proceed here - especially regarding the requirement to keep the change
backwards compatible - we might decide not to switch. Because deeper
investigation might reveal issues, which I am currently not aware of.

Best,

michel

On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré  wrote:

> Hi guys,
>
> I would like to introduce a Gradle "meta" project for the build:
> beam-sdks-java.
>
> The idea is to simply build all Java SDK related resources (core, IO, ...).
>
> The purpose is also to be aligned with the other SDKs which provide
> beam-sdks-go and beam-sdks-python.
>
> Thoughts ?
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Contibutor permissions for Beam Jira tickets

2019-04-01 Thread Kenneth Knowles
Welcome!

On Mon, Apr 1, 2019 at 9:22 AM Ahmet Altay  wrote:

> Welcome to the project!
>
> On Mon, Apr 1, 2019 at 6:23 AM Ismaël Mejía  wrote:
>
>> You have now the Contributor role, and I assigned the ticket you asked
>> for.
>> Enjoy!
>>
>> Ismaël
>>
>> On Mon, Apr 1, 2019 at 12:35 PM Madhusudhan Reddy Vennapusa
>>  wrote:
>> >
>> > Hi,
>> >
>> > This is Madhu, I am interested to contribute to Apache Beam.
>> >
>> > Can someone please add me as contributor, I would like to assign tasks
>> which i can work upon. My jira username is sudhan499
>> >
>> > Also I am interested to start with starter task(
>> https://issues.apache.org/jira/browse/BEAM-3344). Can I assign it to
>> myself?
>> >
>> > Thanks & Regards,
>> > Madhu
>>
>


Re: Contibutor permissions for Beam Jira tickets

2019-04-01 Thread Ahmet Altay
Welcome to the project!

On Mon, Apr 1, 2019 at 6:23 AM Ismaël Mejía  wrote:

> You have now the Contributor role, and I assigned the ticket you asked for.
> Enjoy!
>
> Ismaël
>
> On Mon, Apr 1, 2019 at 12:35 PM Madhusudhan Reddy Vennapusa
>  wrote:
> >
> > Hi,
> >
> > This is Madhu, I am interested to contribute to Apache Beam.
> >
> > Can someone please add me as contributor, I would like to assign tasks
> which i can work upon. My jira username is sudhan499
> >
> > Also I am interested to start with starter task(
> https://issues.apache.org/jira/browse/BEAM-3344). Can I assign it to
> myself?
> >
> > Thanks & Regards,
> > Madhu
>


Re: Quieten javadoc generation

2019-04-01 Thread Kenneth Knowles
Personally, I would like to suppress the warnings globally. I think
requiring javadoc everywhere is already enough to remind someone to write
something meaningful. And I think @param rarely adds anything beyond the
function signature and @return rarely adds anything beyond the description.

Kenn

On Mon, Apr 1, 2019 at 6:53 AM Michael Luckey  wrote:

> Hi,
>
> currently our console output gets cluttered by thousands of Javadoc
> warnings [1]. Most of them are warnings caused by missing @return
> or @param tags  [2].
>
> So currently, this signal is completely ignored, and even worse, makes it
> difficult to parse through the log.
>
> As I could not find a previous discussion on the list on how to handle
> param/return on java docs, I felt the need to ask here first, how we would
> like to improve this situation.
>
> Some options
> 1. fix those warnings
> 2. do not insist on those tags being present and disable doclint warnings
> (probably not doable on tag granularity). This is already done on doc
> aggregation task [3]
>
> Thoughts?
>
>
> [1] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/console
> [2] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/java/
> [3]
> https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle#L77-L78
>
>


Re: Removing :beam-website:testWebsite from gradle build target

2019-04-01 Thread Andrew Pilloud
+1 on this, particularly removing the dead link checker from default tests.
It is effectively testing that ~20 random websites are up. I wonder if
there is a way to limit it to locally testing links within the beam site?

On Mon, Apr 1, 2019 at 3:54 AM Michael Luckey  wrote:

> Hi,
>
> after playing around with Gradle build for a while, I would like to
> suggest to remove ':beam-website:testWebsite' target from Gradle's check
> task.
>
> Rationale:
> - the task seems to be very flaky. In fact, I always need to add '-x
> :beam-website:testWebsite' to my build [1]
> - task uses docker, which imho adds a (unnecessary) severe constraint on
> the build task. E.g. A part time user is unable to execute these tests in a
> docker environment
> - these tests are accessing production environment. So myself hitting the
> build several times an hour could be considered a DOS attack.
>
> Of course, these tests add lots of value and should definitely be
> executed, but wouldn't it be sufficient, to run this task only dedicated,
> i.e. by an explicit call to ':beam-website:testWebsite' or
> ':websitePreCommit'? Any thoughts?
>
> best,
>
> michel
>
> [1] https://issues.apache.org/jira/browse/BEAM-6760
>


Re: Increase Portable SDK Harness share of memory?

2019-04-01 Thread Lukasz Cwik
To clarify, docker isn't the only environment type we are using. We also have
process-based and "existing" environment modes that don't fit the current
protobuf and are being worked around.

The idea would be to move to a URN + payload model like our PTransforms and
coders with a docker specific one. Using the URN + payload would allow us
to have a versioned way to update the environment specifications and
deprecate/remove things that are ill defined.

On Fri, Mar 29, 2019 at 6:41 PM Kenneth Knowles  wrote:

>
>
> On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik  wrote:
>
>> The intention is that these kinds of hints such as CPU and/or memory
>> should be embedded in the environment specification that is associated with
>> the transforms that need resource hints.
>>
>> The environment spec is woefully ill prepared as it only has a docker URL
>> right now.
>>
>
> FWIW I think this is actually "extremely well prepared" :-)
>
> Protobuf is great for adding fields when you need more but removing is
> nearly impossible once deployed, so it is best to do the absolute minimum
> until you need to expand.
>
> Kenn
>
>
>>
>> On Thu, Mar 28, 2019 at 8:45 AM Robert Burke  wrote:
>>
>>> A question came over the beam-go slack that I wasn't able to answer, in
>>> particular for Dataflow*, is there a way to increase how much of a Portable
>>> FnAPI worker is dedicated for the SDK side, vs the Runner side?
>>>
>>> My assumption is that runners should manage it, and have the Runner
>>> Harness side be as lightweight as possible, to operate under reasonable
>>> memory bounds, allowing the user-code more room to spread, since it's
>>> largely unknown.
>>>
>>> I saw there's the Provisioning API
>>> 
>>> which communicates resource limits to the SDK side, but is there a way
>>> to make the request (probably on job start up) in the other direction?
>>>
>>> I imagine it has to do with the container boot code, but I have only
>>> vague knowledge of how that works at present.
>>>
>>> If there's a portable way for it, that's ideal, but I suspect this will
>>> require a Dataflow specific answer.
>>>
>>> Thanks!
>>> Robert B
>>>
>>> *Dataflow doesn't support the Go SDK, but the Go SDK supports Dataflow.
>>>
>>


[PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Jean-Baptiste Onofré
Hi guys,

I would like to introduce a Gradle "meta" project for the build:
beam-sdks-java.

The idea is to simply build all Java SDK related resources (core, IO, ...).

The purpose is also to be aligned with the other SDKs which provide
beam-sdks-go and beam-sdks-python.

Thoughts ?

Regards
JB
-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Quieten javadoc generation

2019-04-01 Thread Michael Luckey
Hi,

currently our console output gets cluttered by thousands of Javadoc
warnings [1]. Most of them are warnings caused by missing @return
or @param tags [2].

So currently, this signal is completely ignored, and even worse, makes it
difficult to parse through the log.

As I could not find a previous discussion on the list on how to handle
param/return on java docs, I felt the need to ask here first, how we would
like to improve this situation.

Some options
1. fix those warnings
2. do not insist on those tags being present and disable doclint warnings
(probably not doable on tag granularity). This is already done on doc
aggregation task [3]

Thoughts?


[1] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/console
[2] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/java/
[3]
https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle#L77-L78
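
To weigh the two options it may help to know how the warnings split by tag. A minimal Python sketch, assuming the console log contains doclint-style lines such as "warning: no @param for x" and "warning: no @return"; the function name and the sample lines are made up for illustration and are not taken from the linked build.

```python
import re
from collections import Counter

# Assumed message shape: "<file>:<line>: warning: no @param for <name>" or
# "<file>:<line>: warning: no @return". Only the tag name is captured.
_MISSING_TAG = re.compile(r"warning: no (@\w+)")


def tally_missing_tags(log_lines):
    """Count missing-tag Javadoc warnings per tag name."""
    counts = Counter()
    for line in log_lines:
        match = _MISSING_TAG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Foo.java:10: warning: no @param for input",
        "Foo.java:20: warning: no @return",
        "some unrelated build output",
    ]
    for tag, count in tally_missing_tags(sample).most_common():
        print(tag, count)
```

Feeding it the lines of a saved console log (for example via `open(path)`) gives a per-tag count that can be compared before and after a cleanup.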


Re: Contibutor permissions for Beam Jira tickets

2019-04-01 Thread Ismaël Mejía
You now have the Contributor role, and I assigned the ticket you asked for.
Enjoy!

Ismaël

On Mon, Apr 1, 2019 at 12:35 PM Madhusudhan Reddy Vennapusa
 wrote:
>
> Hi,
>
> This is Madhu, I am interested to contribute to Apache Beam.
>
> Can someone please add me as contributor, I would like to assign tasks which 
> i can work upon. My jira username is sudhan499
>
> Also I am interested to start with starter 
> task(https://issues.apache.org/jira/browse/BEAM-3344). Can I assign it to 
> myself?
>
> Thanks & Regards,
> Madhu


Beam Dependency Check Report (2019-04-01)

2019-04-01 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:


Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
future | 0.16.0 | 0.17.1 | 2016-10-27 | 2018-12-10 | BEAM-5968
oauth2client | 3.0.0 | 4.1.3 | 2018-12-10 | 2018-12-10 | BEAM-6089

High Priority Dependency Updates Of Beam Java SDK:


Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
com.rabbitmq:amqp-client | 4.6.0 | 5.6.0 | 2018-03-26 | 2019-01-25 | BEAM-5895
com.google.auto.service:auto-service | 1.0-rc2 | 1.0-rc5 | 2014-10-25 | 2019-03-25 | BEAM-5541
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.17.0 | 0.21.0 | 2019-02-11 | 2019-03-04 | BEAM-6645
org.conscrypt:conscrypt-openjdk | 1.1.3 | 2.0.0 | 2018-06-04 | 2019-02-13 | BEAM-5748
org.elasticsearch:elasticsearch | 6.4.0 | 7.0.0-rc1 | 2018-08-18 | 2019-03-22 | BEAM-6090
org.elasticsearch:elasticsearch-hadoop | 5.0.0 | 7.0.0-rc1 | 2016-10-26 | 2019-03-22 | BEAM-5551
org.elasticsearch.client:elasticsearch-rest-client | 6.4.0 | 7.0.0-rc1 | 2018-08-18 | 2019-03-22 | BEAM-6091
com.google.errorprone:error_prone_annotations | 2.1.2 | 2.3.3 | 2017-10-19 | 2019-02-22 | BEAM-6741
org.elasticsearch.test:framework | 6.4.0 | 7.0.0-rc1 | 2018-08-18 | 2019-03-22 | BEAM-6092
com.google.auth:google-auth-library-credentials | 0.12.0 | 0.15.0 | 2018-11-14 | 2019-03-27 | BEAM-6478
io.grpc:grpc-context | 1.13.1 | 1.19.0 | 2018-06-21 | 2019-02-27 | BEAM-5897
io.grpc:grpc-protobuf | 1.13.1 | 1.19.0 | 2018-06-21 | 2019-02-27 | BEAM-5900
io.grpc:grpc-testing | 1.13.1 | 1.19.0 | 2018-06-21 | 2019-02-27 | BEAM-5902
com.google.code.gson:gson | 2.7 | 2.8.5 | 2016-06-14 | 2018-05-22 | BEAM-5558
org.apache.hbase:hbase-common | 1.2.6 | 2.1.4 | 2017-05-29 | 2019-03-20 | BEAM-5560
org.apache.hbase:hbase-hadoop-compat | 1.2.6 | 2.1.4 | 2017-05-29 | 2019-03-20 | BEAM-5561
org.apache.hbase:hbase-hadoop2-compat | 1.2.6 | 2.1.4 | 2017-05-29 | 2019-03-20 | BEAM-5562
org.apache.hbase:hbase-server | 1.2.6 | 2.1.4 | 2017-05-29 | 2019-03-20 | BEAM-5563
org.apache.hbase:hbase-shaded-client | 1.2.6 | 2.1.4 | 2017-05-29 | 2019-03-20 | BEAM-5564
org.apache.hive:hive-cli | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5566
org.apache.hive:hive-common | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5567
org.apache.hive:hive-exec | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5568
org.apache.hive.hcatalog:hive-hcatalog-core | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5569
net.java.dev.javacc:javacc | 4.0 | 7.0.4 | 2006-03-17 | 2018-09-17 | BEAM-5570
javax.servlet:javax.servlet-api | 3.1.0 | 4.0.1 | 2013-04-25 | 2018-04-20 | BEAM-5750
org.eclipse.jetty:jetty-server | 9.2.10.v20150310 | 9.4.15.v20190215 | 2015-03-10 | 2019-02-15 | BEAM-5752
org.eclipse.jetty:jetty-servlet | 9.2.10.v20150310 | 9.4.15.v20190215 | 2015-03-10 | 2019-02-15 | BEAM-5753
net.java.dev.jna:jna | 4.1.0 | 5.2.0 | 2014-03-06 | 2018-12-23 | BEAM-5573
junit:junit | 4.13-beta-1 | 4.13-beta-2 | 2018-11-25 | 2019-02-02 | BEAM-6127
com.esotericsoftware:kryo | 4.0.2 | 5.0.0-RC2 | 2018-03-20 | 2019-02-05 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
org.apache.kudu:kudu-client | 1.4.0 | 1.9.0 | 2017-06-05 | 2019-02-27 | BEAM-5575
org.fusesource.mqtt-client:mqtt-client | 1.14 | 1.15 | 2016-05-31 | 2019-03-11 | BEAM-6801
io.netty:netty-tcnative-boringssl-static | 2.0.8.Final | 2.0.23.Final | 2018-03-27 | 2019-03-23 | BEAM-6897
com.google.api.grpc:proto-google-common-protos | 1.12.0 | 1.15.0 | 2018-06-29 | 2019-03-20 | BEAM-6899
io.grpc:protoc-gen-grpc-java | 1.13.1 | 1.19.0 | 2018-06-21 | 2019-02-27 | BEAM-5903
org.apache.qpid:proton-j | 0.13.1

Re: Build blocking on

2019-04-01 Thread Michael Luckey
Do not worry. It seems next to impossible to stumble upon this issue if
tests are developed on a 'prepared' machine. This seems to be a side effect
of a library in use, which is difficult to anticipate.

Thanks for looking into that. FWIW I opened an issue and assigned it to you
[1]. Feel free to reach out to me if any information is missing.

best,

michel

[1] https://issues.apache.org/jira/browse/BEAM-6949
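
The skip-versus-fail trade-off discussed in the quoted thread could be sketched in Python roughly as follows. This is only an illustration, not the fix tracked in the issue: it assumes credentials are provided through the GOOGLE_APPLICATION_CREDENTIALS variable, and BEAM_REQUIRE_GCP is a hypothetical switch that a CI server would set so that a missing setup fails loudly instead of being skipped.

```python
import os
import unittest


def gcp_credentials_configured(environ):
    """True if the credentials variable points at an existing file."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and os.path.isfile(path)


def require_gcp_or_skip(test_case, environ=None):
    """Skip locally when GCP is not set up, but fail where it is required."""
    environ = os.environ if environ is None else environ
    if gcp_credentials_configured(environ):
        return
    # Hypothetical opt-in for CI: never skip silently there.
    if environ.get("BEAM_REQUIRE_GCP") == "1":
        test_case.fail("GCP credentials are required here but were not found")
    test_case.skipTest("no GCP credentials configured")


class ExampleGcpTest(unittest.TestCase):
    # None means "use os.environ"; overridable to demonstrate both outcomes.
    environ = None

    def test_talks_to_gcp(self):
        require_gcp_or_skip(self, self.environ)
        # Calls that really need GCP would go here.
```

On a developer machine without credentials the test is reported as skipped rather than hanging or failing, while a CI job that sets the switch gets a failure if its credentials go missing.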

On 2019/04/01 00:53:40, Pablo Estrada  wrote:
> Hi Michael,
> I wrote that test and much of that code. I'm quite sorry about the trouble.
> The test should use mocks and not hang when it's missing GCP dependencies.
> That sounds like a bug in the test. We can deactivate it while I figure out
> what's going wrong.
> Best
> -P.
>
> On Sat, Mar 30, 2019, 2:55 PM Michael Luckey wrote:
>
> > After digging a bit deeper, I was able to verify that those tests block
> > on authorization to GCP.
> >
> > It seems that, as I do not have any credentials set, the underlying oauth2
> > falls back to some local mode. This seems to start a webserver on port 8080
> > and wait there forever. Accessing that port forwards to Google, but
> > that also fails miserably.
> >
> > Running
> >
> > python setup.py nosetests --tests apache_beam.io.gcp.bigquery_file_loads_test:TestBigQueryFileLoads.test_records_traverse_transform_with_mocks
> >
> > and hitting 'Ctrl-C' after it got stuck results in the following output:
> >
> > 'KeyboardInterrupt [while running
> > \'WriteToBigQuery/BigQueryBatchFileLoads/RemoveTempTables/Delete\']\n
> >> Your browser has been opened to visit:
> >>
> >> https://accounts.google.com/o/oauth2/v2/auth?scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery+https%3A%
> >>
> >> If your browser is on a different machine then exit and re-run this
> >> application with the command-line parameter
> >>   --noauth_local_webserver
> >> Failed to find "code" in the query parameters of the redirect.
> >> Invalid authorization: Try running with --noauth_local_webserver.
> >
> > I am a bit lost here on how to proceed.
> >
> > On Tue, Mar 26, 2019 at 11:48 PM Michael Luckey wrote:
> >
> >> On Tue, Mar 26, 2019 at 11:18 PM Mikhail Gryzykhin wrote:
> >>
> >>> I believe what happens is that testPy2Gcp actually runs integration
> >>> tests that try to connect to GCP.
> >>>
> >> Actually I was hoping for an explanation like this. Any suggestion how I
> >> could confirm that on my behalf?
> >>
> >>> Without having GCP cluster and configuration on your machine I'd expect
> >>> these tests to fail.
> >>>
> >> Hmm... here I am actually unsure, what would be the best to handle such
> >> cases.
> >>
> >> If I understand correctly, we currently skip some tests which do not meet
> >> expectations, kind of 'can not run on your arch' thingies... So I am
> >> undecided, whether I'd prefer those tests to be skipped if gcp
> >> configuration is missing
> >>
> >> pro
> >> * dev is still able to run the tests (whichever task they are associated
> >> with) without having to separate the failures out. For instance, these
> >> 'testPy2Gcp' does actually execute 'some tests' - which might be already
> >> covered by some other calls... But I definitely do not like the idea, to
> >> put the burden on the developer to track which tasks/tests might be
> >> executed on local machine. Unless this distinction is really coarse - and
> >> pre/postcommit is something I really would like to be able to run locally...
> >>
> >> con
> >> * we definitely need to make sure, those tests are not accidentally
> >> skipped on CI servers.
> >>
> >>> I'd say we should remove testPy2Gcp task from "build" task and
> >>> explicitly keep it as integration test.
> >>>
> >>> --Mikhail
> >>>
> >>> On Tue, Mar 26, 2019 at 3:12 PM Michael Luckey wrote:
> >>>
> >>>> On Tue, Mar 26, 2019 at 10:29 PM Udi Meiri wrote:
> >>>>
> >>>>> Luckey, I couldn't recreate your issue, but I still haven't done a
> >>>>> full build.
> >>>>> I created a new GCE VM with using the ubuntu-1804-bionic-v20190212a
> >>>>> image (n1-standard-4 machine type).
> >>>>>
> >>>>> Ran the following:
> >>>>> sudo apt-get update
> >>>>> sudo apt-get install python-pip
> >>>>> sudo apt-get install python-virtualenv
> >>>>> git clone https://github.com/apache/beam.git
> >>>>> cd beam
> >>>>> ./gradlew :beam-sdks-python:testPy2Gcp
> >>>>> [failed: no JAVA_HOME]
> >>>>> sudo apt-get install openjdk-8-jdk
> >>>>> ./gradlew :beam-sdks-python:testPy2Gcp
> >>>>>
> >>>>> Got: BUILD SUCCESSFUL in 7m 52s
> >>>>>
> >>>> Nice. Thanks a lot for your help here.
> >>>>
> >>>> If I understand correctly, this VM is already located within gcp. Could
> >>>> it already have some setup, which needs to be done on 'my' VM? For instance
> >>>> I was contemplating about that test trying 'to call home', but

Removing :beam-website:testWebsite from gradle build target

2019-04-01 Thread Michael Luckey
Hi,

after playing around with the Gradle build for a while, I would like to suggest
removing the ':beam-website:testWebsite' target from Gradle's check task.

Rationale:
- the task seems to be very flaky. In fact, I always need to add '-x
:beam-website:testWebsite' to my build [1]
- the task uses docker, which imho adds an (unnecessary) severe constraint on
the build task. E.g. a part-time user is unable to execute these tests in a
docker environment
- these tests access the production environment. So my hitting the
build several times an hour could be considered a DoS attack.

Of course, these tests add lots of value and should definitely be executed,
but wouldn't it be sufficient to run this task separately, i.e. by an
explicit call to ':beam-website:testWebsite' or ':websitePreCommit'? Any
thoughts?

best,

michel

[1] https://issues.apache.org/jira/browse/BEAM-6760


Contibutor permissions for Beam Jira tickets

2019-04-01 Thread Madhusudhan Reddy Vennapusa
Hi,

This is Madhu. I am interested in contributing to Apache Beam.

Can someone please add me as a contributor? I would like to assign myself
tasks that I can work on. My jira username is *sudhan499*

Also, I am interested in starting with a starter task (
https://issues.apache.org/jira/browse/BEAM-3344). Can I assign it to myself?

Thanks & Regards,
Madhu