Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-02 Thread Michael Luckey
Hi JB,

if I understood correctly, you already have a living branch with the
required changes. I think it would be helpful if you were able to share that
to support your proposal.

Best,

michel


On Tue, Apr 2, 2019 at 5:02 PM Jean-Baptiste Onofré  wrote:

> Hi Michael,
>
> no problem for the thread, that's the goal of the mailing list ;)
>
> And yes, you got my idea about a "meta" module: easy way of building the
> "whole" Java SDK.
>
> The purpose is not to create an uber jar, but more to simplify the build
> for Java SDK developers.
>
> Do you want me to complete your PR with what I did ?
>
> Regards
> JB
>
> On 02/04/2019 16:49, Michael Luckey wrote:
> > Going to fork the BEAM-4046 discussion. And, JB, I apologise for
> > hijacking your thread.
> >
> > As for the original question, I understood a request for a meta project
> > which will enable easier handling of java projects. E.g. instead of
> > requiring the user to call
> >
> > ./gradlew module1:build module2:build ... moduleN:build
> >
> > a meta project with a build task defined something like
> >
> > build.dependsOn module1:build
> > build.dependsOn module2:build
> > ...
> > build.dependsOn moduleN:build
> >
> > And other tasks as found useful.
> >
> > Not a project which in itself creates some uberjar, which I also believe
> > would be rather difficult to implement.
> >
> > On Tue, Apr 2, 2019 at 5:13 AM Kenneth Knowles wrote:
> >
> > Oh, yikes. It seems
> > like https://github.com/gradle/gradle/issues/847 indicates that the
> > feature to use the default names in Gradle is practically
> > nonfunctional. If that bug is as severe as it looks, I have to
> > retract my position. Like we could never have sdks/java/core and
> > sdks/py/core, right?
> >
> > Kenn
> >
> > On Mon, Apr 1, 2019 at 6:27 PM Michael Luckey wrote:
> >
> > FWIW, hacked something as showcase for BEAM-4046 [1]
> >
> > This is miserably broken, but a
> >
> > ./gradlew projects
> >
> > or
> >
> > ./gradlew -p sdks/java build
> >
> > should work. Anything else is likely to cause issues. If you hit a
> > stack overflow exception, it's likely caused
> > by https://github.com/gradle/gradle/issues/847
> >
> > To continue here, lots of cleanup has to be done. We might also
> > need to rename folders etc, to better reflect semantic
> intentions.
> >
> > [1] https://github.com/apache/beam/pull/8194
> >
> > On Mon, Apr 1, 2019 at 11:56 PM Kenneth Knowles wrote:
> >
> >
> >
> > On Mon, Apr 1, 2019 at 2:20 PM Lukasz Cwik wrote:
> >
> >
> >
> > On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles wrote:
> >
> >
> > As to building an aggregated "Java" project, I think
> > the blocker will be supporting conflicting deps. For
> > IOs like ElasticSearch and runners like Flink the
> > conflict is essential and deliberate, to support
> > multiple versions of other services. And that is not
> > even talking about transitive dep conflicts. I think
> > Python and Go don't have this issue simply because
> > they haven't tackled those problems.
> >
> > Are you talking about just a shortcut for building
> > (super easy to just add since we are using Gradle)
> > or a new artifact that you want to distribute?
> >
> > On Mon, Apr 1, 2019 at 10:01 AM Lukasz Cwik wrote:
> >
> > During the gradle migration, we used to have
> > something like:
> >
> > include(":sdks:java:core")
> > include(":sdks:java:extensions:sql")
> > include(":sdks:python")
> >
> > Just to be super clear, this is the Gradle default and
> > is equivalent to just leaving it blank.
> >
> >
> > but we discovered the Maven module names that
> > were used during publishing were "core" / "sql"
> > / ... (effectively the directory name) instead
> > of "beam-sdks-java-core".
> >
> >
> > Isn't this managed by the publication
> > plugin?
> https://docs.gradle.org/current/userguide/publishing_maven.html#sec:identity_values_in_the_generated_pom
>  "overriding
> > the default identity values is easy: simply specify
> > the groupId, artifactId or version attributes when
> > configuring the MavenPublication."
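
A minimal sketch of that identity override, assuming the maven-publish plugin
(names illustrative, not Beam's actual build code):

    // build.gradle: publish under an explicit artifactId instead of the
    // default project (i.e. folder) name.
    publishing {
      publications {
        mavenJava(MavenPublication) {
          groupId = 'org.apache.beam'
          artifactId = 'beam-sdks-java-core'  // default would be just "core"
          from components.java
        }
      }
    }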

[Forked] BEAM-4046 (was [PROPOSAL] Introduce beam-sdks-java gradle project)

2019-04-02 Thread Michael Luckey
Hi,

agree with Kenn that this issue at least renders the default
implementation difficult to use.

Although in the example given, i.e. having sdks/java/core and
sdks/py/core, I am unsure whether it will actually pose a problem.

As far as I understand it so far, the issue is triggered by dependency
declarations. These are - in general - expressed with three-dimensional maven
coordinates: groupId, artifactId and version. As the semantics of version are
fixed, there are only two dimensions left to distinguish artefacts, and since
we use a single group id (org.apache.beam) there is only one dimension left.

Now this does not pose a problem for plain library dependencies. Of course
they are uniquely defined. But we are also using lots of project
dependencies. These project dependencies are translated from the project path
to those maven coordinates. Unfortunately, here the project name - which
happens to be the folder name - is used as artefact id. So if folder names
match, we might get collisions during dependency resolution.

Clearly, it is not possible to deploy artefacts with those same ids to any
maven repo expecting sensible results. So we either do not deploy an
artefact from one of these projects - which would be kind of strange as we do
have a project dependency here - or we rewrite the artefact id of (at
least) one of the colliding projects. (We currently do that implicitly
with the project names we create by flattening our structure.)

Now back to the given example: as I do not expect any java project to have
a project dependency on python, there might be a chance that this will
just work.

But of course, this does not really help, as we reasonably might expect
some /runner/direct/core or sdks/java/io/someio/core which would collide in
the same way.

As a possible workaround here, we could
- either require unique folder names
- or rewrite only the colliding project names, as sketched below (we currently
do that for all projects)
- or ... (do not know yet)
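
A minimal sketch of the rename workaround in settings.gradle, with hypothetical
module paths (not Beam's actual layout):

    // settings.gradle: keep the hierarchical folder layout, but rename
    // colliding projects so the derived Maven artifactIds stay unique.
    include ':sdks:java:core'
    include ':runners:direct:core'  // hypothetical second "core" folder

    project(':sdks:java:core').name = 'beam-sdks-java-core'
    project(':runners:direct:core').name = 'beam-runners-direct-core'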

I suggest I'll invest some time playing around with the already
prepared PR to support the discussion, so that we have proper grounding to
decide whether a more hierarchical project structure will be worth the
hassle.

Looking at the gradle issue - which is already two years old and iirc was
reported at least one year earlier - I do not expect a fix here
soon.

On Tue, Apr 2, 2019 at 7:19 PM Lukasz Cwik  wrote:

> I didn't know that https://github.com/gradle/gradle/issues/847 existed,
> but the description of the issues people are having is similar to what was
> discovered during the gradle migration.
>
> On Tue, Apr 2, 2019 at 8:02 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Michael,
>>
>> no problem for the thread, that's the goal of the mailing list ;)
>>
>> And yes, you got my idea about a "meta" module: easy way of building the
>> "whole" Java SDK.
>>
>> The purpose is not to create an uber jar, but more to simplify the build
>> for Java SDK developers.
>>
>> Do you want me to complete your PR with what I did ?
>>
>> Regards
>> JB
>>
>> On 02/04/2019 16:49, Michael Luckey wrote:
>> > Going to fork the BEAM-4046 discussion. And, JB, I apologise for
>> > hijacking your thread.
>> >
>> > As for the original question, I understood a request for a meta project
>> > which will enable easier handling of java projects. E.g. instead of
>> > requiring the user to call
>> >
>> > ./gradlew module1:build module2:build ... moduleN:build
>> >
>> > a meta project with a build task defined something like
>> >
>> > build.dependsOn module1:build
>> > build.dependsOn module2:build
>> > ...
>> > build.dependsOn moduleN:build
>> >
>> > And other tasks as found useful.
>> >
>> > Not a project which in itself creates some uberjar, which I also believe
>> > would be rather difficult to implement.
>> >
>> > On Tue, Apr 2, 2019 at 5:13 AM Kenneth Knowles wrote:
>> >
>> > Oh, yikes. It seems
>> > like https://github.com/gradle/gradle/issues/847 indicates that the
>> > feature to use the default names in Gradle is practically
>> > nonfunctional. If that bug is as severe as it looks, I have to
>> > retract my position. Like we could never have sdks/java/core and
>> > sdks/py/core, right?
>> >
>> > Kenn
>> >
>> > On Mon, Apr 1, 2019 at 6:27 PM Michael Luckey wrote:
>> >
>> > FWIW, hacked something as showcase for BEAM-4046 [1]
>> >
>> > This is miserably broken, but a
>> >
>> > ./gradlew projects
>> >
>> > or
>> >
>> > ./gradlew -p sdks/java build
>> >
>> > should work. Anything else is likely to cause issues. If you hit a
>> > stack overflow exception, it's likely caused
>> > by https://github.com/gradle/gradle/issues/847
>> >
>> > To continue here, lots of cleanup has to be done. We might also
>> > need to rename folders etc, to better reflect semantic
>> intentions.
>> >
>> > [1] https://github.com/apache/beam/pull/8194
>> >
>> > 

Re: kafka 0.9 support

2019-04-02 Thread Austin Bennett
I withdraw my concern -- I checked the info on the cluster I will eventually
access.  It is on 0.8, so I was speaking too soon.  Can't speak for the rest of
the user base.

On Tue, Apr 2, 2019 at 11:03 AM Raghu Angadi  wrote:

> Thanks to David Morávek for pointing out possible improvement to KafkaIO
> for dropping support for 0.9 since it avoids having a second consumer just
> to fetch latest offsets for backlog.
>
> Ideally we should be dropping 0.9 support for the next major release; in fact
> it would be better to drop versions before 0.10.1 at the same time. This would further
> reduce reflection based calls for supporting multiple versions. If the
> users still on 0.9 could stay on current stable release of Beam, dropping
> would not affect them. Otherwise, it would be good to hear from them about
> how long we need to keep support for old versions.
>
> I don't think it is a good idea to have multiple forks of KafkaIO in the
> same repo. If we do go that route, we should fork the entire kafka
> directory and rename the main class KafkaIO_Unmaintained :).
>
> IMHO, so far, additional complexity for supporting these versions is not
> that bad. Most of it is isolated to ConsumerSpEL.java & ProducerSpEL.java.
> My first preference is dropping support for deprecated versions (and
> deprecating a few more versions, maybe up to the version that added
> transactions, around 0.11.x I think).
>
> I haven't looked into what's new in Kafka 2.x. Are there any features that
> KafkaIO should take advantage of? I have not noticed our existing code
> breaking. We should certainly support the latest releases of Kafka.
>
> Raghu.
>
> On Tue, Apr 2, 2019 at 10:27 AM Mingmin Xu  wrote:
>
>>
>> We're still using Kafka 0.10 a lot, similar to 0.9 IMO. To expand
>> multiple versions in KafkaIO is quite complex now, and it confuses users
>> which is supported / which is not. I would prefer to support Kafka 2.0+
>> only in the latest version. For old versions, there're some options:
>> 1). document Kafka-Beam support versions, like what we do in FlinkRunner;
>> 2). maintain separated KafkaIOs for old versions;
>>
>> 1) would be easy to maintain, and I assume there should be no issue to
>> use Beam-Core 3.0 together with KafkaIO 2.0.
>>
>> Any thoughts?
>>
>> Mingmin
>>
>> On Tue, Apr 2, 2019 at 9:56 AM Reuven Lax  wrote:
>>
>>> KafkaIO is marked as Experimental, and the comment already warns that
>>> 0.9 support might be removed. I think that if users still rely on Kafka 0.9
>>> we should leave a fork (renamed) of the IO in the tree for 0.9, but we can
>>> definitely remove 0.9 support from the main IO if we want, especially if
>>> it requires complicated changes to that IO. If we do though, we should fail with a
>>> clear error message telling users to use the Kafka 0.9 IO.
>>>
>>> On Tue, Apr 2, 2019 at 9:34 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 > How are multiple versions of Kafka supported? Are they all in one
 client, or is there a case for forks like ElasticSearchIO?

 They are supported in one client but we have an additional “ConsumerSpEL”
 adapter which unifies interface differences among different Kafka client
 versions (mostly to support old ones 0.9-0.10.0).

 On the other hand, we warn users in the Javadoc of KafkaIO (which is
 Unstable, btw) by the following:
 “KafkaIO relies on kafka-clients for all its interactions with the
 Kafka cluster. kafka-clients versions 0.10.1 and newer are supported
 at runtime. The older versions 0.9.x - 0.10.0.0 are also supported,
 but are deprecated and likely be removed in near future.”

 Personally, I’d prefer to have only one unified
 client interface but, since people still use Beam with old Kafka instances,
 we likely should stick with it till Beam 3.0.

 WDYT?

 On 2 Apr 2019, at 02:27, Austin Bennett 
 wrote:

 FWIW --

 On my (desired, not explicitly job-function) roadmap is to tap into a
 bunch of our corporate Kafka queues to ingest that data to places I can
 use.  Those are 'stuck' on 0.9, with no upgrade in sight (am told the upgrade
 path isn't trivial, is very critical flows, and they are scared for it to
 break, so it just sits behind firewalls, etc).  But, I wouldn't begin that
 for probably at least another quarter.

 I don't contribute to nor understand the burden of maintaining the
 support for the older version, so can't reasonably lobby for that continued
 pain.

 Anecdotally, this could be a place many enterprises are at (though I
 also wonder whether many of the people that would be 'stuck' on such
 versions would also have Beam on their current radar).


 On Mon, Apr 1, 2019 at 2:29 PM Kenneth Knowles  wrote:

> This could be a backward-incompatible change, though that notion has
> many interpretations. What matters is user pain. Technically if we don't
> break the core SDK, users should be able to use Java SDK >=2.11.0 with
> KafkaIO 2.11.0 forever.

Re: Quieten javadoc generation

2019-04-02 Thread Mikhail Gryzykhin
+1 to suppress warnings globally. If we care about an issue, it should be an
error.

On Tue, Apr 2, 2019 at 5:38 AM Alexey Romanenko 
wrote:

> +1 to suppress such warnings globally. IMO, usually, meaningful Javadoc
> description is quite enough to understand what this method does.
>
> On 1 Apr 2019, at 18:21, Kenneth Knowles  wrote:
>
> Personally, I would like to suppress the warnings globally. I think
> requiring javadoc everywhere is already enough to remind someone to write
> something meaningful. And I think @param rarely adds anything beyond the
> function signature and @return rarely adds anything beyond the description.
>
> Kenn
>
> On Mon, Apr 1, 2019 at 6:53 AM Michael Luckey  wrote:
>
>> Hi,
>>
>> currently our console output gets cluttered by thousands of Javadoc
>> warnings [1]. Most of them are warnings caused by missing @return
>> or @param tags [2].
>>
>> So currently, this signal is completely ignored, and even worse, makes it
>> difficult to parse through the log.
>>
>> As I could not find a previous discussion on the list on how to handle
>> param/return on java docs, I felt the need to ask here first, how we would
>> like to improve this situation.
>>
>> Some options
>> 1. fix those warnings
>> 2. do not insist on those tags being present and disable doclint warnings
>> (probably not doable on tag granularity). This is already done on the doc
>> aggregation task [3]; a sketch follows at the end of this message.
>>
>> Thoughts?
>>
>>
>> [1] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/console
>> [2] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/java/
>> [3]
>> https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle#L77-L78
>>
>>
>
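
A minimal sketch of option 2, assuming Gradle's standard Javadoc task options
(not necessarily what Beam's build does today): doclint is toggled per warning
group, and the "missing" group is the one covering absent @param/@return tags.

    // build.gradle: silence only the missing-tag doclint group; the second
    // argument doubles as javadoc's -quiet flag.
    tasks.withType(Javadoc) {
      options.addStringOption('Xdoclint:all,-missing', '-quiet')
    }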


Re: kafka 0.9 support

2019-04-02 Thread Raghu Angadi
Thanks to David Morávek for pointing out possible improvement to KafkaIO
for dropping support for 0.9 since it avoids having a second consumer just
to fetch latest offsets for backlog.
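
For context, a rough sketch of why 0.10.1 is the cutoff: the endOffsets() API
added to kafka-clients in 0.10.1 lets the reading consumer itself estimate the
backlog (illustrative only, not the actual KafkaIO code):

    import java.util.Collection;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.Consumer;
    import org.apache.kafka.common.TopicPartition;

    class BacklogSketch {
      // Sum of (latest offset - current position) over the assigned partitions.
      static long backlog(Consumer<?, ?> consumer, Collection<TopicPartition> parts) {
        Map<TopicPartition, Long> ends = consumer.endOffsets(parts); // 0.10.1+
        long total = 0;
        for (TopicPartition tp : parts) {
          total += ends.get(tp) - consumer.position(tp);
        }
        return total;
      }
    }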

Ideally we should be dropping 0.9 support for the next major release; in fact
it would be better to drop versions before 0.10.1 at the same time. This would further
reduce reflection based calls for supporting multiple versions. If the
users still on 0.9 could stay on current stable release of Beam, dropping
would not affect them. Otherwise, it would be good to hear from them about
how long we need to keep support for old versions.

I don't think it is a good idea to have multiple forks of KafkaIO in the same
repo. If we do go that route, we should fork the entire kafka directory and
rename the main class KafkaIO_Unmaintained :).

IMHO, so far, additional complexity for supporting these versions is not
that bad. Most of it is isolated to ConsumerSpEL.java & ProducerSpEL.java.
My first preference is dropping support for deprecated versions (and
deprecating a few more versions, maybe up to the version that added
transactions, around 0.11.x I think).

I haven't looked into what's new in Kafka 2.x. Are there any features that
KafkaIO should take advantage of? I have not noticed our existing code
breaking. We should certainly support the latest releases of Kafka.

Raghu.

On Tue, Apr 2, 2019 at 10:27 AM Mingmin Xu  wrote:

>
> We're still using Kafka 0.10 a lot, similar to 0.9 IMO. To expand multiple
> versions in KafkaIO is quite complex now, and it confuses users which is
> supported / which is not. I would prefer to support Kafka 2.0+ only in the
> latest version. For old versions, there're some options:
> 1). document Kafka-Beam support versions, like what we do in FlinkRunner;
> 2). maintain separated KafkaIOs for old versions;
>
> 1) would be easy to maintain, and I assume there should be no issue to use
> Beam-Core 3.0 together with KafkaIO 2.0.
>
> Any thoughts?
>
> Mingmin
>
> On Tue, Apr 2, 2019 at 9:56 AM Reuven Lax  wrote:
>
>> KafkaIO is marked as Experimental, and the comment already warns that 0.9
>> support might be removed. I think that if users still rely on Kafka 0.9 we
>> should leave a fork (renamed) of the IO in the tree for 0.9, but we can
>> definitely remove 0.9 support from the main IO if we want, especially if
>> it requires complicated changes to that IO. If we do though, we should fail with a
>> clear error message telling users to use the Kafka 0.9 IO.
>>
>> On Tue, Apr 2, 2019 at 9:34 AM Alexey Romanenko 
>> wrote:
>>
>>> > How are multiple versions of Kafka supported? Are they all in one
>>> client, or is there a case for forks like ElasticSearchIO?
>>>
>>> They are supported in one client but we have an additional “ConsumerSpEL”
>>> adapter which unifies interface differences among different Kafka client
>>> versions (mostly to support old ones 0.9-0.10.0).
>>>
>>> On the other hand, we warn users in the Javadoc of KafkaIO (which is
>>> Unstable, btw) by the following:
>>> “KafkaIO relies on kafka-clients for all its interactions with the
>>> Kafka cluster. kafka-clients versions 0.10.1 and newer are supported
>>> at runtime. The older versions 0.9.x - 0.10.0.0 are also supported,
>>> but are deprecated and likely be removed in near future.”
>>>
>>> Personally, I’d prefer to have only one unified
>>> client interface but, since people still use Beam with old Kafka instances,
>>> we likely should stick with it till Beam 3.0.
>>>
>>> WDYT?
>>>
>>> On 2 Apr 2019, at 02:27, Austin Bennett 
>>> wrote:
>>>
>>> FWIW --
>>>
>>> On my (desired, not explicitly job-function) roadmap is to tap into a
>>> bunch of our corporate Kafka queues to ingest that data to places I can
>>> use.  Those are 'stuck' on 0.9, with no upgrade in sight (am told the upgrade
>>> path isn't trivial, is very critical flows, and they are scared for it to
>>> break, so it just sits behind firewalls, etc).  But, I wouldn't begin that
>>> for probably at least another quarter.
>>>
>>> I don't contribute to nor understand the burden of maintaining the
>>> support for the older version, so can't reasonably lobby for that continued
>>> pain.
>>>
>>> Anecdotally, this could be a place many enterprises are at (though I
>>> also wonder whether many of the people that would be 'stuck' on such
>>> versions would also have Beam on their current radar).
>>>
>>>
>>> On Mon, Apr 1, 2019 at 2:29 PM Kenneth Knowles  wrote:
>>>
 This could be a backward-incompatible change, though that notion has
 many interpretations. What matters is user pain. Technically if we don't
 break the core SDK, users should be able to use Java SDK >=2.11.0 with
 KafkaIO 2.11.0 forever.

 How are multiple versions of Kafka supported? Are they all in one
 client, or is there a case for forks like ElasticSearchIO?

 Kenn

 On Mon, Apr 1, 2019 at 10:37 AM Jean-Baptiste Onofré 
 wrote:

> +1 to remove 0.9 support.
>

Re: Debugging :beam-sdks-java-io-hadoop-input-format:test

2019-04-02 Thread Mikhail Gryzykhin
Hi everyone,

I created BEAM-6974. This
test and the beam-sdks-java-io-cassandra tests fail often in our Pre-Commit
jobs. Can someone look into this?

Thank you,
Mikhail.


On Thu, Mar 28, 2019 at 12:31 PM Mikhail Gryzykhin 
wrote:

> I've seen it a couple of times already and just got another repro:
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/5011/consoleFull
>
> On Thu, Mar 28, 2019 at 8:55 AM Alexey Romanenko 
> wrote:
>
>> Hi Mikhail,
>>
>> We had a flaky “HIFIOWithEmbeddedCassandraTest” a while ago and it was
>> caused by an issue with launching the embedded Cassandra cluster. It was then
>> fixed by Etienne Chauchot's PR [1].
>> Though I don’t see any similar error messages in your Jenkins job log,
>> so I’m not sure it’s the same issue.
>>
>> Have you seen this fail only once or several times already?
>>
>> [1] https://github.com/apache/beam/pull/8000
>>
>> On 27 Mar 2019, at 22:24, Mikhail Gryzykhin  wrote:
>>
>> Hi everyone,
>>
>> I have a pre-commit job that fails on
>> :beam-sdks-java-io-hadoop-input-format:test.
>> Relevant PR. 
>>
>> The target doesn't have any explicit log associated with it. Running the same
>> target locally doesn't give me much help. It seems to fail somewhere in
>> the native runtime.
>>
>> Can someone help with tackling this issue?
>>
>> Regards,
>> Mikhail.
>>
>>
>>
>>


Re: kafka 0.9 support

2019-04-02 Thread Mingmin Xu
We're still using Kafka 0.10 a lot, similar to 0.9 IMO. To expand multiple
versions in KafkaIO is quite complex now, and it confuses users which is
supported / which is not. I would prefer to support Kafka 2.0+ only in the
latest version. For old versions, there're some options:
1). document Kafka-Beam support versions, like what we do in FlinkRunner;
2). maintain separated KafkaIOs for old versions;

1) would be easy to maintain, and I assume there should be no issue to use
Beam-Core 3.0 together with KafkaIO 2.0.

Any thoughts?

Mingmin

On Tue, Apr 2, 2019 at 9:56 AM Reuven Lax  wrote:

> KafkaIO is marked as Experimental, and the comment already warns that 0.9
> support might be removed. I think that if users still rely on Kafka 0.9 we
> should leave a fork (renamed) of the IO in the tree for 0.9, but we can
> definitely remove 0.9 support from the main IO if we want, especially if
> it requires complicated changes to that IO. If we do though, we should fail with a
> clear error message telling users to use the Kafka 0.9 IO.
>
> On Tue, Apr 2, 2019 at 9:34 AM Alexey Romanenko 
> wrote:
>
>> > How are multiple versions of Kafka supported? Are they all in one
>> client, or is there a case for forks like ElasticSearchIO?
>>
>> They are supported in one client but we have an additional “ConsumerSpEL”
>> adapter which unifies interface differences among different Kafka client
>> versions (mostly to support old ones 0.9-0.10.0).
>>
>> On the other hand, we warn users in the Javadoc of KafkaIO (which is Unstable,
>> btw) by the following:
>> “KafkaIO relies on kafka-clients for all its interactions with the Kafka
>> cluster. kafka-clients versions 0.10.1 and newer are supported at
>> runtime. The older versions 0.9.x - 0.10.0.0 are also supported, but
>> are deprecated and likely be removed in near future.”
>>
>> Personally, I’d prefer to have only one unified
>> client interface but, since people still use Beam with old Kafka instances,
>> we likely should stick with it till Beam 3.0.
>>
>> WDYT?
>>
>> On 2 Apr 2019, at 02:27, Austin Bennett 
>> wrote:
>>
>> FWIW --
>>
>> On my (desired, not explicitly job-function) roadmap is to tap into a
>> bunch of our corporate Kafka queues to ingest that data to places I can
>> use.  Those are 'stuck' on 0.9, with no upgrade in sight (am told the upgrade
>> path isn't trivial, is very critical flows, and they are scared for it to
>> break, so it just sits behind firewalls, etc).  But, I wouldn't begin that
>> for probably at least another quarter.
>>
>> I don't contribute to nor understand the burden of maintaining the
>> support for the older version, so can't reasonably lobby for that continued
>> pain.
>>
>> Anecdotally, this could be a place many enterprises are at (though I also
>> wonder whether many of the people that would be 'stuck' on such versions
>> would also have Beam on their current radar).
>>
>>
>> On Mon, Apr 1, 2019 at 2:29 PM Kenneth Knowles  wrote:
>>
>>> This could be a backward-incompatible change, though that notion has
>>> many interpretations. What matters is user pain. Technically if we don't
>>> break the core SDK, users should be able to use Java SDK >=2.11.0 with
>>> KafkaIO 2.11.0 forever.
>>>
>>> How are multiple versions of Kafka supported? Are they all in one
>>> client, or is there a case for forks like ElasticSearchIO?
>>>
>>> Kenn
>>>
>>> On Mon, Apr 1, 2019 at 10:37 AM Jean-Baptiste Onofré 
>>> wrote:
>>>
 +1 to remove 0.9 support.

 I think it's more interesting to test and verify Kafka 2.2.0 than 0.9 ;)

 Regards
 JB

 On 01/04/2019 19:36, David Morávek wrote:
 > Hello,
 >
 > is there still a reason to keep Kafka 0.9 support? This unfortunately
 > adds a lot of complexity to the KafkaIO implementation.
 >
 > Kafka 0.9 was released on Nov 2015.
 >
 > My first shot on removing Kafka 0.9 support would remove second
 > consumer, which is used for fetching offsets.
 >
 > WDYT? Is this support worth keeping?
 >
 > https://github.com/apache/beam/pull/8186
 >
 > D.

 --
 Jean-Baptiste Onofré
 jbono...@apache.org
 http://blog.nanthrax.net
 Talend - http://www.talend.com

>>>
>>

-- 

Mingmin


Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2019-04-02 Thread Lukasz Cwik
I was able to update the failing Watch transform in
https://github.com/apache/beam/pull/8146 and this has now been merged.

On Mon, Mar 18, 2019 at 10:32 AM Lukasz Cwik  wrote:

> Thanks Kenn, based upon the error message there was a small amount of code
> that I missed when updating the code. I'll attempt to fix this in the next
> few days.
>
> On Mon, Jan 14, 2019 at 7:26 PM Kenneth Knowles  wrote:
>
>> I wanted to use this thread to ping that the change to the user-facing
>> API in order to wrap RestrictionTracker broke the Watch transform, which
>> has been sickbayed for a long time. It would be helpful for experts to
>> weigh in on https://issues.apache.org/jira/browse/BEAM-6352 about how
>> the functionality used here should be implemented.
>>
>> Kenn
>>
>> On Wed, Dec 5, 2018 at 4:45 PM Lukasz Cwik  wrote:
>>
>>> Based upon the current Java SDK API, I was able to implement Runner
>>> initiated checkpointing that the Java SDK honors within PR
>>> https://github.com/apache/beam/pull/7200.
>>>
>>> This is an exciting first step to a splitting implementation, feel free
>>> to take a look and comment. I have added two basic tests, execute SDF
>>> without splitting and execute SDF with a runner initiated checkpoint.
>>>
>>> On Fri, Nov 30, 2018 at 4:52 PM Robert Bradshaw 
>>> wrote:
>>>
 On Fri, Nov 30, 2018 at 10:14 PM Lukasz Cwik  wrote:
 >
 > On Fri, Nov 30, 2018 at 1:02 PM Robert Bradshaw 
 wrote:
 >>
 >> On Fri, Nov 30, 2018 at 6:38 PM Lukasz Cwik 
 wrote:
 >> >
 >> > Sorry, for some reason I thought I had answered these.
 >>
 >> No problem, thanks for you patience :).
 >>
 >> > On Fri, Nov 30, 2018 at 2:20 AM Robert Bradshaw <
 rober...@google.com> wrote:
 >> >>
 >> >> I still have outstanding questions (above) about
 >> >>
 >> >> 1) Why we need arbitrary precision for backlog, instead of just
 using
 >> >> a (much simpler) double.
 >> >
 >> >
 >> > Double lacks the precision for reporting backlogs for byte key
 ranges (HBase, Bigtable, ...). Scanning a key range such as ["a", "b")
 with a large number of keys with a really long common prefix such as
 "aab" and "aac", ... leads
 to the backlog not changing even though we are making progress through the
 key space. This also prevents splitting within such an area since the
 double can't provide that necessary precision (without multiple rounds of
 splitting which adds complexity).
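
To make the precision point concrete, a small standalone illustration (not
Beam code): two adjacent keys behind a 20-byte common prefix map to the same
double-valued fraction of ["a", "b"), while an arbitrary-precision fraction
still distinguishes them.

    import java.math.BigDecimal;

    public class KeyFractionDemo {
      // Read the bytes after the leading 'a' as base-256 digits of a
      // fraction inside the range ["a", "b").
      static BigDecimal fraction(String key) {
        BigDecimal f = BigDecimal.ZERO;
        BigDecimal scale = BigDecimal.ONE;
        for (int i = 1; i < key.length(); i++) {
          scale = scale.divide(BigDecimal.valueOf(256)); // exact: power of two
          f = f.add(scale.multiply(BigDecimal.valueOf(key.charAt(i))));
        }
        return f;
      }

      public static void main(String[] args) {
        String prefix = "aaaaaaaaaaaaaaaaaaaaa"; // 'a' plus a 20-byte common prefix
        BigDecimal f1 = fraction(prefix + "b");
        BigDecimal f2 = fraction(prefix + "c");
        System.out.println(f1.doubleValue() == f2.doubleValue()); // true: double saturates
        System.out.println(f1.compareTo(f2) != 0);                // true: progress still visible
      }
    }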
 >>
 >> We'll have to support multiple rounds of splitting regardless. I can
 >> see how this gives more information up front though.
 >
 > I agree that we will need to support multiple rounds of splitting
 from the SDK side but this adds complexity from the runner side since it
 can only increase the accuracy for a split by performing multiple rounds of
 splitting at once.
 >
 >> (As an aside, I've been thinking about some ways of solving the dark
 >> matter problem, and it might depend on knowing the actual key, using
 >> the fact that character boundaries are likely cut-off points for
 >> changes in density, which would get obscured by alternative
 >> representations.)
 >
 > Every time I think about this issue, I can never get it to apply
 meaningfully for unbounded sources such as a message queue like pubsub.

 Yeah, neither can I.

 > Also, having an infinitely precise backlog such as the decimal format
 would still provide density information as the rate of change through the
 backlog for a bounded source would change once a "cluster" was hit.

 This is getting to somewhat of a tangential topic, but the key insight
 is that although it's easy to find the start of a cluster, to split
 ideally one would want to know where the end of the cluster is. For
 keyspaces, this is likely to be at binary fractions, and in particular
 looking at the longevity of common prefixes of length n one could make
 heuristic guesses as to where this density dropoff may be. (This also
 requires splitting at a key, not splitting relative to a current
 position, which has its issues...)

 >> >> 2) Whether its's worth passing backlog back to split requests,
 rather
 >> >> than (again) a double representing "portion of current remaining"
 >> >> which may change over time. (The most common split request is into
 >> >> even portions, and specifically half, which can't accurately be
 >> >> requested from a stale backlog.)
 >> >
 >> > I see two scenarios here:
 >> > * the fraction is exposed to the SDF author and then the SDF
 author needs to map from their restriciton space to backlog and also map
 fractions onto their restriction space meaning that they are required to
 write mappings between three different models.
 >> > * the fraction is not 

Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-02 Thread Lukasz Cwik
I didn't know that https://github.com/gradle/gradle/issues/847 existed, but
the description of the issues people are having is similar to what was
discovered during the gradle migration.

On Tue, Apr 2, 2019 at 8:02 AM Jean-Baptiste Onofré  wrote:

> Hi Michael,
>
> no problem for the thread, that's the goal of the mailing list ;)
>
> And yes, you got my idea about a "meta" module: easy way of building the
> "whole" Java SDK.
>
> The purpose is not to create an uber jar, but more to simplify the build
> for Java SDK developers.
>
> Do you want me to complete your PR with what I did ?
>
> Regards
> JB
>
> On 02/04/2019 16:49, Michael Luckey wrote:
> > Going to fork the BEAM-4046 discussion. And, JB, I apologise for
> > hijacking your thread.
> >
> > As for the original question, I understood a request for a meta project
> > which will enable easier handling of java projects. E.g. instead of
> > requiring the user to call
> >
> > ./gradlew module1:build module2:build ... moduleN:build
> >
> > a meta project with a build task defined something like
> >
> > build.dependsOn module1:build
> > build.dependsOn module2:build
> > ...
> > build.dependsOn moduleN:build
> >
> > And other tasks as found useful.
> >
> > Not a project which in itself creates some uberjar, which I also believe
> > would be rather difficult to implement.
> >
> > On Tue, Apr 2, 2019 at 5:13 AM Kenneth Knowles wrote:
> >
> > Oh, yikes. It seems
> > like https://github.com/gradle/gradle/issues/847 indicates that the
> > feature to use the default names in Gradle is practically
> > nonfunctional. If that bug is as severe as it looks, I have to
> > retract my position. Like we could never have sdks/java/core and
> > sdks/py/core, right?
> >
> > Kenn
> >
> > On Mon, Apr 1, 2019 at 6:27 PM Michael Luckey wrote:
> >
> > FWIW, hacked something as showcase for BEAM-4046 [1]
> >
> > This is miserably broken, but a
> >
> > ./gradlew projects
> >
> > or
> >
> > ./gradlew -p sdks/java build
> >
> > should work. Anything else is likely to cause issues. If you hit a
> > stack overflow exception, it's likely caused
> > by https://github.com/gradle/gradle/issues/847
> >
> > To continue here, lots of cleanup has to be done. We might also
> > need to rename folders etc, to better reflect semantic
> intentions.
> >
> > [1] https://github.com/apache/beam/pull/8194
> >
> > On Mon, Apr 1, 2019 at 11:56 PM Kenneth Knowles wrote:
> >
> >
> >
> > On Mon, Apr 1, 2019 at 2:20 PM Lukasz Cwik wrote:
> >
> >
> >
> > On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles wrote:
> >
> >
> > As to building an aggregated "Java" project, I think
> > the blocker will be supporting conflicting deps. For
> > IOs like ElasticSearch and runners like Flink the
> > conflict is essential and deliberate, to support
> > multiple versions of other services. And that is not
> > even talking about transitive dep conflicts. I think
> > Python and Go don't have this issue simply because
> > they haven't tackled those problems.
> >
> > Are you talking about just a shortcut for building
> > (super easy to just add since we are using Gradle)
> > or a new artifact that you want to distribute?
> >
> > On Mon, Apr 1, 2019 at 10:01 AM Lukasz Cwik wrote:
> >
> > During the gradle migration, we used to have
> > something like:
> >
> > include(":sdks:java:core")
> > include(":sdks:java:extensions:sql")
> > include(":sdks:python")
> >
> > Just to be super clear, this is the Gradle default and
> > is equivalent to just leaving it blank.
> >
> >
> > but we discovered the Maven module names that
> > were used during publishing were "core" / "sql"
> > / ... (effectively the directory name) instead
> > of "beam-sdks-java-core".
> >
> >
> > Isn't this managed by the publication
> > plugin?
> https://docs.gradle.org/current/userguide/publishing_maven.html#sec:identity_values_in_the_generated_pom
>  "overriding
> > the default identity values is easy: simply specify
> > the groupId, artifactId or version attributes when
> > configuring the MavenPublication."

Re: kafka 0.9 support

2019-04-02 Thread Reuven Lax
KafkaIO is marked as Experimental, and the comment already warns that 0.9
support might be removed. I think that if users still rely on Kafka 0.9 we
should leave a fork (renamed) of the IO in the tree for 0.9, but we can
definitely remove 0.9 support from the main IO if we want, especially if
it requires complicated changes to that IO. If we do though, we should fail with a
clear error message telling users to use the Kafka 0.9 IO.

On Tue, Apr 2, 2019 at 9:34 AM Alexey Romanenko 
wrote:

> > How are multiple versions of Kafka supported? Are they all in one
> client, or is there a case for forks like ElasticSearchIO?
>
> They are supported in one client but we have an additional “ConsumerSpEL”
> adapter which unifies interface differences among different Kafka client
> versions (mostly to support old ones 0.9-0.10.0).
>
> On the other hand, we warn users in the Javadoc of KafkaIO (which is Unstable,
> btw) by the following:
> “KafkaIO relies on kafka-clients for all its interactions with the Kafka
> cluster. kafka-clients versions 0.10.1 and newer are supported at
> runtime. The older versions 0.9.x - 0.10.0.0 are also supported, but
> are deprecated and likely be removed in near future.”
>
> Personally, I’d prefer to have only one unified
> client interface but, since people still use Beam with old Kafka instances,
> we likely should stick with it till Beam 3.0.
>
> WDYT?
>
> On 2 Apr 2019, at 02:27, Austin Bennett 
> wrote:
>
> FWIW --
>
> On my (desired, not explicitly job-function) roadmap is to tap into a
> bunch of our corporate Kafka queues to ingest that data to places I can
> use.  Those are 'stuck' on 0.9, with no upgrade in sight (am told the upgrade
> path isn't trivial, is very critical flows, and they are scared for it to
> break, so it just sits behind firewalls, etc).  But, I wouldn't begin that
> for probably at least another quarter.
>
> I don't contribute to nor understand the burden of maintaining the support
> for the older version, so can't reasonably lobby for that continued pain.
>
> Anecdotally, this could be a place many enterprises are at (though I also
> wonder whether many of the people that would be 'stuck' on such versions
> would also have Beam on their current radar).
>
>
> On Mon, Apr 1, 2019 at 2:29 PM Kenneth Knowles  wrote:
>
>> This could be a backward-incompatible change, though that notion has many
>> interpretations. What matters is user pain. Technically if we don't break
>> the core SDK, users should be able to use Java SDK >=2.11.0 with KafkaIO
>> 2.11.0 forever.
>>
>> How are multiple versions of Kafka supported? Are they all in one client,
>> or is there a case for forks like ElasticSearchIO?
>>
>> Kenn
>>
>> On Mon, Apr 1, 2019 at 10:37 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> +1 to remove 0.9 support.
>>>
>>> I think it's more interesting to test and verify Kafka 2.2.0 than 0.9 ;)
>>>
>>> Regards
>>> JB
>>>
>>> On 01/04/2019 19:36, David Morávek wrote:
>>> > Hello,
>>> >
>>> > is there still a reason to keep Kafka 0.9 support? This unfortunately
>>> > adds a lot of complexity to the KafkaIO implementation.
>>> >
>>> > Kafka 0.9 was released on Nov 2015.
>>> >
>>> > My first shot on removing Kafka 0.9 support would remove second
>>> > consumer, which is used for fetching offsets.
>>> >
>>> > WDYT? Is this support worth keeping?
>>> >
>>> > https://github.com/apache/beam/pull/8186
>>> >
>>> > D.
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
>


Re: kafka 0.9 support

2019-04-02 Thread Alexey Romanenko
> How are multiple versions of Kafka supported? Are they all in one client, or 
> is there a case for forks like ElasticSearchIO?

They are supported in one client but we have an additional “ConsumerSpEL” adapter
which unifies interface differences among different Kafka client versions
(mostly to support old ones 0.9-0.10.0).
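
A minimal sketch of the adapter idea - the real logic lives in
ConsumerSpEL.java; this illustrative version merely probes at runtime whether
the installed kafka-clients has the 0.10.1+ offsetsForTimes API:

    import java.lang.reflect.Method;
    import org.apache.kafka.clients.consumer.Consumer;

    class ConsumerCompatSketch {
      private static final Method OFFSETS_FOR_TIMES = lookup();

      private static Method lookup() {
        try {
          return Consumer.class.getMethod("offsetsForTimes", java.util.Map.class);
        } catch (NoSuchMethodException e) {
          return null; // a pre-0.10.1 client is on the classpath
        }
      }

      // Callers branch to an older code path when this returns false.
      static boolean supportsOffsetsForTimes() {
        return OFFSETS_FOR_TIMES != null;
      }
    }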

On the other hand, we warn users in the Javadoc of KafkaIO (which is Unstable,
btw) by the following:
“KafkaIO relies on kafka-clients for all its interactions with the Kafka 
cluster.kafka-clients versions 0.10.1 and newer are supported at runtime. The 
older versions 0.9.x - 0.10.0.0 are also supported, but are deprecated and 
likely be removed in near future.”

Personally, I’d prefer to have only one unified client
interface but, since people still use Beam with old Kafka instances, we
likely should stick with it till Beam 3.0.

WDYT?

> On 2 Apr 2019, at 02:27, Austin Bennett  wrote:
> 
> FWIW -- 
> 
> On my (desired, not explicitly job-function) roadmap is to tap into a bunch 
> of our corporate Kafka queues to ingest that data to places I can use.  Those 
> are 'stuck' on 0.9, with no upgrade in sight (am told the upgrade path isn't 
> trivial, is very critical flows, and they are scared for it to break, so it 
> just sits behind firewalls, etc).  But, I wouldn't begin that for probably at 
> least another quarter.  
> 
> I don't contribute to nor understand the burden of maintaining the support 
> for the older version, so can't reasonably lobby for that continued pain.  
> 
> Anecdotally, this could be a place many enterprises are at (though I also 
> wonder whether many of the people that would be 'stuck' on such versions 
> would also have Beam on their current radar).  
> 
> 
> On Mon, Apr 1, 2019 at 2:29 PM Kenneth Knowles wrote:
> This could be a backward-incompatible change, though that notion has many 
> interpretations. What matters is user pain. Technically if we don't break the 
> core SDK, users should be able to use Java SDK >=2.11.0 with KafkaIO 2.11.0 
> forever.
> 
> How are multiple versions of Kafka supported? Are they all in one client, or 
> is there a case for forks like ElasticSearchIO?
> 
> Kenn
> 
> On Mon, Apr 1, 2019 at 10:37 AM Jean-Baptiste Onofré wrote:
> +1 to remove 0.9 support.
> 
> I think it's more interesting to test and verify Kafka 2.2.0 than 0.9 ;)
> 
> Regards
> JB
> 
> On 01/04/2019 19:36, David Morávek wrote:
> > Hello,
> > 
> > is there still a reason to keep Kafka 0.9 support? This unfortunately
> > adds a lot of complexity to the KafkaIO implementation.
> > 
> > Kafka 0.9 was released on Nov 2015.
> > 
> > My first shot on removing Kafka 0.9 support would remove second
> > consumer, which is used for fetching offsets.
> > 
> > WDYT? Is this support worth keeping?
> > 
> > https://github.com/apache/beam/pull/8186 
> > 
> > 
> > D.
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net 
> Talend - http://www.talend.com 



Re: Beam contribution

2019-04-02 Thread Lukasz Cwik
+Ahmed

I have added you as a contributor.

It seems as though Ahmed had just picked up BEAM-3489 yesterday. Reach out
to Ahmed if you would like to help them out with the task.

Was TimerReceiverTest failing reliably when performing a parallel build or
is it flaky?

I have asked Chamikara to take a look at PR 8180.


On Tue, Apr 2, 2019 at 8:33 AM Csaba Kassai  wrote:

> Hi All!
>
> I am Csabi, I would be happy to contribute to Beam.
> Could you grant me the contributor role and assign issue BEAM-3489
> to me? My user name is "csabakassai".
>
> After I checked out the code and tried to do a gradle check I found these
> issues:
>
>    1. *jUnit tests fail:* the TimerReceiverTest fails in the
>":beam-runners-google-cloud-dataflow-java-fn-api-worker:test" and the
>":beam-runners-google-cloud-dataflow-java-legacy-worker:test" tasks. When I
>execute tests independently everything is fine, so I disabled the parallel
>build and this solves the problem. I have not investigated further, do you
>have any more insights on this issue? I have attached the test reports.
>    2. *python test fails*: there is a python test which fails if the
>    current offset of your timezone differs from the offset in 1970. In my case,
>    Singapore is now GMT+8 and it was GMT+7:30 in 1970. I created a ticket
>    for this issue where I describe the problem in detail:
>    https://jira.apache.org/jira/browse/BEAM-6947. Could you assign the
>    ticket to me? Also I created a PR with a possible fix:
>    https://github.com/apache/beam/pull/8180. Could you suggest a
>    reviewer?
>
>
> Thank you,
> Csabi
>
>
>
>


Re: kafka 0.9 support

2019-04-02 Thread Reuven Lax
If users are still stuck on Kafka 0.9, that's an argument that we should
continue supporting it. However, if it's making it hard for us to support
KafkaIO, maybe we should consider splitting it? We could fork off
KafkaIO0.9, and remove 0.9 support from the regular KafkaIO.

On Mon, Apr 1, 2019 at 5:27 PM Austin Bennett 
wrote:

> FWIW --
>
> On my (desired, not explicitly job-function) roadmap is to tap into a
> bunch of our corporate Kafka queues to ingest that data to places I can
> use.  Those are 'stuck' on 0.9, with no upgrade in sight (am told the upgrade
> path isn't trivial, is very critical flows, and they are scared for it to
> break, so it just sits behind firewalls, etc).  But, I wouldn't begin that
> for probably at least another quarter.
>
> I don't contribute to nor understand the burden of maintaining the support
> for the older version, so can't reasonably lobby for that continued pain.
>
> Anecdotally, this could be a place many enterprises are at (though I also
> wonder whether many of the people that would be 'stuck' on such versions
> would also have Beam on their current radar).
>
>
> On Mon, Apr 1, 2019 at 2:29 PM Kenneth Knowles  wrote:
>
>> This could be a backward-incompatible change, though that notion has many
>> interpretations. What matters is user pain. Technically if we don't break
>> the core SDK, users should be able to use Java SDK >=2.11.0 with KafkaIO
>> 2.11.0 forever.
>>
>> How are multiple versions of Kafka supported? Are they all in one client,
>> or is there a case for forks like ElasticSearchIO?
>>
>> Kenn
>>
>> On Mon, Apr 1, 2019 at 10:37 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> +1 to remove 0.9 support.
>>>
>>> I think it's more interesting to test and verify Kafka 2.2.0 than 0.9 ;)
>>>
>>> Regards
>>> JB
>>>
>>> On 01/04/2019 19:36, David Morávek wrote:
>>> > Hello,
>>> >
>>> > is there still a reason to keep Kafka 0.9 support? This unfortunately
>>> > adds a lot of complexity to the KafkaIO implementation.
>>> >
>>> > Kafka 0.9 was released on Nov 2015.
>>> >
>>> > My first shot on removing Kafka 0.9 support would remove second
>>> > consumer, which is used for fetching offsets.
>>> >
>>> > WDYT? Is this support worth keeping?
>>> >
>>> > https://github.com/apache/beam/pull/8186
>>> >
>>> > D.
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>


Beam contribution

2019-04-02 Thread Csaba Kassai
Hi All!

I am Csabi, I would be happy to contribute to Beam.
Could you grant me the contributor role and assign issue BEAM-3489
to me? My user name is "csabakassai".

After I checked out the code and tried to do a gradle check I found these
issues:

   1. *jUnit tests fail:* the TimerReceiverTest fails in the
   ":beam-runners-google-cloud-dataflow-java-fn-api-worker:test" and the
   ":beam-runners-google-cloud-dataflow-java-legacy-worker:test" tasks. When I
   execute the tests independently everything is fine, so I disabled the parallel
   build and this solves the problem (see the sketch after this list). I have not
   investigated further; do you have any more insights on this issue? I have
   attached the test reports.
   2. *python test fails*: there is a python test which fails if the current
   offset of your timezone differs from the offset in 1970. In my case,
   Singapore is now GMT+8 and it was GMT+7:30 in 1970. I created a ticket for
   this issue where I describe the problem in detail:
   https://jira.apache.org/jira/browse/BEAM-6947. Could you assign the
   ticket to me? Also I created a PR with a possible fix:
   https://github.com/apache/beam/pull/8180. Could you suggest a
   reviewer?
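
A minimal sketch of the serial-build workaround from item 1, assuming the stock
Gradle property rather than any Beam-specific switch:

    # gradle.properties: force a serial build to rule out test isolation
    # issues between the worker test tasks.
    org.gradle.parallel=false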


Thank you,
Csabi
[Attachment: Gradle test report]
Test results - Class org.apache.beam.runners.dataflow.worker.fn.control.TimerReceiverTest
2 tests, 2 failures, 0 ignored, 1.174s duration, 0% successful

Failed test: testMultiTimerScheduling

java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.beam.runners.dataflow.worker.fn.control.TimerReceiverTest.tearDown(TimerReceiverTest.java:147)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:349)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
	at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
	at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at 

Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-02 Thread Jean-Baptiste Onofré
Hi Michael,

no problem for the thread, that's the goal of the mailing list ;)

And yes, you got my idea about a "meta" module: easy way of building the
"whole" Java SDK.

The purpose is not to create an uber jar, but more to simplify the build
for Java SDK developers.

Do you want me to complete your PR with what I did ?

Regards
JB

On 02/04/2019 16:49, Michael Luckey wrote:
> Going to fork the BEAM-4046 discussion. And, JB, I apologise for
> hijacking your thread.
> 
> As for the original question, I understood a request for a meta project
> which will enable easier handling of java projects. E.g. instead of
> requiring the user to call
> 
>     ./gradlew module1:build module2:build ... moduleN:build
> 
> a meta project with a build task defined something like
> 
> build.dependsOn module1:build
> build.dependsOn module2:build
> ...
> build.dependsOn moduleN:build
> 
> And other tasks as found useful.
> 
> Not a project which in itself creates some uberjar, which I also believe
> would be rather difficult to implement.
> 
> On Tue, Apr 2, 2019 at 5:13 AM Kenneth Knowles wrote:
> 
> Oh, yikes. It seems
> like https://github.com/gradle/gradle/issues/847 indicates that the
> feature to use the default names in Gradle is practically
> nonfunctional. If that bug is as severe as it looks, I have to
> retract my position. Like we could never have sdks/java/core and
> sdks/py/core, right?
> 
> Kenn
> 
> On Mon, Apr 1, 2019 at 6:27 PM Michael Luckey wrote:
> 
> FWIW, hacked something as showcase for BEAM-4046 [1]
> 
> This is miserably broken, but a
> 
> ./gradlew projects
> 
> or
> 
> ./gradlew -p sdks/java build
> 
> should work. Anything else is likely to cause issues. If you hit a
> stack overflow exception, it's likely caused
> by https://github.com/gradle/gradle/issues/847 
> 
> To continue here, lots of cleanup has to be done. We might also
> need to rename folders etc, to better reflect semantic intentions.
> 
> [1] https://github.com/apache/beam/pull/8194
> 
> On Mon, Apr 1, 2019 at 11:56 PM Kenneth Knowles wrote:
> 
> 
> 
> On Mon, Apr 1, 2019 at 2:20 PM Lukasz Cwik wrote:
> 
> 
> 
> On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles wrote:
> 
> 
> As to building an aggregated "Java" project, I think
> the blocker will be supporting conflicting deps. For
> IOs like ElasticSearch and runners like Flink the
> conflict is essential and deliberate, to support
> multiple versions of other services. And that is not
> even talking about transitive dep conflicts. I think
> Python and Go don't have this issue simply because
> they haven't tackled those problems.
> 
> Are you talking about just a shortcut for building
> (super easy to just add since we are using Gradle)
> or a new artifact that you want to distribute?
> 
> On Mon, Apr 1, 2019 at 10:01 AM Lukasz Cwik wrote:
> 
> During the gradle migration, we used to have
> something like:
> 
> include(":sdks:java:core")
> include(":sdks:java:extensions:sql")
> include(":sdks:python")
> 
> Just to be super clear, this is the Gradle default and
> is equivalent to just leaving it blank.
>  
> 
> but we discovered the Maven module names that
> were used during publishing were "core" / "sql"
> / ... (effectively the directory name) instead
> of "beam-sdks-java-core".
> 
> 
> Isn't this managed by the publication
> plugin? 
> https://docs.gradle.org/current/userguide/publishing_maven.html#sec:identity_values_in_the_generated_pom
>  "overriding
> the default identity values is easy: simply specify
> the groupId, artifactId or version attributes when
> configuring the MavenPublication."
> 
> 
> During the gradle migration this wasn't that easy. The
> new maven publish plugin improved a lot since then.
>  
> 
> Using the default at the time also broke the
> artifact names for intra project dependencies
> that we generate[1]. Finally, we also ran into
> an issue 

Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-02 Thread Michael Luckey
Going to fork the BEAM-4046 discussion. And, JB, I apologise for hijacking
your thread.

As for the original question, I understood a request for a meta project
which will enable easier handling of java projects. E.g. instead of
requiring the user to call

./gradlew module1:build module2:build ... moduleN:build

a meta project with a build task defined something like

build.dependsOn module1:build
build.dependsOn module2:build
...
build.dependsOn moduleN:build

And other tasks as found useful.

Not a project which in itself creates some uberjar, which I also believe
would be rather difficult to implement.
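
A minimal sketch of such a meta project, assuming a plain aggregate
build.gradle (module names illustrative):

    // sdks/java/build.gradle: an aggregate task that simply depends on the
    // build tasks of the individual Java modules.
    task buildJavaSdk {
      dependsOn ':sdks:java:core:build'
      dependsOn ':sdks:java:extensions:sql:build'
      // ... one line per Java module, or derived from the project tree
    }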

On Tue, Apr 2, 2019 at 5:13 AM Kenneth Knowles  wrote:

> Oh, yikes. It seems like https://github.com/gradle/gradle/issues/847 indicates
> that the feature to use the default names in Gradle is practically
> nonfunctional. If that bug is as severe as it looks, I have to retract my
> position. Like we could never have sdks/java/core and sdks/py/core, right?
>
> Kenn
>
> On Mon, Apr 1, 2019 at 6:27 PM Michael Luckey  wrote:
>
>> FWIW, hacked something as showcase for BEAM-4046 [1]
>>
>> This is miserably broken, but a
>>
>> ./gradlew projects
>>
>> or
>>
>> ./gradlew -p sdks/java build
>>
>> should work. Anything else is likely to cause issues. If you hit a stack
>> overflow exception, it's likely caused by
>> https://github.com/gradle/gradle/issues/847
>>
>> To continue here, lots of cleanup has to be done. We might also need to
>> rename folders etc, to better reflect semantic intentions.
>>
>> [1] https://github.com/apache/beam/pull/8194
>>
>> On Mon, Apr 1, 2019 at 11:56 PM Kenneth Knowles  wrote:
>>
>>>
>>>
>>> On Mon, Apr 1, 2019 at 2:20 PM Lukasz Cwik  wrote:
>>>


 On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles  wrote:

>
> As to building an aggregated "Java" project, I think the blocker will
> be supporting conflicting deps. For IOs like ElasticSearch and runners 
> like
> Flink the conflict is essential and deliberate, to support multiple
> versions of other services. And that is not even talking about transitive
> dep conflicts. I think Python and Go don't have this issue simply because
> they haven't tackled those problems.
>
> Are you talking about just a shortcut for building (super easy to just
> add since we are using Gradle) or a new artifact that you want to
> distribute?
>
> On Mon, Apr 1, 2019 at 10:01 AM Lukasz Cwik  wrote:
>
>> During the gradle migration, we used to have something like:
>>
>> include(":sdks:java:core")
>> include(":sdks:java:extensions:sql")
>> include(":sdks:python")
>>
>> Just to be super clear, this is the Gradle default and is equivalent to
> just leaving it blank.
>
>
>> but we discovered the Maven module names that were used during
>> publishing were "core" / "sql" / ... (effectively the directory name)
>> instead of "beam-sdks-java-core".
>>
>
> Isn't this managed by the publication plugin?
> https://docs.gradle.org/current/userguide/publishing_maven.html#sec:identity_values_in_the_generated_pom
>  "overriding
> the default identity values is easy: simply specify the groupId, 
> artifactId
> or version attributes when configuring the MavenPublication."
>

 During the gradle migration this wasn't that easy. The new maven
 publish plugin has improved a lot since then.
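
With the current maven-publish plugin the override is a one-liner per
publication; a minimal sketch (coordinates illustrative):

// build.gradle of a module, e.g. sdks/java/core
apply plugin: 'java'
apply plugin: 'maven-publish'

publishing {
    publications {
        mavenJava(MavenPublication) {
            from components.java
            // The default artifactId would be the project (directory) name,
            // e.g. 'core'; publish under the fully qualified name instead.
            groupId = 'org.apache.beam'
            artifactId = 'beam-sdks-java-core'
        }
    }
}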


> Using the default at the time also broke the artifact names for intra
>> project dependencies that we generate[1]. Finally, we also ran into an
>> issue because we had more than one Gradle project with the same directory
>> name even though they were under a different parent folder (I think it 
>> was
>> "core") and that was leading to some strange build time behavior.
>>
>
> Weird. But I think the Jira should still stand as a move towards
> simplifying our build and making it more discoverable for new 
> contributors.
>

 Agreed, the JIRA makes sense; just calling out that there were other
 issues that this naming caused in the past, which should be checked
 before we call this done.

>>>
>>> Totally agree. It will be quite a large task with a lot of boilerplate
>>> that might not be separable from technical blockers that come up as you go
>>> through the boilerplate.
>>>
>>> Kenn
>>>
>> We didn't migrate to a flat project structure where each project is a
>> folder underneath the root project because of the existing Maven build
>> rules that were being maintained in parallel, and I'm not sure if people
>> would want to have a flat project structure either.
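
For comparison, the two layouts in settings.gradle terms (illustrative only):

// Nested, as Beam has it: project paths mirror the directory tree.
include ':sdks:java:core'

// Flat: every module included directly under the root, so each project
// name is unique by construction and can match the artifact name.
include ':beam-sdks-java-core'
project(':beam-sdks-java-core').projectDir = file('sdks/java/core')
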
>>
>> 1:
>> https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1055
>>
>> On Mon, Apr 1, 2019 at 9:49 AM Michael Luckey 
>> wrote:

Re: Deprecating Avro for fastavro on Python 3

2019-04-02 Thread Robert Bradshaw
I agree with Ahmet.

Fastavro seems to be well maintained and has good, tested
compatibility. Unless we expect significant performance improvements
in the standard Avro Python package (a significant undertaking, likely
not one we have the bandwidth to take on, and my impression is that
it's historically not been a priority) it's hard to justify using it
instead. Python 3 issues are just the trigger to consider finally
moving over, as I think that was the long-term intent back when
fastavro was added as an option. (Possibly if there are features
missing from fastavro, that could be a reason as well, at least to
keep the option around even if it's not the default.)
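
One way to sanity-check that compatibility locally is a round-trip between
the two libraries; an illustrative snippet (the schema is made up for the
example):

import io

import fastavro
from avro.datafile import DataFileReader
from avro.io import DatumReader

schema = {
    'type': 'record', 'name': 'User',
    'fields': [{'name': 'name', 'type': 'string'}],
}

# Write a container file with fastavro...
buf = io.BytesIO()
fastavro.writer(buf, fastavro.parse_schema(schema), [{'name': 'beam'}])
buf.seek(0)

# ...and read it back with the reference avro reader.
reader = DataFileReader(buf, DatumReader())
assert list(reader) == [{'name': 'beam'}]
reader.close()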

That being said, we should definitely not change the default and
remove the old version in the same release.

- Robert

On Tue, Apr 2, 2019 at 2:12 PM Robbe Sneyders  wrote:
>
> Hi all,
>
> Thank you for the feedback. Looking at the responses, it seems like there is 
> a consensus to move forward with fastavro as the default implementation on 
> Python 3.
>
> There are 2 questions left however:
> - Should fastavro also become the default implementation on Python 2?
> This is a trade-off between having a consistent API across Python versions, 
> or keeping the current behavior on Python 2.
>
> - Should we keep the avro-python3 dependency?
> With the proposed solution, we could remove the avro-python3 dependency, but 
> it might have to be re-added if we want to support Avro again on Python 3 in 
> a future version.
>
> Kind regards,
> Robbe
>
>
>
>
> Robbe Sneyders
>
> ML6 Gent
>
> M: +32 474 71 31 08
>
>
> On Thu, 28 Mar 2019 at 18:28, Ahmet Altay  wrote:
>>
>> Hi Ismaël,
>>
>> It is great to hear that Avro is planning to make a release soon.
>>
>> To answer your concerns, fastavro has a set of tests using regular avro 
>> files[1] and it also has a large set of users (with 675470 package 
>> downloads). This is in addition to it being a py2 & py3 compatible package 
>> and offering ~7x performance improvements [2]. Another data point, we were 
>> testing fastavro for a while behind an experimental flag and have not seen 
>> issues related to compatibility.
>>
>> pyavro-rs sounds promising, however I could not find a released version of it
>> on pypi. The source code does not look like it is being maintained either,
>> with the last commit on Jul 2, 2018 (for comparison, the last change on
>> fastavro was on Mar 19, 2019).
>>
>> I think given the state of things, it makes sense to switch to fastavro as 
>> the default implementation to unblock python 3 changes. When avro offers a 
>> similar level of performance we could switch back without any visible user 
>> impact.
>>
>> Ahmet
>>
>> [1] https://github.com/fastavro/fastavro/tree/master/tests
>> [2] https://pypi.org/project/fastavro/
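
A rough way to reproduce that kind of number locally (illustrative sketch;
'users.avro' is a stand-in for any existing Avro file, and results will vary
with schema and machine):

import time

import fastavro
from avro.datafile import DataFileReader
from avro.io import DatumReader

def timed(label, count_fn):
    start = time.time()
    n = count_fn()
    print('%s: %d records in %.2fs' % (label, n, time.time() - start))

with open('users.avro', 'rb') as f:
    timed('fastavro', lambda: sum(1 for _ in fastavro.reader(f)))

with open('users.avro', 'rb') as f:
    timed('avro', lambda: sum(1 for _ in DataFileReader(f, DatumReader())))
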
>>
>> On Thu, Mar 28, 2019 at 7:53 AM Ismaël Mejía  wrote:
>>>
>>> Hello,
>>>
>>> The problem of switching implementations is the risk of losing
>>> interoperability, and this is more important than performance. Does
>>> fastavro have tests that guarantee that it is fully compatible with
>>> Avro’s Java version? (given that it is the de-facto implementation
>>> used everywhere).
>>>
>>> If performance is a more important criterion, maybe it is worth checking
>>> out pyavro-rs [1]; you can take a look at its performance in the great
>>> talk of last year [2].
>>>
>>> I have been involved actively in the Avro community in the last months
>>> and I am now a committer there. Also Dan Kulp who has done multiple
>>> contributions in Beam is now a PMC member too. We are at this point
>>> working hard to get the next release of Avro out, actually the branch
>>> cut of Avro 1.9.0 is happening this week, and we plan to improve the
>>> release cadence. Please understand that the issue with Avro is that it
>>> is a really specific and ‘old‘ project (~10 years) so part of the
>>> active moved to other areas because it is stable, but we are still
>>> there working on it and we are eager to improve it for everyone’s
>>> needs (and of course Beam needs).
>>>
>>> I know that Python 3’s Avro implementation is still lacking and could
>>> be improved (views expressed here are clearly valid), but maybe this
>>> is a chance to contribute there too. Remember Apache projects are a
>>> family and we have a history of cross-collaboration with other
>>> communities e.g. Flink, Calcite so why not give it a chance to Avro
>>> too.
>>>
>>> Regards,
>>> Ismaël
>>>
>>> [1] https://github.com/flavray/pyavro-rs
>>> [2] 
>>> https://ep2018.europython.eu/media/conference/slides/how-to-write-rust-instead-of-c-and-get-away-with-it-yes-its-a-python-talk.pdf
>>>
>>> On Wed, Mar 27, 2019 at 11:42 PM Chamikara Jayalath
>>>  wrote:
>>> >
>>> > +1 for making use_fastavro the default for Python3. I don't see any 
>>> > significant drawbacks in doing this from Beam's point of view. One 
>>> > concern is whether avro and fastavro can safely co-exist in the same 
>>> > environment so that Beam continues to work for users who already have avro
>>> > library installed.

Re: Quieten javadoc generation

2019-04-02 Thread Alexey Romanenko
+1 to suppress such warnings globally. IMO, a meaningful Javadoc
description is usually quite enough to understand what a method does.

> On 1 Apr 2019, at 18:21, Kenneth Knowles  wrote:
> 
> Personally, I would like to suppress the warnings globally. I think requiring 
> javadoc everywhere is already enough to remind someone to write something 
> meaningful. And I think @param rarely adds anything beyond the function 
> signature and @return rarely adds anything beyond the description.
> 
> Kenn
> 
> On Mon, Apr 1, 2019 at 6:53 AM Michael Luckey wrote:
> Hi,
> 
> currently our console output gets cluttered by thousands of Javadoc warnings 
> [1]. Most of them are warnings caused by missing @return or @param tags [2].
> 
> So currently, this signal is completely ignored, and even worse, makes it 
> difficult to parse through the log.
> 
> As I could not find a previous discussion on the list on how to handle 
> param/return in javadocs, I felt the need to ask here first how we would 
> like to improve this situation.
> 
> Some options
> 1. fix those warnings
> 2. do not insist on those tags being present and disable doclint warnings 
> (probably not doable at tag granularity). This is already done by the doc 
> aggregation task [3]; a sketch follows below.
> 
> Thoughts?
> 
> 
> [1] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/console
> [2] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/java/
> [3] https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle#L77-L78
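
For reference, a sketch of option 2 applied globally, mirroring what the
aggregation task in [3] already does: keep doclint but drop the 'missing'
group, which covers absent @param/@return tags (illustrative placement in
the root build.gradle or BeamModulePlugin):

// root build.gradle
allprojects {
    tasks.withType(Javadoc) {
        // '-quiet' is the value half of the option pair; this is the usual
        // Gradle idiom for passing bare javadoc flags through.
        options.addStringOption('Xdoclint:all,-missing', '-quiet')
    }
}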



Re: Deprecating Avro for fastavro on Python 3

2019-04-02 Thread Robbe Sneyders
Hi all,

Thank you for the feedback. Looking at the responses, it seems like there
is a consensus to move forward with fastavro as the default implementation
on Python 3.

There are 2 questions left however:
- Should fastavro also become the default implementation on Python 2?
This is a trade-off between having a consistent API across Python versions,
or keeping the current behavior on Python 2.

- Should we keep the avro-python3 dependency?
With the proposed solution, we could remove the avro-python3 dependency,
but it might have to be re-added if we want to support Avro again on Python
3 in a future version.

Kind regards,
Robbe


Robbe Sneyders

ML6 Gent


M: +32 474 71 31 08


On Thu, 28 Mar 2019 at 18:28, Ahmet Altay  wrote:

> Hi Ismaël,
>
> It is great to hear that Avro is planning to make a release soon.
>
> To answer your concerns, fastavro has a set of tests using regular avro
> files[1] and it also has a large set of users (with 675470 package
> downloads). This is in addition to it being a py2 & py3 compatible package
> and offering ~7x performance improvements [2]. Another data point, we were
> testing fastavro for a while behind an experimental flag and have not seen
> issues related to compatibility.
>
> pyavro-rs sounds promising, however I could not find a released version of
> it on pypi. The source code does not look like it is being maintained either,
> with the last commit on Jul 2, 2018 (for comparison, the last change on
> fastavro was on Mar 19, 2019).
>
> I think given the state of things, it makes sense to switch to fastavro as
> the default implementation to unblock python 3 changes. When avro offers a
> similar level of performance we could switch back without any visible user
> impact.
>
> Ahmet
>
> [1] https://github.com/fastavro/fastavro/tree/master/tests
> [2] https://pypi.org/project/fastavro/
>
> On Thu, Mar 28, 2019 at 7:53 AM Ismaël Mejía  wrote:
>
>> Hello,
>>
>> The problem of switching implementations is the risk of losing
>> interoperability, and this is more important than performance. Does
>> fastavro have tests that guarantee that it is fully compatible with
>> Avro’s Java version? (given that it is the de-facto implementation
>> used everywhere).
>>
>> If performance is a more important criterion, maybe it is worth checking
>> out pyavro-rs [1]; you can take a look at its performance in the great
>> talk of last year [2].
>>
>> I have been involved actively in the Avro community in the last months
>> and I am now a committer there. Also Dan Kulp who has done multiple
>> contributions in Beam is now a PMC member too. We are at this point
>> working hard to get the next release of Avro out, actually the branch
>> cut of Avro 1.9.0 is happening this week, and we plan to improve the
>> release cadence. Please understand that the issue with Avro is that it
>> is a really specific and ‘old’ project (~10 years), so part of the
>> active community moved to other areas because it is stable, but we are still
>> there working on it and we are eager to improve it for everyone’s
>> needs (and of course Beam needs).
>>
>> I know that Python 3’s Avro implementation is still lacking and could
>> be improved (views expressed here are clearly valid), but maybe this
>> is a chance to contribute there too. Remember Apache projects are a
>> family and we have a history of cross-collaboration with other
>> communities e.g. Flink, Calcite so why not give it a chance to Avro
>> too.
>>
>> Regards,
>> Ismaël
>>
>> [1] https://github.com/flavray/pyavro-rs
>> [2]
>> https://ep2018.europython.eu/media/conference/slides/how-to-write-rust-instead-of-c-and-get-away-with-it-yes-its-a-python-talk.pdf
>>
>> On Wed, Mar 27, 2019 at 11:42 PM Chamikara Jayalath
>>  wrote:
>> >
>> > +1 for making use_fastavro the default for Python3. I don't see any
>> significant drawbacks in doing this from Beam's point of view. One concern
>> is whether avro and fastavro can safely co-exist in the same environment so
>> that Beam continues to work for users who already have avro library
>> installed.
>> >
>> > Note that there are two use_fastavro flags (confusingly enough).
>> > (1) for avro file source [1]
>> > (2) an experiment flag [2] with the same name that makes Dataflow
>> runner use fastavro library for reading/writing intermediate files and for
>> reading Avro files exported by BigQuery.
>> >
>> > I can help with the latter.
>> >
>> > [1]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py#L81
>> > [2]
>> https://lists.apache.org/thread.html/94bd362a3a041654e6ef9003fb3fa797e25274fdb4766065481a0796@%3Cuser.beam.apache.org%3E
>> >
>> > Thanks,
>> > Cham
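
A minimal sketch of the two switches (the bucket path is hypothetical;
argument and flag names as in [1] and [2] above):

# (1) per-source flag on the Avro file source:
import apache_beam as beam
from apache_beam.io.avroio import ReadFromAvro

with beam.Pipeline() as p:
    records = p | ReadFromAvro('gs://my-bucket/users-*.avro',
                               use_fastavro=True)

# (2) pipeline-level experiment, passed on the command line:
#     --experiments=use_fastavro
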
>> >
>> > On Wed, Mar 27, 2019 at 3:27 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>> >>
>> >> Thanks, Robbe and Frederik, for raising this.
>> >>