[PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Jean-Baptiste Onofré
Hi guys,

I would like to introduce a Gradle "meta" project for the build:
beam-sdks-java.

The idea is to simply build all Java SDK related resources (core, IO, ...).

The purpose is also to be aligned with the other SDKs which provide
beam-sdks-go and beam-sdks-python.

Thoughts ?

Regards
JB
-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Jean-Baptiste Onofré
Hi Michael,

Yes, I know the -p option and it's currently what I'm using.

However, the proposal is also meant to make the Java SDK build more
"consistent" with the other modules.

Regards
JB

On 01/04/2019 18:49, Michael Luckey wrote:
> Hi,
> 
> Although I have not yet managed to get more deeply involved in actual
> development, I think this ability would be a useful addition.
> 
> But I would also like to point out that this is kind of implicit as
> soon as we get https://issues.apache.org/jira/browse/BEAM-4046 included.
> 
> For instance, we would change the current setup from
> 
> include "beam-sdks-java-core"
> project(":beam-sdks-java-core").dir = file("sdks/java/core")
> 
> to something like
> 
> include(":sdks:java:core")
> include(":sdks:java:extensions:sql")
> include(":sdks:python")
> 
> 
> With this in place a plain
> 
> $ ./gradlew -p sdks/java build
> 
> 
> would do exactly what you want. And, of course, this would also work for
> 'sdks/java/io', 'runners/', etc. I hope you get the point.
> 
> Currently, we deviate from the Gradle default convention and therefore have
> to implement some quirks to restore the default behaviour. I also somewhat
> dislike that the structure introduced by the parent/child folders is
> destroyed by our current project definitions.
> 
> But, to be honest, although I have a fairly clear understanding of how to
> proceed here - especially regarding the requirement to keep the change
> backwards compatible - we might still decide not to switch, because deeper
> investigation might reveal issues that I am currently not aware of.
> 
> Best,
> 
> michel
> 
> On Mon, Apr 1, 2019 at 5:52 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> 
> Hi guys,
> 
> I would like to introduce a Gradle "meta" project for the build:
> beam-sdks-java.
> 
> The idea is to simply build all Java SDK related resources (core,
> IO, ...).
> 
> The purpose is also to be aligned with the other SDKs which provide
> beam-sdks-go and beam-sdks-python.
> 
> Thoughts ?
> 
> Regards
> JB
> 
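
The relationship between the hierarchical project paths and the directory layout discussed above can be sketched in plain Java. This is only an illustration, not Beam or Gradle code: the class and method names are hypothetical, the path-to-directory rule is Gradle's default convention (":a:b" lives in "a/b" under the root), and the "beam-" flat-name rule is inferred from the single "beam-sdks-java-core" example quoted in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
class GradlePathSketch {

    // Default convention: project path ":sdks:java:core" maps to directory "sdks/java/core".
    // Assumes the path starts with ':'.
    static String defaultDir(String projectPath) {
        return projectPath.substring(1).replace(':', '/');
    }

    // Flat naming as in the quoted settings: ":sdks:java:core" -> "beam-sdks-java-core".
    // The general rule is an assumption based on that one example.
    static String flatName(String projectPath) {
        return "beam-" + projectPath.substring(1).replace(':', '-');
    }

    // Projects whose default directory is the given directory or lies below it,
    // roughly the set a "./gradlew -p <dir> build" would cover under the default layout.
    static List<String> projectsUnder(String dir, List<String> projectPaths) {
        return projectPaths.stream()
            .filter(p -> defaultDir(p).equals(dir) || defaultDir(p).startsWith(dir + "/"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList(
            ":sdks:java:core", ":sdks:java:extensions:sql", ":sdks:python");
        for (String p : all) {
            System.out.println(p + " -> " + defaultDir(p) + " (flat: " + flatName(p) + ")");
        }
        System.out.println(projectsUnder("sdks/java", all));
    }
}
```

With the hierarchical layout no explicit directory mapping is needed, which is why selecting a directory with -p is enough to pick up everything beneath it.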



Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Jean-Baptiste Onofré
By the way, another reason is to have this clearly displayed in
./gradlew projects ;)



Re: kafka 0.9 support

2019-04-01 Thread Jean-Baptiste Onofré
+1 to remove 0.9 support.

I think it's more interesting to test and verify Kafka 2.2.0 than 0.9 ;)

Regards
JB

On 01/04/2019 19:36, David Morávek wrote:
> Hello,
> 
> is there still a reason to keep Kafka 0.9 support? This unfortunately
> adds a lot of complexity to the KafkaIO implementation.
> 
> Kafka 0.9 was released in November 2015.
> 
> My first shot at removing Kafka 0.9 support would remove the second
> consumer, which is used for fetching offsets.
> 
> WDYT? Is this support worth keeping?
> 
> https://github.com/apache/beam/pull/8186
> 
> D.



Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Jean-Baptiste Onofré
Hi,

I mean that I made a change on a local branch to be able to do:

./gradlew :beam-sdks-java:build

and/or

./gradlew -p sdks/java build

Regards
JB

On 01/04/2019 19:47, Michael Luckey wrote:
> Hmm... now you lost me :(
> 
> Currently I am not able to do a
> 
> $ ./gradlew -p sdks/java build
> 
> It fails with the error
> 
> Project directory '/Users/michel/GitHub/adude3141/beam/sdks/java' is not
> part of the build defined by settings file
> 
> on my machine, which - again - should be expected.
> 
> Regarding the display, it would look like this if we were able to switch:
> 
> \--- Project ':sdks'
>      +--- Project ':sdks:java'
>      |    +--- Project ':sdks:java:core' - Apache Beam :: SDKs :: Java :: Core
>      |    \--- Project ':sdks:java:extensions'
>      |         \--- Project ':sdks:java:extensions:sql' - Apache Beam :: SDKs :: Java :: Extensions :: SQL
>      \--- Project ':sdks:python'
> 


Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-02 Thread Jean-Baptiste Onofré
> configuring the MavenPublication."
> 
> 
> During the gradle migration this wasn't that easy. The
> new maven publish plugin improved a lot since then.
>  
> 
> Using the default at the time also broke the
> artifact names for intra project dependencies
> that we generate[1]. Finally, we also ran into
> an issue because we had more than one Gradle
> project with the same directory name even though
> they were under a different parent folder (I
> think it was "core") and that was leading to
> some strange build time behavior.
> 
> 
> Weird. But I think the Jira should still stand as a
> move towards simplifying our build and making it
> more discoverable for new contributors.
> 
>  
> Agreed that the JIRA makes sense; just calling out that
> there were other issues that this naming had caused in
> the past which should be checked before we call this done.
> 
> 
> Totally agree. It will be quite a large task with a lot of
> boilerplate that might not be separable from technical
> blockers that come up as you go through the boilerplate.
> 
> Kenn
>  
>  
> 
> Kenn 
>  
> 
> We didn't migrate to a flat project structure
> where each project is a folder underneath the
> root project because of the existing Maven build
> rules that were being maintained in parallel and
> I'm not sure if people would want to have a flat
> project structure either.
> 
> 1: 
> https://github.com/apache/beam/blob/a85ea07b719385ec185e4fc5e4cdcc67b3598599/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1055
> 

Re: [review?] WordCount in Kotlin

2019-04-04 Thread Jean-Baptiste Onofré
Thanks for the update Pablo.

I will try to take a look over the weekend.

Regards
JB

On 04/04/2019 23:16, Pablo Estrada wrote:
> Hello all,
> a community member has been very kind to contribute a Kotlin
> translation of the WordCount pipeline[1]. The documentation, tests, and
> Gradle structure for it are very good, so I am happy to merge, but since
> this code will become our first Kotlin "documentation"/entrypoint, I
> wanted to be cautious.
> So if anyone wants to review the change, please do. I
> will merge this in a couple of days.
> Thanks!
> -P.
> 
> [1] https://github.com/apache/beam/pull/8034



Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-17 Thread Jean-Baptiste Onofré
+1 (binding)

Quickly checked with beam-samples.

Regards
JB

On 16/04/2019 00:50, Andrew Pilloud wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #4 for the version
> 2.12.0, as follows:
> 
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with
> fingerprint 9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.12.0-RC4" [5],
> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
> * Java artifacts were built with Gradle/5.2.1 and OpenJDK/Oracle JDK
> 1.8.0_181.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.12.0 release to help with validation
> [9].
> 
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Andrew
> 
> [1] 
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344944
> [2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1068/
> [5] https://github.com/apache/beam/tree/v2.12.0-RC4 
> [6] https://github.com/apache/beam/pull/8215
> [7] https://github.com/apache/beam-site/pull/588
> [8] https://github.com/apache/beam/pull/8314
> [9] 
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984



Re: New contributor to Beam

2019-04-17 Thread Jean-Baptiste Onofré
Welcome !

Regards
JB

On 17/04/2019 16:05, Cyrus Maden wrote:
> Hi all!
> 
> My name's Cyrus and I'd like to start contributing to Beam. I'm a
> technical writer so I'm particularly looking forward to contributing to
> the Beam docs. Could someone add me as a contributor on JIRA so I can
> create and assign tickets?
> 
> My JIRA name is: *cyrusmaden*
> 
> Excited to be a part of this community and to work with y'all!
> 
> Best,
> Cyrus



Re: Wait on JdbcIO write completion

2019-04-17 Thread Jean-Baptiste Onofré
I second Alexey (and thanks Alexey ;)).

I also started similar improvements in other IOs (PRs will come soon).

Regards
JB

On 17/04/2019 17:31, Alexey Romanenko wrote:
> Hi Jonathan,
> 
> I just wanted to let you know that this feature [1] was implemented and,
> finally, merged into master. So, it should be included in the upcoming
> Beam 2.13 release.
> 
> In short, a new method called Write.withResults() was added. It returns
> a WriteVoid transform that produces a PCollection<Void> as its output,
> which can be used together with Wait.on(). So, a simple example of
> writing into two different databases can look like this:
> 
> PCollection<Void> firstWriteResults = data.apply(JdbcIO.write()
>     .withDataSourceConfiguration(CONF_DB_1).withResults());
> data.apply(Wait.on(firstWriteResults))
>     .apply(JdbcIO.write().withDataSourceConfiguration(CONF_DB_2));
> 
> [1] https://issues.apache.org/jira/browse/BEAM-6732
> 
>> On 22 Feb 2019, at 16:52, Alexey Romanenko <aromanenko@gmail.com> wrote:
>>
>> I have created new Jira issue for this feature:
>> https://issues.apache.org/jira/browse/BEAM-6732
>>
>> Jonathan, feel free to assign it to yourself if you want to
>> contribute, it is always welcomed =)
>>
>>> On 21 Feb 2019, at 10:23, Jonathan Perron
>>> <jonathan.per...@lumapps.com> wrote:
>>>
>>> Thank you Eugene for your answer.
>>>
>>> According to your explanation, I think I will go with your 3rd
>>> solution, as this seems the most robust and friendly way to act.
>>>
>>> Jonathan
>>>
>>> On 21/02/2019 02:22, Eugene Kirpichov wrote:
>>>> Hi Jonathan,
>>>>
>>>> Wait.on() requires a PCollection - it is not possible to change it
>>>> to wait on PDone because all PDone's in the pipeline are the same so
>>>> it's not clear what exactly you'd be waiting on.
>>>>
>>>> To use the Wait transform with JdbcIO.write(), you would need to
>>>> change 
>>>> https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L761-L762
>>>>  to
>>>> simply "return input.apply(ParDo.of(...))" and propagate that into
>>>> the type signature. Then you'd get a waitable PCollection.
>>>>
>>>> This is a very simple, but backwards-incompatible change. Up to the
>>>> Beam community whether/when people would want to make it.
>>>>
>>>> It's also possible to make a slightly larger but compatible change,
>>>> where JdbcIO.write() would stay as is, but you could write e.g.
>>>> "JdbcIO.write().withResults()" which would be a new transform that
>>>> *does* return results and is waitable. A similar approach is taken
>>>> in TextIO.write().withOutputFilenames().
>>>>
>>>> On Wed, Feb 20, 2019 at 4:58 AM Jonathan Perron
>>>> <jonathan.per...@lumapps.com> wrote:
>>>>
>>>> Hello folks,
>>>>
>>>> I have run into a special case where I need to wait for a
>>>> JdbcIO.write() operation to complete before starting a second one.
>>>>
>>>> In detail, I have a PCollection> which is used to fill two different
>>>> SQL statements. It is used in a first JdbcIO.write() operation to
>>>> store an anonymized user in a table (userId with an associated
>>>> userUuid generated with UUID.randomUUID()). These two parameters have
>>>> a unique constraint, meaning that a userId cannot have multiple
>>>> userUuids. Unfortunately, across several runs of my pipeline, the
>>>> UUID will be different, meaning that I need to query this table at
>>>> some point, or to use what I describe in the following.
>>>>
>>>> I am planning to fill a second table with this userUuid together with
>>>> a couple of other pieces of information, such as the time of first
>>>> visit. To limit I/O, and as I have a lot of information in my
>>>> PCollection, I want to use it once more with a different SQL
>>>> statement, where the userUuid is read from the first table using a
>>>> SELECT statement. This cannot work if the first JdbcIO.write()
>>>> operation is not complete.
>>>>
>>>> I saw that the Java SDK offers a Wait.on() PTransform, but it is
>>>> unfortunately only compatible with a PCollection, and not with a
>>>> PDone such as the one output by the JdbcIO operation. Could my issue
>>>> be solved by extending Wait.on(), or should I go with another
>>>> solution? If so, how could I implement it?
>>>>
>>>> Many thanks for your input !
>>>>
>>>> Jonathan
>>>>
>>
> 
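
The idea behind this thread can be illustrated with a rough plain-Java analogy. This is not Beam code: the class, the method names and the "db1"/"db2" labels are hypothetical, and CompletableFuture merely stands in for the waitable output of a write. A write step that returns nothing gives a later step nothing to wait on, while a write step that returns a completion signal lets the second write be chained after the first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical analogy, for illustration only.
class SequencedWritesSketch {

    // Stand-in for a write that exposes its completion: the returned future
    // completes once every row has been recorded in the shared log.
    static CompletableFuture<Void> writeWithResults(String db, List<String> rows, List<String> log) {
        return CompletableFuture.runAsync(() -> {
            for (String row : rows) {
                log.add(db + ":" + row);
            }
        });
    }

    // Runs the first write, then starts the second only after the first has completed.
    static List<String> runBoth(List<String> rows) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CompletableFuture<Void> first = writeWithResults("db1", rows, log);
        // Chaining on the signal plays the role that waiting on the first write's results plays in the thread.
        CompletableFuture<Void> second = first.thenCompose(done -> writeWithResults("db2", rows, log));
        second.join();
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(runBoth(Arrays.asList("u1", "u2")));
    }
}
```

Because the second write is only created once the first future has completed, every "db1" entry is recorded before any "db2" entry, which is the ordering guarantee the original question asks for.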



Re: CassandraIO breakage

2019-04-18 Thread Jean-Baptiste Onofré
Let me check if it works on my machine.

Regards
JB

On 17/04/2019 21:48, Reuven Lax wrote:
> Did something break with CassandraIO? It no longer seems to compile.



Re: CassandraIO breakage

2019-04-18 Thread Jean-Baptiste Onofré
It builds fine on my machine.

Let me check on Jenkins.

Regards
JB

On 17/04/2019 21:48, Reuven Lax wrote:
> Did something break with CassandraIO? It no longer seems to compile.



Re: [PROPOSAL] Prepare for LTS bugfix release 2.7.1

2019-04-25 Thread Jean-Baptiste Onofré
+1 it sounds good to me.

Thanks !

Regards
JB

On 26/04/2019 02:42, Kenneth Knowles wrote:
> Hi all,
> 
> Since the release of 2.7.0 we have identified some serious bugs:
> 
>  - There are 8 (non-dupe) issues* tagged with Fix Version 2.7.1
>  - 2 are rated "Blocker" (aka P0) but I think the others may be underrated
>  - If you know of a critical bug that is not on that list, please file
> an LTS backport ticket for it
> 
> If a user is on an old version and wants to move to the LTS, there are
> some real blockers. I propose that we perform a 2.7.1 release starting now.
> 
> I volunteer to manage the release. What do you think?
> 
> Kenn
> 
> *Some are "resolved" but this is not accurate as the LTS 2.7.1 branch is
> not created yet. I suggest filing a ticket to track just the LTS
> backport when you hit a bug that merits it.
> 


Re: Hello from Hannah Jiang

2019-04-25 Thread Jean-Baptiste Onofré
Welcome aboard !

Regards
JB

On 25/04/2019 19:55, Griselda Cuevas wrote:
> Welcome Hannah! - Very excited to see you in the Beam community :) 
> 
> On Tue, 23 Apr 2019 at 12:59, Hannah Jiang wrote:
> 
> Hi everyone
> 
> I joined Google recently and will be working on the Python portability part.
> I am happy to be part of the community. Looking forward to working
> with all of you together.
> 
> I have a minor request: can an admin please give me access to JIRA?
> 
> Thanks,
> Hannah
> 
> 


Re: Add new JIRA component for Python IO?

2019-04-25 Thread Jean-Baptiste Onofré
It sounds good. +1

Regards
JB

On 25/04/2019 21:43, Pablo Estrada wrote:
> Hello all,
> there are only two JIRA components for python: `sdk-py-core`, and
> `sdk-py-harness`. Naturally, sdk-py-core is the component with the most
> bugs (>1000).
> 
> I believe we really need a component to route issues for Python IO
> (`io-python-all`?). Maybe even something more granular.
> 
> Thoughts?
> -P.
> 
> 


Re: :beam-sdks-java-io-hadoop-input-format:test is extremely flaky

2019-04-29 Thread Jean-Baptiste Onofré
Agree, +1

Regards
JB

On 29/04/2019 15:30, Ismaël Mejía wrote:
> +1 to remove it on this release, this is a maintenance pain for no real 
> reason.
> 
> On Mon, Apr 29, 2019 at 3:06 PM Alexey Romanenko wrote:
>>
>> Things got much better after fixing an issue with port allocation (thanks to
>> Etienne!) for the embedded Cassandra cluster (it is used in hadoop-input-format
>> and was the main cause of flakiness). Even so, I'm 100% in favour of
>> removing this module, since it has already been deprecated for the last
>> several releases.
>>
>> PS: Just an observation from digging into PreCommit job results:
>> "org.apache.beam.runners.dataflow.worker.fn.BeamFnControlServiceTest.testClientConnecting"
>> has been failing quite often lately. Is anyone working on this?
>>
>>> On 29 Apr 2019, at 14:19, Maximilian Michels wrote:
>>>
>>> I don't know what's going on with it but I agree it's annoying.
>>>
>>> Came across https://jira.apache.org/jira/browse/BEAM-6247, maybe it is time 
>>> to remove this module for the next release?
>>>
>>> -Max
>>>
>>> On 26.04.19 20:10, Reuven Lax wrote:
>>>> I find I usually have to rerun Presubmit multiple times to get a green 
>>>> run, and this test is one of the biggest culprits (though it's not the 
>>>> only culprit). Does anyone know what's going on with it?
>>>> Reuven
>>



Re: request for beam minor release

2019-05-08 Thread Jean-Baptiste Onofré
Hi,

Every release is tagged. We create a release branch based on a master commit.

Are you requesting a 2.10.1 maintenance release?

Regards
JB

On 08/05/2019 15:10, Moorhead,Richard wrote:
> Is there a process for tagging a commit in master for a minor release?
> 
> I am trying to get this
> <https://github.com/apache/beam/pull/8503/commits/ffa5632bca8c7264993702c39c6ca013a9f6ecdb>
>  commit
> released into 2.10.1
>  
> 
> CONFIDENTIALITY NOTICE This message and any included attachments are
> from Cerner Corporation and are intended only for the addressee. The
> information contained in this message is confidential and may constitute
> inside or non-public information under international, federal, or state
> securities laws. Unauthorized forwarding, printing, copying,
> distribution, or use of such information is strictly prohibited and may
> be unlawful. If you are not the addressee, please promptly delete this
> message and notify the sender of the delivery error by e-mail or you may
> call Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1)
> (816)221-1024.
> 



Re: request for beam minor release

2019-05-08 Thread Jean-Baptiste Onofré
I second Max here. If you are just looking for this specific commit, you
can pick up the next release that includes it.

Regards
JB

On 08/05/2019 16:27, Maximilian Michels wrote:
> Hi Richard,
> 
> Would it be an option to use the upcoming 2.13.0 release? The commit
> will be part of that release.
> 
> Thanks,
> Max
> 



Re: Cassandra and hadoop test broken on master and in previous releases

2019-05-10 Thread Jean-Baptiste Onofré
Hi,

let me try to reproduce on my box.

Regards
JB

On 11/05/2019 01:34, Ankur Goenka wrote:
> Hi,
> 
> Cassandra and Hadoop tests for targets :beam-sdks-java-io-cassandra:test
> :beam-sdks-java-io-hadoop-format:test are failing at master and in
> 2.12.0 release with jvm crash. 
> 
> Gradle Scan: https://gradle.com/s/rhseoqeouup6e
> 
> Any help debugging the failure would be appreciated.
> 
> Thanks,
> Ankur



Re: Cassandra and hadoop test broken on master and in previous releases

2019-05-11 Thread Jean-Baptiste Onofré
Hi,

It works fine on my machine. I'm using this JDK:

java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)

Regarding your Gradle scan, the JVM is crashing.

Do you have a chance to try another JDK ?

I will try with OpenJDK 1.8.0_181.

Regards
JB



Re: [VOTE] Release 2.13.0, release candidate #2

2019-06-03 Thread Jean-Baptiste Onofré
+1 (binding)

Quickly tested on beam-samples.

Regards
JB

On 31/05/2019 04:52, Ankur Goenka wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #2 for the version
> 2.13.0, as follows:
> 
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with
> fingerprint 6356C1A9F089B0FA3DE8753688934A6699985948 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.13.0-RC2" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.13.0 release to help with validation
> [8].
> 
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Ankur
> 
> [1]
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12345166
> [2] https://dist.apache.org/repos/dist/dev/beam/2.13.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1070/
> [5] https://github.com/apache/beam/tree/v2.13.0-RC2
> [6] https://github.com/apache/beam/pull/8645
> [7] https://github.com/apache/beam-site/pull/589
> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1031196952



Re: [PROPOSAL] Preparing for Beam 2.14.0 release

2019-06-06 Thread Jean-Baptiste Onofré
+1

Regards
JB

On 6 June 2019 at 19:02, Ankur Goenka wrote:
>+1
>
>On Thu, Jun 6, 2019, 9:13 AM Ahmet Altay  wrote:
>
>> +1, thank you for keeping the cadence.
>>
>> On Thu, Jun 6, 2019 at 9:04 AM Anton Kedin  wrote:
>>
>>> Hello Beam community!
>>>
>>> Beam 2.14 release branch cut date is June 19 according to the release
>>> calendar [1]. I would like to volunteer myself to do this release. The
>>> plan is to cut the branch on that date, and cherrypick fixes if needed.
>>>
>>> If you have release blocking issues for 2.14 please mark their "Fix
>>> Version" as 2.14.0 [2]. Please use 2.15.0 release in JIRA in case you
>>> would like to move any non-blocking issues to that version.
>>>
>>> And if we're doing a 2.7.1 release it should probably happen
>>> independently and in parallel if we want to maintain the release cadence.
>>>
>>> Thoughts, comments, objections?
>>>
>>> Thanks,
>>> Anton
>>>
>>> [1]
>>>
>https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>> [2]
>>>
>https://issues.apache.org/jira/browse/BEAM-7478?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%202.14.0
>>>
>>


Re: WebSocket/Https connector for Apache Beam (Java)?

2019-07-02 Thread Jean-Baptiste Onofré
Hi,

I have a WebSocket and REST IO for that (I proposed the IO a while ago).

As you are not the first one to ask for such IO, I will revive the IO.

Regards
JB

On 02/07/2019 13:34, I-Feng Lin wrote:
> Hello all,
> 
> I have an Apache Beam pipeline in Java where I would like to read data
> that comes from a WebSocket and write data to the server through Https.
> 
> I have been looking for some connectors but so far my search was
> unsuccessful. I know that it is possible to create custom connectors but
> I want to check if there is anything already exists.
> 
> Thanks in advance,
> 
> Ifeng
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Query about JdbcIO.readRows()

2019-08-02 Thread Jean-Baptiste Onofré
Agree. I will fix that.

Regards
JB

On 2 August 2019 at 17:15, Vishwas Bm wrote:
>Hi Kishor,
>
>+ dev (dev@beam.apache.org)
>
>This looks like a bug. The attribute statementPreparator is nullable.
>It should have been handled in the same way as in the expand method of
>the Read class.
>
>
>*Thanks & Regards,*
>
>*Vishwas *
>
>
>On Fri, Aug 2, 2019 at 2:48 PM Kishor Joshi  wrote:
>
>> Hi,
>>
>> I am using the just released 2.14 version for JdbcIO with the newly
>added
>> "readRows" functionality.
>>
>> I want to read table data with a query without parameters (select *
>from
>> table_name).
>> As per my understanding, this should not require
>"StatementPreparator".
>> However, if I use the newly added "readRows" function, I get an
>exception
>> that seems to force me to use the "StatementPreparator".
>> Stacktrace below.
>>
>> java.lang.IllegalArgumentException: statementPreparator can not be
>null
>> at
>>
>org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
>> at
>>
>org.apache.beam.sdk.io.jdbc.JdbcIO$Read.withStatementPreparator(JdbcIO.java:600)
>> at
>> org.apache.beam.sdk.io.jdbc.JdbcIO$ReadRows.expand(JdbcIO.java:499)
>> at
>> org.apache.beam.sdk.io.jdbc.JdbcIO$ReadRows.expand(JdbcIO.java:410)
>> at
>org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
>> at
>org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:471)
>> at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:44)
>> at
>>
>com.nokia.csf.dfle.transforms.DfleRdbmsSource.expand(DfleRdbmsSource.java:34)
>> at
>>
>com.nokia.csf.dfle.transforms.DfleRdbmsSource.expand(DfleRdbmsSource.java:10)
>> at
>org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
>> at
>org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
>> at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:56)
>> at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:182)
>> at
>> com.nokia.csf.dfle.dsl.DFLEBeamMain.dagWireUp(DFLEBeamMain.java:49)
>> at
>com.nokia.csf.dfle.dsl.DFLEBeamMain.main(DFLEBeamMain.java:120)
>>
>>
>>
>> The test added in JdbcIOTest.java for this functionality only tests
>for
>> queries with parameters.
>> Is this new function supported only in the above case and not for
>normal
>> "withQuery" (without parameters) ?
>>
>>
>> Thanks & regards,
>> Kishor
>>
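
The failure reported in this thread can be illustrated with a minimal, self-contained sketch. It does not use the Beam API: the classes `Read` and `Preparator` and the methods `expandBuggy` and `expandFixed` are hypothetical stand-ins, assumed only for illustration. The sketch shows how forwarding a nullable attribute to a setter that rejects null reproduces the exception above, and how a null guard avoids it. It is not necessarily how the bug was fixed in JdbcIO.

```java
public class NullablePreparatorSketch {

    /** Hypothetical stand-in for a statement-preparing callback. */
    interface Preparator {
        void prepare(StringBuilder statement);
    }

    /** Hypothetical stand-in for an immutable read builder. */
    static final class Read {
        final String query;
        final Preparator preparator; // nullable: a query may have no parameters

        Read(String query, Preparator preparator) {
            this.query = query;
            this.preparator = preparator;
        }

        Read withQuery(String newQuery) {
            return new Read(newQuery, preparator);
        }

        /** Rejects null, like the precondition seen in the stack trace. */
        Read withPreparator(Preparator newPreparator) {
            if (newPreparator == null) {
                throw new IllegalArgumentException("statementPreparator can not be null");
            }
            return new Read(query, newPreparator);
        }
    }

    /** Forwards the nullable attribute unconditionally, so a null value throws. */
    static Read expandBuggy(String query, Preparator preparator) {
        return new Read(null, null).withQuery(query).withPreparator(preparator);
    }

    /** Forwards the attribute only when it is actually set. */
    static Read expandFixed(String query, Preparator preparator) {
        Read read = new Read(null, null).withQuery(query);
        if (preparator != null) {
            read = read.withPreparator(preparator);
        }
        return read;
    }

    public static void main(String[] args) {
        Read read = expandFixed("select * from table_name", null);
        System.out.println("fixed: preparator set = " + (read.preparator != null));
        try {
            expandBuggy("select * from table_name", null);
        } catch (IllegalArgumentException e) {
            System.out.println("buggy: " + e.getMessage());
        }
    }
}
```

With the guard in place, a parameterless query such as `select * from table_name` passes through without a preparator, while queries with parameters still get theirs.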


Re: Cassandra flaky on Jenkins?

2019-09-03 Thread Jean-Baptiste Onofré
Hi,

Let me take a look. Do you always have this issue on Jenkins or randomly ?

Regards
JB

On 03/09/2019 14:19, Alex Van Boxel wrote:
> Hi, is it only me who keeps bumping into the flaky Cassandra test on
> Jenkins? I'd like to get my PR approved but I can't get past the Cassandra error...
> 
>   * org.apache.beam.sdk.io.cassandra.CassandraIOTest.classMethod
> 
> <https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1300/testReport/junit/org.apache.beam.sdk.io.cassandra/CassandraIOTest/classMethod/>
> 
> 
> 
>  _/
> _/ Alex Van Boxel

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Cassandra flaky on Jenkins?

2019-09-03 Thread Jean-Baptiste Onofré
Thanks David,

the build is running on my machine to see if I can reproduce locally.

It sounds like https://issues.apache.org/jira/browse/BEAM-7355 right ?

Regards
JB

On 03/09/2019 15:11, David Morávek wrote:
> I’m running into these failures too
> 
> D.
> 
> Sent from my iPhone
> 
>> On 3 Sep 2019, at 14:34, Jean-Baptiste Onofré  wrote:
>>
>> Hi,
>>
>> Let me take a look. Do you always have this issue on Jenkins or randomly ?
>>
>> Regards
>> JB
>>
>>> On 03/09/2019 14:19, Alex Van Boxel wrote:
>>> Hi, is it only me that are bumping on the flaky Cassandra on Jenkins? I
>>> like to get my PR approved but I can't get past the Cassandra error...
>>>
>>>  * org.apache.beam.sdk.io.cassandra.CassandraIOTest.classMethod
>>>
>>> <https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1300/testReport/junit/org.apache.beam.sdk.io.cassandra/CassandraIOTest/classMethod/>
>>>
>>>
>>>
>>>  _/
>>> _/ Alex Van Boxel
>>
>> -- 
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Cassandra flaky on Jenkins?

2019-09-03 Thread Jean-Baptiste Onofré
Hi Max,

yup, I'm starting the investigation.

I'll keep you posted.

Regards
JB

On 03/09/2019 15:34, Maximilian Michels wrote:
> The newest incarnation of this is here:
> https://jira.apache.org/jira/browse/BEAM-8025
> 
> Would be good if you could take a look JB.
> 
> Thanks,
> Max
> 
> On 03.09.19 15:32, David Morávek wrote:
>> yes, that looks similar. example:
>>
>> https://github.com/apache/beam/pull/9464
>>
>> D.
>>
>> On 3 Sep 2019, at 15:18, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>>> Thanks David,
>>>
>>> the build is running on my machine to see if I can reproduce locally.
>>>
>>> It sounds like https://issues.apache.org/jira/browse/BEAM-7355 right ?
>>>
>>> Regards
>>> JB
>>>
>>> On 03/09/2019 15:11, David Morávek wrote:
>>>> I’m running into these failures too
>>>>
>>>> D.
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On 3 Sep 2019, at 14:34, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Let me take a look. Do you always have this issue on Jenkins or
>>>>> randomly ?
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>>> On 03/09/2019 14:19, Alex Van Boxel wrote:
>>>>>> Hi, is it only me that are bumping on the flaky Cassandra on
>>>>>> Jenkins? I
>>>>>> like to get my PR approved but I can't get past the Cassandra
>>>>>> error...
>>>>>>
>>>>>> * org.apache.beam.sdk.io.cassandra.CassandraIOTest.classMethod
>>>>>>   
>>>>>> <https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1300/testReport/junit/org.apache.beam.sdk.io.cassandra/CassandraIOTest/classMethod/>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _/
>>>>>> _/ Alex Van Boxel
>>>>>
>>>>> -- 
>>>>> Jean-Baptiste Onofré
>>>>> jbono...@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>
>>> -- 
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Cassandra flaky on Jenkins?

2019-09-04 Thread Jean-Baptiste Onofré
Thanks David,

it makes sense, it gives me time to investigate and fix.

Regards
JB

On 04/09/2019 15:01, David Morávek wrote:
> Hi, temporarily disabling the test
> <https://github.com/apache/beam/pull/9470>, until BEAM-8025
> <https://jira.apache.org/jira/browse/BEAM-8025> is resolved (marking it
> as blocker for 2.16), so we can unblock ongoing pull requests.
> 
> Best,
> D.
> 
> On Tue, Sep 3, 2019 at 3:57 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> 
> Hi Max,
> 
> yup, I'm starting the investigation.
> 
> I keep you posted.
> 
> Regards
> JB
> 
> On 03/09/2019 15:34, Maximilian Michels wrote:
> > The newest incarnation of this is here:
> > https://jira.apache.org/jira/browse/BEAM-8025
> >
> > Would be good if you could take a look JB.
> >
> > Thanks,
> > Max
> >
> > On 03.09.19 15:32, David Morávek wrote:
> >> yes, that looks similar. example:
> >>
> >> https://github.com/apache/beam/pull/9464
> >>
> >> D.
> >>
> >> On 3 Sep 2019, at 15:18, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>
> >>> Thanks David,
> >>>
> >>> the build is running on my machine to see if I can reproduce locally.
> >>>
> >>> It sounds like https://issues.apache.org/jira/browse/BEAM-7355 right ?
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 03/09/2019 15:11, David Morávek wrote:
> >>>> I’m running into these failures too
> >>>>
> >>>> D.
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>> On 3 Sep 2019, at 14:34, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Let me take a look. Do you always have this issue on Jenkins or
> >>>>> randomly ?
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>>> On 03/09/2019 14:19, Alex Van Boxel wrote:
> >>>>>> Hi, is it only me that are bumping on the flaky Cassandra on
> >>>>>> Jenkins? I
> >>>>>> like to get my PR approved but I can't get past the Cassandra
> >>>>>> error...
> >>>>>>
> >>>>>> * org.apache.beam.sdk.io.cassandra.CassandraIOTest.classMethod
> >>>>>>
> >>>>>> <https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1300/testReport/junit/org.apache.beam.sdk.io.cassandra/CassandraIOTest/classMethod/>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> _/
> >>>>>> _/ Alex Van Boxel
> >>>>>
> >>>>> -- 
> >>>>> Jean-Baptiste Onofré
> >>>>> jbono...@apache.org
> >>>>> http://blog.nanthrax.net
> >>>>> Talend - http://www.talend.com
> >>>
> >>> -- 
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Jira - Contributor

2019-09-05 Thread Jean-Baptiste Onofré
Done:

you are in the contributor list and the Jira is assigned to you.

Regards
JB

On 05/09/2019 10:31, Matthew Darwin wrote:
> Hi,
> 
> Could I please be added as a contributor on Jira so I can assign
> https://issues.apache.org/jira/browse/BEAM-8153 to me?
> 
> Kind regards
> 
> Matthew

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: MQTT to Python SDK

2019-09-16 Thread Jean-Baptiste Onofré
Regarding the Java SDK, MqttIO is available.

Regards
JB

On 16/09/2019 21:07, Lucas Magalhães wrote:
> Thanks Altay.. Do you know where I could find more about cross language
> transforms? Documentation and examples as well.
> 
> thanks again
> 
> On Mon, Sep 16, 2019 at 4:00 PM Ahmet Altay <al...@google.com> wrote:
> 
> A framework for python sdk to use a native unbounded connector does
> not exist yet. You might be able to use the same connector from Java
> using cross language transforms.
> 
> /cc +Chamikara Jayalath <mailto:chamik...@google.com>  
> 
> On Mon, Sep 16, 2019 at 11:00 AM Lucas Magalhães
> <lucas.magalh...@paralelocs.com.br> wrote:
> 
> Hello dears!
> 
> I'm starting a new project here and the main source is MQTT.
> 
> I couldn't find any documentation about how to develop an
> unbounded connector.
> 
> Could anyone send me some instructions or guidelines?
> 
> Thanks a lot
> 
> -- 
> Lucas Magalhães,
> CTO
> 
> Paralelo CS - Consultoria e Serviços
> Tel: +55 (11) 3090-5557 
> Cel: +55 (11) 99420-4667 
> lucas.magalh...@paralelocs.com.br
> <mailto:lucas.magalh...@paralelocs.com.br>
> 
> <http://www.inteligenciaemnegocios.com.br>www.paralelocs.com.br
> <http://www.paralelocs.com.br>
> 
> 
> 
> -- 
> Lucas Magalhães,
> CTO
> 
> Paralelo CS - Consultoria e Serviços
> Tel: +55 (11) 3090-5557
> Cel: +55 (11) 99420-4667
> lucas.magalh...@paralelocs.com.br <mailto:lucas.magalh...@paralelocs.com.br>
> 
> <http://www.inteligenciaemnegocios.com.br>www.paralelocs.com.br
> <http://www.paralelocs.com.br>

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Cassandra flaky on Jenkins?

2019-09-19 Thread Jean-Baptiste Onofré
Hi Etienne,

let me take a look, I'm not sure.

Regards
JB

On 19/09/2019 16:42, Etienne Chauchot wrote:
> Hi all,
> I just created a PR [1] that tries to fix the flakiness of
> CassandraIOTest (underlying
> ticket https://jira.apache.org/jira/browse/BEAM-8025 that was assigned
> to me). We will see from the test repetitions whether it is still flaky.
> 
> JB, I don't know if my PR will also fix the ticket
> https://issues.apache.org/jira/browse/BEAM-7355 assigned to you, or if
> the tickets are the same/related. I hope it does.
> 
> 
> [1]
> https://github.com/apache/beam/pull/9614
> 
> Best,
> Etienne
> On Wednesday, 04 September 2019 at 16:27 +0200, Jean-Baptiste Onofré wrote:
>> Thanks David,
>>
>> it makes sense, it gives me time to investigate and fix.
>>
>> Regards
>> JB
>>
>> On 04/09/2019 15:01, David Morávek wrote:
>> Hi, temporarily disabling the test
>> <https://github.com/apache/beam/pull/9470>, until BEAM-8025
>> <https://jira.apache.org/jira/browse/BEAM-8025> is resolved (marking it
>> as blocker for 2.16), so we can unblock ongoing pull requests.
>>
>> Best,
>> D.
>>
>> On Tue, Sep 3, 2019 at 3:57 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Hi Max,
>>
>> yup, I'm starting the investigation.
>>
>> I keep you posted.
>>
>> Regards
>> JB
>>
>> On 03/09/2019 15:34, Maximilian Michels wrote:
>> > The newest incarnation of this is here:
>> > https://jira.apache.org/jira/browse/BEAM-8025
>> >
>> > Would be good if you could take a look JB.
>> >
>> > Thanks,
>> > Max
>> >
>> > On 03.09.19 15:32, David Morávek wrote:
>> >> yes, that looks similar. example:
>> >>
>> >> https://github.com/apache/beam/pull/9464
>> >>
>> >> D.
>> >>
>> >> On 3 Sep 2019, at 15:18, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> >>
>> >>> Thanks David,
>> >>>
>> >>> the build is running on my machine to see if I can reproduce locally.
>> >>>
>> >>> It sounds like https://issues.apache.org/jira/browse/BEAM-7355 right ?
>> >>>
>> >>> Regards
>> >>> JB
>> >>>
>> >>> On 03/09/2019 15:11, David Morávek wrote:
>> >>>> I’m running into these failures too
>> >>>>
>> >>>> D.
>> >>>>
>> >>>> Sent from my iPhone
>> >>>>
>> >>>>> On 3 Sep 2019, at 14:34, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> Let me take a look. Do you always have this issue on Jenkins or
>> >>>>> randomly ?
>> >>>>>
>> >>>>> Regards
>> >>>>> JB
>> >>>>>
>> >>>>>> On 03/09/2019 14:19, Alex Van Boxel wrote:
>> >>>>>> Hi, is it only me that are bumping on the flaky Cassandra on
>> >>>>>> Jenkins? I
>> >>>>>> like to get my PR approved but I can't get past the Cassandra
>> >>>>>> error...
>> >>>>>>
>> >>>>>> * org.apache.beam.sdk.io.cassandra.CassandraIOTest.classMethod
>> >>>>>>
>> >>>>>> <https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1300/testReport/junit/org.apache.beam.sdk.io.cassandra/CassandraIOTest/classMethod/>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> _/
>> >>>>>> _/ Alex Van Boxel
>> >>>>>
>> >>>>> -- 
>> >>>>> Jean-Baptiste Onofré
>> >>>>> jbono...@apache.org
>> >>>>> http://blog.nanthrax.net
>> >>>>> Talend - http://www.talend.com
>> >>>
>> >>> -- 
>> >>> Jean-Baptiste Onofré
>> >>> jbono...@apache.org
>> >>> http://blog.nanthrax.net
>> >>> Talend - http://www.talend.com
>>
>> -- 
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>>

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: beam-site issues with Jenkins and MergeBot

2017-08-09 Thread Jean-Baptiste Onofré

The Beam site is no longer on git-wip-us; it moved to gitbox, as far as I remember.

Regards
JB

On 08/09/2017 10:08 PM, Eugene Kirpichov wrote:

Hello,

I've been trying to merge a PR https://github.com/apache/beam-site/pull/278
and ran into the following issues:

1) When I do "git fetch --all" on beam-site, I get an error "fatal:
repository 'https://git-wip-us.apache.org/repos/asf/beam-site.git/' not
found". Has the git address of the apache repo changed? Is it no longer
valid because we have MergeBot?

2) Precommit tests are failing nearly 100% of the time.
If you look at build history on
https://builds.apache.org/job/beam_PreCommit_Website_Test/ - 9 out of 10
last builds failed.
Failures I saw:

7 times:
+ gpg --keyserver hkp://keys.gnupg.net --recv-keys
409B6B1796C275462A1703113804BB82D39DC0E3
gpg: requesting key D39DC0E3 from hkp server keys.gnupg.net
?: keys.gnupg.net: Cannot assign requested address

2 times:
- ./content/subdir/contribute/testing/index.html
   *  External link https://builds.apache.org/view/Beam/ failed: 404 No error

The second failure seems legit - https://builds.apache.org/view/Beam/ is
actually 404 right now (I'll send a separate email about this).

The gnupg failure is not legit - I'm able to run the same command myself
with no issues.

3) Suppose because of this, I'm not able to merge my PR with "@asfgit
merge" command - I suppose it requires a successful test run. Would be nice
if it posted a comment saying why it refuses to merge.



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-10 Thread Jean-Baptiste Onofré

Gentle reminder on this thread.

Thanks !
Regards
JB

On 08/09/2017 07:08 AM, Jean-Baptiste Onofré wrote:

Hi everyone,

Please review and vote on the release candidate #3 for the version 2.1.0, as 
follows:


[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2], 
which is signed with the key with fingerprint C8282E76 [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.1.0-RC3" [5],
* website pull request listing the release and publishing the API reference 
manual [6].
* Python artifacts are deployed along with the source release to the 
dist.apache.org [2].


The vote will be open for at least 72 hours. It is adopted by majority approval, 
with at least 3 PMC affirmative votes.


Thanks,
JB

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12340528 


[2] https://dist.apache.org/repos/dist/dev/beam/2.1.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1020/
[5] https://github.com/apache/beam/tree/v2.1.0-RC3
[6] https://github.com/apache/beam-site/pull/270


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


JB mostly offline for the next two weeks

2017-08-11 Thread Jean-Baptiste Onofré
Hi guys

FYI I'm on vacation starting tonight for the next two weeks. I will be online up to 
Sunday evening, and I will start a road trip next Monday.
I will have an Internet connection in the evenings but very limited during the day.

Cheers
JB

Re: [ANNOUNCEMENT] New PMC members, August 2017 edition!

2017-08-11 Thread Jean-Baptiste Onofré

Congrats !

Regards
JB

On 08/11/2017 07:40 PM, Davor Bonaci wrote:

Please join me and the rest of Beam PMC in welcoming the following
committers as our newest PMC members. They have significantly contributed
to the project in different ways, and we look forward to many more
contributions in the future.

* Ahmet Altay
Beyond significant work to drive the Python SDK to the master branch, Ahmet
has worked project-wide, driving releases, improving processes and testing,
and growing the community.

* Aviem Zur
Beyond significant work in the Spark runner, Aviem has worked to improve
how the project operates, leading discussions on inclusiveness and openness.

Congratulations to both! Welcome!

Davor



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [ANNOUNCEMENT] New committers, August 2017 edition!

2017-08-11 Thread Jean-Baptiste Onofré

Congrats and welcome !

Regards
JB

On 08/11/2017 07:40 PM, Davor Bonaci wrote:

Please join me and the rest of Beam PMC in welcoming the following
contributors as our newest committers. They have significantly contributed
to the project in different ways, and we look forward to many more
contributions in the future.

* Reuven Lax
Reuven has been with the project since the very beginning, contributing
mostly to the core SDK and the GCP IO connectors. He accumulated 52 commits
(19,824 ++ / 12,039 --). Most recently, Reuven re-wrote several IO
connectors that significantly expanded their functionality. Additionally,
Reuven authored important new design documents relating to update and
snapshot functionality.

* Jingsong Lee
Jingsong has been contributing to Apache Beam since the beginning of the
year, particularly to the Flink runner. He has accumulated 34 commits
(11,214 ++ / 6,314 --) of deep, fundamental changes that significantly
improved the quality of the runner. Additionally, Jingsong has contributed
to the project in other ways too -- reviewing contributions, and
participating in discussions on the mailing list, design documents, and
JIRA issue tracker.

* Mingmin Xu
Mingmin started the SQL DSL effort, and has driven it to the point of
merging to the master branch. In this effort, he extended the project to
the significant new user community.

* Mingming (James) Xu
James joined the SQL DSL effort, contributing some of the trickier parts,
such as the Join functionality. Additionally, he's consistently shown
himself to be an insightful code reviewer, significantly impacting the
project’s code quality and ensuring the success of the new major component.

* Manu Zhang
Manu initiated and developed a runner for the Apache Gearpump (incubating)
engine, and has driven it to the point of merging to the master branch. In
this effort, he accumulated 65 commits (7,812 ++ / 4,882 --) and extended
the project to the new user community.

Congratulations to all five! Welcome!

Davor



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-12 Thread Jean-Baptiste Onofré

+1 (binding)

I did my own tests and am casting my own vote ;)

Regards
JB

On 08/09/2017 07:08 AM, Jean-Baptiste Onofré wrote:

Hi everyone,

Please review and vote on the release candidate #3 for the version 2.1.0, as 
follows:


[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2], 
which is signed with the key with fingerprint C8282E76 [3],

* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.1.0-RC3" [5],
* website pull request listing the release and publishing the API reference 
manual [6].
* Python artifacts are deployed along with the source release to the 
dist.apache.org [2].


The vote will be open for at least 72 hours. It is adopted by majority approval, 
with at least 3 PMC affirmative votes.


Thanks,
JB

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12340528 


[2] https://dist.apache.org/repos/dist/dev/beam/2.1.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1020/
[5] https://github.com/apache/beam/tree/v2.1.0-RC3
[6] https://github.com/apache/beam-site/pull/270


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Policy for stale PRs

2017-08-15 Thread Jean-Baptiste Onofré
If we consider the author, it makes sense.

Regards
JB

On Aug 15, 2017, at 01:29, Ted Yu wrote:
>The proposal makes sense.
>
>If the author of PR doesn't respond for 90 days, the PR is likely out
>of
>sync with current repo.
>
>Cheers
>
>On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay 
>wrote:
>
>> Hi all,
>>
>> Do we have an existing policy for handling stale PRs? If not could we
>come
>> up with one. We are getting close to 100 open PRs. Some of the open
>PRs
>> have not been touched for a while, and if we exclude the pings the
>number
>> will be higher.
>>
>> For example, we could close PRs that have not been updated by the
>original
>> author for 90 days even after multiple attempts to reach them (e.g.
>[1],
>> [2] are such PRs.)
>>
>> What do you think?
>>
>> Thank you,
>> Ahmet
>>
>> [1] https://github.com/apache/beam/pull/1464
>> [2] https://github.com/apache/beam/pull/2949
>>


Re: Hello from a newbie to the data world living in the city by the bay!

2017-08-16 Thread Jean-Baptiste Onofré
Welcome !

Regards
JB

On Aug 16, 2017, at 08:54, "Ismaël Mejía" wrote:
>Hello and welcome Griselda, Umang, Justin
>
>Apart from the links provided by Ahmet you might read Beam-related
>material on the website (See Documentation > Programming Guide and
>Documentation > Additional Resources among others).
>
>But probably as important as improving your Beam related knowledge is
>to understand the principles of an open source project and more
>concretely the way the Apache projects work (in case this is your
>first Apache project), concepts like How projects are structured
>(PMCs, committers, votes, etc) and the most important ones Community
>over Code and Meritocracy.
>
>https://www.apache.org/foundation/how-it-works.html
>https://blogs.apache.org/foundation/entry/asf_15_community_over_code
>
>Welcome all and don't hesitate to ask questions, we are all here to
>make this project better so for sure we can help.
>Ismaël
>
>
>On Tue, Aug 15, 2017 at 11:04 PM, Justin T  wrote:
>> Hello Beam community,
>>
>> I am also a new member, and I feel a little better knowing that there
>> are others in the same boat :)
>>
>> My name is Justin and I work as a full stack engineer for Neustar, a
>> marketing analytics company in San Diego. Over the past few weeks I
>have
>> been getting more familiar with Beam via documentation, papers,
>videos, and
>> the old email archives and I am very excited to start making
>contributions.
>> Thank you Altay for the useful links!
>>
>> -Justin Tumale
>>
>> On Tue, Aug 15, 2017 at 11:19 AM, Ahmet Altay
>
>> wrote:
>>
>>> Welcome both of you!
>>>
>>> Some helpful starting points:
>>> - Contribution guide [1]
>>> - Unassigned starter issues in JIRA [2]
>>>
>>> Ahmet
>>>
>>> [1] https://beam.apache.org/contribute/contribution-guide/
>>> [2]
>>> https://issues.apache.org/jira/browse/BEAM-2632?jql=
>>>
>project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%
>>>
>20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20starter%20AND%
>>> 20assignee%20in%20(EMPTY)%20ORDER%20BY%20created%20DESC%
>>> 2C%20priority%20DESC
>>>
>>> On Tue, Aug 15, 2017 at 11:13 AM, Umang Sharma 
>>> wrote:
>>>
>>> > Hi Gris,
>>> > Nice to meet you.
>>> >
>>> > I'd like to take this opportunity to introduce myself to you and
>>> > everyone else in the dev team.
>>> >
>>> > I’m Umang Sharma. I'm an associate in Data Science and
>Applications at
>>> > Accenture Digital.
>>> >
>>> >
>>> > I write in python, Java and a number of other languages.
>>> > I'd love to contribute to Beam. It'd be great if someone could guide me
>>> > to get started with contributing :)
>>> >
>>> > Among the other things I like are polo golf, giving talks and
>>> > talking about my work.
>>> >
>>> > Thanks,
>>> > Umang
>>> >
>>> >
>>> > On Aug 15, 2017 22:40, "Griselda Cuevas" 
>>> wrote:
>>> >
>>> > Hi Beam community,
>>> >
>>> > I’m Griselda (Gris) Cuevas and I’m very excited to join the
>community,
>>> I’m
>>> > looking forward to learning awesome things from you and to getting
>the
>>> > chance to collaborate on great initiatives.
>>> >
>>> > I’m currently working at Google and I’m studying a masters in
>operations
>>> > research and data science at UC Berkeley. I’m interested in
>Natural
>>> > Language Processing, Information Retrieval and Online Communities.
>Some
>>> > other fun topics I love are juggling, camping and -just getting
>into it-
>>> >  listening to podcasts, so if you ever want to discuss and talk
>about any
>>> > of these topics, here I am!
>>> >
>>> > Another reason why I’m here is because I want to help this project
>grow
>>> and
>>> > thrive. This means that you’ll see me contributing to the project,
>>> reaching
>>> > out to ask questions as I get familiar with our community, and I
>also
>>> > helping evangelize Apache Beam by organizing meetups, hangouts,
>etc.
>>> >
>>> > I say bye for now, I’ll see you around,
>>> >
>>> > Cheers,
>>> >
>>> > G
>>> >
>>>


Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Jean-Baptiste Onofré
Hi

Thanks. I will send the result e-mail, promote the artifacts on Central and 
dist.apache.org. Then I will prepare the announcement (website and mailing 
lists).

Regards
JB

On Aug 16, 2017, at 17:20, Eugene Kirpichov wrote:
>Thanks Luke! With your vote, we have 3 PMC affirmative votes.
>JB, what are the next steps to finalize the release?
>
>On Wed, Aug 16, 2017 at 8:50 AM Lukasz Cwik 
>wrote:
>
>> Back from vacation.
>>
>> +1 binding
>>
>> BEAM-2671 has been marked for 2.2.0 release.
>>
>>
>>
>> On Wed, Aug 16, 2017 at 2:08 AM, Kobi Salant 
>> wrote:
>>
>> > Hi,
>> >
>> > Spark runner was tested with word count example and a more complex
>> session
>> > based application on a yarn cluster.
>> > Both applications ran successfully, so we can say that the Spark runner
>> > passed the sanity tests needed.
>> >
>> > Still there is an open ticket
>> > https://issues.apache.org/jira/browse/BEAM-2671 which Stas is
>working on
>> > and its implications should be taken into consideration regarding
>the
>> > release.
>> >
>> > Regards
>> > Kobi
>> >
>> > 2017-08-16 5:02 GMT+03:00 Eugene Kirpichov
>> >:
>> >
>> > > Hey all,
>> > >
>> > > Seems like we're missing one more affirmative vote from a PMC
>member
>> (so
>> > > far we have JB and Ahmet) to proceed with the release.
>> > >
>> > > On Mon, Aug 14, 2017 at 9:30 AM Ahmet Altay
>
>> > > wrote:
>> > >
>> > > > On Mon, Aug 14, 2017 at 6:32 AM, Ismaël Mejía
>
>> > wrote:
>> > > >
>> > > > > +1 (non-binding)
>> > > > >
>> > > > > - Validated signatures OK
>> > > > > - mvn clean verify -Prelease on both OpenJDK 1.7 and Oracle
>JDK 8
>> > with
>> > > > > the docker development images (WIP), both OK
>> > > > > - Run WordCount on local Flink and Spark runners OK
>> > > > >
>> > > > > Everything looks nice, only one minor thing (not blocking at
>all).
>> > The
>> > > > > proto generated files for python are not cleaned correctly
>and this
>> > > > > causes the validation to complain because the maven rat
>plugin does
>> > > > > not find the apache headers on the files  (this happens if
>you
>> > execute
>> > > > > mvn clean verify -Prelease immediately after the validation).
>> > > > >
>> > > >
>> > > > Ismaël, could you create a JIRA issue for this (to be fixed at
>a
>> future
>> > > > release)?
>> > > >
>> > > >
>> > > > >
>> > > > > On Sun, Aug 13, 2017 at 6:52 AM, Jean-Baptiste Onofré <
>> > j...@nanthrax.net
>> > > >
>> > > > > wrote:
>> > > > > > +1 (binding)
>> > > > > >
>> > > > > > I do my own tests and casting my own vote ;)
>> > > > > >
>> > > > > > Regards
>> > > > > > JB
>> > > > > >
>> > > > > > On 08/09/2017 07:08 AM, Jean-Baptiste Onofré wrote:
>> > > > > >>
>> > > > > >> Hi everyone,
>> > > > > >>
>> > > > > >> Please review and vote on the release candidate #3 for the
>> version
>> > > > > 2.1.0,
>> > > > > >> as follows:
>> > > > > >>
>> > > > > >> [ ] +1, Approve the release
>> > > > > >> [ ] -1, Do not approve the release (please provide
>specific
>> > > comments)
>> > > > > >>
>> > > > > >>
>> > > > > >> The complete staging area is available for your review,
>which
>> > > > includes:
>> > > > > >> * JIRA release notes [1],
>> > > > > >> * the official Apache source release to be deployed to
>> > > > dist.apache.org
>> > > > > >> [2], which is signed with the key with fingerprint
>C8282E76 [3],
>> > > > > >> * all artifacts to be deployed to the Maven Central
>Repository
>> > [4],
>> > > > > >> * source code tag "v2.1.0-RC3" [5],
>> > > > > >> * website pull request listing the release and publishing
>the
>> API
>> > > > > >> reference manual [6].
>> > > > > >> * Python artifacts are deployed along with the source
>release to
>> > the
>> > > > > >> dist.apache.org [2].
>> > > > > >>
>> > > > > >> The vote will be open for at least 72 hours. It is adopted
>by
>> > > majority
>> > > > > >> approval, with at least 3 PMC affirmative votes.
>> > > > > >>
>> > > > > >> Thanks,
>> > > > > >> JB
>> > > > > >>
>> > > > > >> [1]
>> > > > > >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>> > > > > projectId=12319527&version=12340528
>> > > > > >> [2] https://dist.apache.org/repos/dist/dev/beam/2.1.0/
>> > > > > >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> > > > > >> [4] https://repository.apache.org/content/repositories/
>> > > > > orgapachebeam-1020/
>> > > > > >> [5] https://github.com/apache/beam/tree/v2.1.0-RC3
>> > > > > >> [6] https://github.com/apache/beam-site/pull/270
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Jean-Baptiste Onofré
>> > > > > > jbono...@apache.org
>> > > > > > http://blog.nanthrax.net
>> > > > > > Talend - http://www.talend.com
>> > > > >
>> > > >
>> > >
>> >
>>
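
The adoption rule quoted in the vote above (majority approval, with at least 3 PMC affirmative votes) can be sketched as a small check. This is only an illustration: the class and method names are hypothetical, and reading "majority approval" as "more binding +1 than binding -1 votes" is an assumption based on common Apache practice, not something stated in the thread.

```java
/** Hypothetical helper illustrating the release vote rule; not part of Beam. */
final class ReleaseVoteTally {
  private ReleaseVoteTally() {}

  /**
   * Returns true when a vote would be adopted under the assumed rule:
   * at least 3 binding +1 votes, and more binding +1 than binding -1 votes.
   */
  static boolean isAdopted(int bindingPlusOnes, int bindingMinusOnes) {
    return bindingPlusOnes >= 3 && bindingPlusOnes > bindingMinusOnes;
  }
}
```

With the two binding votes from JB and Ahmet the check is false; adding Luke's binding +1 makes it true, which matches the point at which the thread moves on to finalizing the release.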


Re: Proposal: adding a built-in I/O source for VCF files

2017-08-16 Thread Jean-Baptiste Onofré
I will, thanks!

Regards
JB

On Aug 16, 2017, 18:53, at 18:53, Asha Rostamianfar 
 wrote:
>Hi everyone,
>
>I have a proposal to add a new built-in I/O source for VCF files:
>https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
>
>I'm planning to take on the implementation work myself, but wanted to
>get
>preliminary feedback about the proposed design as it requires making
>changes to the existing TextIO. I will file a JIRA FR as well.
>
>Please take a look at the doc and feel free to comment.
>
>Thanks,
>Asha


Re: Policy for stale PRs

2017-08-16 Thread Jean-Baptiste Onofré
IMHO the jira should stay open as it's different from the PR.

Regards
JB

On Aug 16, 2017, 20:16, at 20:16, Ted Yu  wrote:
>What should be done to the JIRA associated with the PR?
> Original message From: Ahmet Altay
> Date: 8/16/17  12:05 PM  (GMT-08:00) To:
>dev@beam.apache.org Subject: Re: Policy for stale PRs
>Sounds like we have consensus. Since this is a new policy, I would
>suggest
>picking the most flexible option for now (90 days) and we can tighten
>it in
>the future. To answer Kenn's question, I do not know, how other
>projects
>handle this. I did a basic search but could not find a good answer.
>
>What mechanism can we use to close PRs, assuming that author will be
>out of
>communication. We can push a commit with a "This closes #xyz #abc"
>message.
>Is there another way to do this?
>
>Ahmet
>
>On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur  wrote:
>
>> Makes sense to close after a long time of inactivity and no response,
>and
>> as Kenn mentioned they can always re-open.
>>
>> On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré
>
>> wrote:
>>
>> > If we consider the author, it makes sense.
>> >
>> > Regards
>> > JB
>> >
>> > On Aug 15, 2017, 01:29, at 01:29, Ted Yu 
>wrote:
>> > >The proposal makes sense.
>> > >
>> > >If the author of PR doesn't respond for 90 days, the PR is likely
>out
>> > >of
>> > >sync with current repo.
>> > >
>> > >Cheers
>> > >
>> > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay
>
>> > >wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> Do we have an existing policy for handling stale PRs? If not
>could we
>> > >come
>> > >> up with one. We are getting close to 100 open PRs. Some of the
>open
>> > >PRs
>> > >> have not been touched for a while, and if we exclude the pings
>the
>> > >number
>> > >> will be higher.
>> > >>
>> > >> For example, we could close PRs that have not been updated by
>the
>> > >original
>> > >> author for 90 days even after multiple attempts to reach them
>(e.g.
>> > >[1],
>> > >> [2] are such PRs.)
>> > >>
>> > >> What do you think?
>> > >>
>> > >> Thank you,
>> > >> Ahmet
>> > >>
>> > >> [1] https://github.com/apache/beam/pull/1464
>> > >> [2] https://github.com/apache/beam/pull/2949
>> > >>
>> >
>>
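
The 90-day rule proposed in this thread can be sketched as a date comparison. This is a minimal sketch under stated assumptions: the class name, the constant, and the method are hypothetical, and the "multiple attempts to reach the author" condition from the proposal is deliberately not modeled here.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Hypothetical illustration of the proposed stale-PR rule; not an existing tool. */
final class StalePrPolicy {
  /** Inactivity threshold taken from the proposal in the thread. */
  static final long STALE_AFTER_DAYS = 90;

  private StalePrPolicy() {}

  /** Returns true when the original author has not updated the PR for at least 90 days. */
  static boolean isStale(LocalDate lastAuthorUpdate, LocalDate today) {
    return ChronoUnit.DAYS.between(lastAuthorUpdate, today) >= STALE_AFTER_DAYS;
  }
}
```

Only updates by the original author count in this sketch, so reviewer pings would not reset the clock, in line with the remark about excluding pings.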


Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-18 Thread Jean-Baptiste Onofré
Hi

I'm on vacation, so I'm looking for a decent Internet connection to finalize the 
release.

I'll keep you posted.

Regards
JB

On Aug 18, 2017, 17:48, at 17:48, Eugene Kirpichov 
 wrote:
>Hi JB,
>
>Any updates on finalizing the release?
>
>Thanks.
>
>On Thu, Aug 17, 2017 at 5:42 AM Aljoscha Krettek 
>wrote:
>
>> (Belated) +1
>>
>>  * verified signatures
>>  * verified that Quickstart works with Flink Runner
>>
>> > On 16. Aug 2017, at 20:41, Robert Bradshaw
>
>> wrote:
>> >
>> > +1 binding
>> >
>> > (I've been on vacation as well.)
>> >
>> > On Wed, Aug 16, 2017 at 8:50 AM, Lukasz Cwik
>
>> wrote:
>> >> Back from vacation.
>> >>
>> >> +1 binding
>> >>
>> >> BEAM-2671 has been marked for 2.2.0 release.
>> >>
>> >>
>> >>
>> >> On Wed, Aug 16, 2017 at 2:08 AM, Kobi Salant
>
>> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Spark runner was tested with word count example and a more
>complex
>> session
>> >>> based application on a yarn cluster.
>> >>> Both application run successfully so we can say that spark runner
>> passed
>> >>> the sanity tests needed.
>> >>>
>> >>> Still there is an open ticket
>> >>> https://issues.apache.org/jira/browse/BEAM-2671 which Stas is
>working
>> on
>> >>> and its implications should be taken into consideration regarding
>the
>> >>> release.
>> >>>
>> >>> Regards
>> >>> Kobi
>> >>>
>> >>> 2017-08-16 5:02 GMT+03:00 Eugene Kirpichov
>> :
>> >>>
>> >>>> Hey all,
>> >>>>
>> >>>> Seems like we're missing one more affirmative vote from a PMC
>member
>> (so
>> >>>> far we have JB and Ahmet) to proceed with the release.
>> >>>>
>> >>>> On Mon, Aug 14, 2017 at 9:30 AM Ahmet Altay
>> >
>> >>>> wrote:
>> >>>>
>> >>>>> On Mon, Aug 14, 2017 at 6:32 AM, Ismaël Mejía
>
>> >>> wrote:
>> >>>>>
>> >>>>>> +1 (non-binding)
>> >>>>>>
>> >>>>>> - Validated signatures OK
>> >>>>>> - mvn clean verify -Prelease on both OpenJDK 1.7 and Oracle
>JDK 8
>> >>> with
>> >>>>>> the docker development images (WIP), both OK
>> >>>>>> - Run WordCount on local Flink and Spark runners OK
>> >>>>>>
>> >>>>>> Everything looks nice, only one minor thing (not blocking at
>all).
>> >>> The
>> >>>>>> proto generated files for python are not cleaned correctly and
>this
>> >>>>>> causes the validation to complain because the maven rat plugin
>does
>> >>>>>> not find the apache headers on the files  (this happens if you
>> >>> execute
>> >>>>>> mvn clean verify -Prelease immediately after the validation).
>> >>>>>>
>> >>>>>
>> >>>>> Ismaël, could you create a JIRA issue for this (to be fixed at
>a
>> future
>> >>>>> release)?
>> >>>>>
>> >>>>>
>> >>>>>>
>> >>>>>> On Sun, Aug 13, 2017 at 6:52 AM, Jean-Baptiste Onofré <
>> >>> j...@nanthrax.net
>> >>>>>
>> >>>>>> wrote:
>> >>>>>>> +1 (binding)
>> >>>>>>>
>> >>>>>>> I do my own tests and casting my own vote ;)
>> >>>>>>>
>> >>>>>>> Regards
>> >>>>>>> JB
>> >>>>>>>
>> >>>>>>> On 08/09/2017 07:08 AM, Jean-Baptiste Onofré wrote:
>> >>>>>>>>
>> >>>>>>>> Hi everyone,
>> >>>>>>>>
>> >>>>>>>> Please review and vote on the release candidate #3 for the
>version
>> >>>>>> 2.1.0,
>> >>>>>>>> as follows:
>> >>>>>>>>
>> >>>>>>>> [ ] +1, Approve the release
>> >>>>>>>> [ ] -1, Do not approve the release (please provide specific
>> >>>> comments)
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> The complete staging area is available for your review,
>which
>> >>>>> includes:
>> >>>>>>>> * JIRA release notes [1],
>> >>>>>>>> * the official Apache source release to be deployed to
>> >>>>> dist.apache.org
>> >>>>>>>> [2], which is signed with the key with fingerprint C8282E76
>[3],
>> >>>>>>>> * all artifacts to be deployed to the Maven Central
>Repository
>> >>> [4],
>> >>>>>>>> * source code tag "v2.1.0-RC3" [5],
>> >>>>>>>> * website pull request listing the release and publishing
>the API
>> >>>>>>>> reference manual [6].
>> >>>>>>>> * Python artifacts are deployed along with the source
>release to
>> >>> the
>> >>>>>>>> dist.apache.org [2].
>> >>>>>>>>
>> >>>>>>>> The vote will be open for at least 72 hours. It is adopted
>by
>> >>>> majority
>> >>>>>>>> approval, with at least 3 PMC affirmative votes.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> JB
>> >>>>>>>>
>> >>>>>>>> [1]
>> >>>>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>> >>>>>> projectId=12319527&version=12340528
>> >>>>>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.1.0/
>> >>>>>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> >>>>>>>> [4] https://repository.apache.org/content/repositories/
>> >>>>>> orgapachebeam-1020/
>> >>>>>>>> [5] https://github.com/apache/beam/tree/v2.1.0-RC3
>> >>>>>>>> [6] https://github.com/apache/beam-site/pull/270
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Jean-Baptiste Onofré
>> >>>>>>> jbono...@apache.org
>> >>>>>>> http://blog.nanthrax.net
>> >>>>>>> Talend - http://www.talend.com
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>>
>>


Re: Style: how much testing for transform builder classes?

2017-08-18 Thread Jean-Baptiste Onofré
>>> > > > > If validate method is where all the validation happens, then
>we
>>> > should
>>> > > > able
>>> > > > > to eliminate some redundant checks and tests during
>construction
>>> time
>>> > > > like
>>> > > > > in *withOption* methods here
>>> > > > > <https://github.com/apache/beam/blob/master/sdks/java/io/
>>> > > > google-cloud-platform/src/main/java/org/apache/beam/sdk/
>>> > > > io/gcp/bigtable/BigtableIO.java#L199>
>>> > > > >  and here
>>> > > > > <https://github.com/apache/beam/blob/master/sdks/java/io/
>>> > > > google-cloud-platform/src/main/java/org/apache/beam/sdk/
>>> > > > io/gcp/datastore/DatastoreV1.java#L387>
>>> > > > > as
>>> > > > > these are also checked in the validate method.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > -Vikas
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On 14 March 2017 at 15:40, Eugene Kirpichov
>>> > > >> > > > >
>>> > > > > wrote:
>>> > > > >
>>> > > > >> Thanks all. Looks like people are on board with the general
>>> > direction
>>> > > > >> though it remains to refine it to concrete guidelines to go
>into
>>> > style
>>> > > > >> guide.
>>> > > > >>
>>> > > > >> Ismaël, can you give more details about the situation you
>>> described
>>> > in
>>> > > > the
>>> > > > >> first paragraph? Is it perhaps that really a
>RunnableOnService
>>> test
>>> > > was
>>> > > > >> missing (and perhaps still is), rather than a builder test?
>>> > > > >>
>>> > > > >> Vikas, regarding trivial tests and user waiting for a
>>> work-around:
>>> > in
>>> > > > the
>>> > > > >> situation I described, they don't really need a workaround
>- they
>>> > > > specified
>>> > > > >> an invalid value and have been minorly inconvenienced
>because the
>>> > > error
>>> > > > >> they got about it was not very readable, so fixing their
>value
>>> took
>>> > > > them a
>>> > > > >> little longer than it could have, but they fixed it and
>their
>>> work
>>> > is
>>> > > > not
>>> > > > >> blocked. I think Robert's arguments about the cost of
>trivial
>>> tests
>>> > > > apply.
>>> > > > >>
>>> > > > >> I agree that the author should be at liberty to choose
>which
>>> > > validation
>>> > > > to
>>> > > > >> unit-test and which to skip as trivial, so documentation on
>this
>>> > topic
>>> > > > >> should be in the form of guidelines, high-quality example
>code
>>> (i.e.
>>> > > > clean
>>> > > > >> up the unit tests of IOs bundled with Beam SDK), and
>informal
>>> > > knowledge
>>> > > > in
>>> > > > >> the heads of readers of this thread, rather than hard
>rules.
>>> > > > >>
>>> > > > >> On Tue, Mar 14, 2017 at 8:07 AM Ismaël Mejía
>
>>> > > wrote:
>>> > > > >>
>>> > > > >> > +0.5
>>> > > > >> >
>>> > > > >> > I used to think that some of those tests were not worth,
>for
>>> > example
>>> > > > >> > testBuildRead and
>>> > > > >> > testBuildReadAlt. However the reality is that these tests
>>> allowed
>>> > me
>>> > > > to
>>> > > > >> > find bugs both during the development of HBaseIO and just
>>> > y

Re: Beam spark 2.x runner status

2017-08-21 Thread Jean-Baptiste Onofré
Hi

I wrote a new runner supporting Spark 2.1.x, which required code changes.

I'm still on vacation this week. I will send an update when I'm back.

Regards
JB

On Aug 21, 2017, 09:01, at 09:01, Pei HE  wrote:
>Any updates for upgrading to spark 2.x?
>
>I tried to replace the dependency and found a compile error from
>implementing a scala trait:
>org.apache.beam.runners.spark.io.SourceRDD.SourcePartition is not
>abstract
>and does not override abstract method
>org$apache$spark$Partition$$super$equals(java.lang.Object) in
>org.apache.spark.Partition
>
>(The spark side change was introduced in
>https://github.com/apache/spark/pull/12157.)
>
>Does anyone have ideas about this compile error?
>
>
>On Wed, May 3, 2017 at 1:32 PM, Jean-Baptiste Onofré 
>wrote:
>
>> Hi Ted,
>>
>> My branch used Spark 2.1.0 and I just updated to 2.1.1.
>>
>> As discussed with Aviem, I should be able to create the pull request
>later
>> today.
>>
>> Regards
>> JB
>>
>>
>> On 05/03/2017 02:50 AM, Ted Yu wrote:
>>
>>> Spark 2.1.1 has been released.
>>>
>>> Consider using the new release in this work.
>>>
>>> Thanks
>>>
>>> On Wed, Mar 29, 2017 at 5:43 AM, Jean-Baptiste Onofré
>
>>> wrote:
>>>
>>> Cool for the PR merge, I will rebase my branch on it.
>>>>
>>>> Thanks !
>>>> Regards
>>>> JB
>>>>
>>>>
>>>> On 03/29/2017 01:58 PM, Amit Sela wrote:
>>>>
>>>> @Ted definitely makes sense.
>>>>> @JB I'm merging https://github.com/apache/beam/pull/2354 soon so
>any
>>>>> deprecated Spark API issues should be resolved.
>>>>>
>>>>> On Wed, Mar 29, 2017 at 2:46 PM Ted Yu 
>wrote:
>>>>>
>>>>> This is what I did over HBASE-16179:
>>>>>
>>>>>>
>>>>>> -f.call((asJavaIterator(it), conn)).iterator()
>>>>>> +// the return type is different in spark 1.x & 2.x, we
>handle
>>>>>> both
>>>>>> cases
>>>>>> +f.call(asJavaIterator(it), conn) match {
>>>>>> +  // spark 1.x
>>>>>> +  case iterable: Iterable[R] => iterable.iterator()
>>>>>> +  // spark 2.x
>>>>>> +  case iterator: Iterator[R] => iterator
>>>>>> +}
>>>>>>)
>>>>>>
>>>>>> FYI
>>>>>>
>>>>>> On Wed, Mar 29, 2017 at 1:47 AM, Amit Sela 
>>>>>> wrote:
>>>>>>
>>>>>> Just tried to replace dependencies and see what happens:
>>>>>>
>>>>>>>
>>>>>>> Most required changes are about the runner using deprecated
>Spark
>>>>>>> APIs,
>>>>>>>
>>>>>>> and
>>>>>>
>>>>>> after fixing them the only real issue is with the Java API for
>>>>>>> Pair/FlatMapFunction that changed return value to Iterator (in
>1.6 its
>>>>>>> Iterable).
>>>>>>>
>>>>>>> So I'm not sure that a profile that simply sets dependency on
>>>>>>> 1.6.3/2.1.0
>>>>>>> is feasible.
>>>>>>>
>>>>>>> On Thu, Mar 23, 2017 at 10:22 AM Kobi Salant
>
>>>>>>> wrote:
>>>>>>>
>>>>>>> So, if everything is in place in Spark 2.X and we use provided
>>>>>>>
>>>>>>>>
>>>>>>>> dependencies
>>>>>>>
>>>>>>> for Spark in Beam.
>>>>>>>> Theoretically, you can run the same code in 2.X without any
>need for
>>>>>>>> a
>>>>>>>> branch?
>>>>>>>>
>>>>>>>> 2017-03-23 9:47 GMT+02:00 Amit Sela :
>>>>>>>>
>>>>>>>> If StreamingContext is valid and we don't have to use
>SparkSession,
>>>>>>>>
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>
>>>>>>>
>>>>>> Accumulators are valid as well and we don't need AccumulatorsV2,
>I
>>>>>>>
>>>>>>>>

[RESULT][VOTE] Release 2.1.0, release candidate #3

2017-08-21 Thread Jean-Baptiste Onofré
Hi

This vote passed with only +1 votes.

I'm promoting the artifacts to Central and updating Jira.

As I'm on vacation, can a committer deal with the tag and the website (or merge the website pull request)?

Sorry for this very short e-mail. Thanks all for your votes.

Regards
JB


On Aug 18, 2017, 18:43, at 18:43, "Jean-Baptiste Onofré"  
wrote:
>Hi
>
>I'm in vacation so I'm looking for a decent Internet connection to
>finalize the release.
>
>I keep you posted.
>
>Regards
>JB
>
>On Aug 18, 2017, 17:48, at 17:48, Eugene Kirpichov
> wrote:
>>Hi JB,
>>
>>Any updates on finalizing the release?
>>
>>Thanks.
>>
>>On Thu, Aug 17, 2017 at 5:42 AM Aljoscha Krettek 
>>wrote:
>>
>>> (Belated) +1
>>>
>>>  * verified signatures
>>>  * verified that Quickstart works with Flink Runner
>>>
>>> > On 16. Aug 2017, at 20:41, Robert Bradshaw
>>
>>> wrote:
>>> >
>>> > +1 binding
>>> >
>>> > (I've been on vacation as well.)
>>> >
>>> > On Wed, Aug 16, 2017 at 8:50 AM, Lukasz Cwik
>>
>>> wrote:
>>> >> Back from vacation.
>>> >>
>>> >> +1 binding
>>> >>
>>> >> BEAM-2671 has been marked for 2.2.0 release.
>>> >>
>>> >>
>>> >>
>>> >> On Wed, Aug 16, 2017 at 2:08 AM, Kobi Salant
>>
>>> wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> Spark runner was tested with word count example and a more
>>complex
>>> session
>>> >>> based application on a yarn cluster.
>>> >>> Both application run successfully so we can say that spark
>runner
>>> passed
>>> >>> the sanity tests needed.
>>> >>>
>>> >>> Still there is an open ticket
>>> >>> https://issues.apache.org/jira/browse/BEAM-2671 which Stas is
>>working
>>> on
>>> >>> and its implications should be taken into consideration
>regarding
>>the
>>> >>> release.
>>> >>>
>>> >>> Regards
>>> >>> Kobi
>>> >>>
>>> >>> 2017-08-16 5:02 GMT+03:00 Eugene Kirpichov
>>> :
>>> >>>
>>> >>>> Hey all,
>>> >>>>
>>> >>>> Seems like we're missing one more affirmative vote from a PMC
>>member
>>> (so
>>> >>>> far we have JB and Ahmet) to proceed with the release.
>>> >>>>
>>> >>>> On Mon, Aug 14, 2017 at 9:30 AM Ahmet Altay
>>>> >
>>> >>>> wrote:
>>> >>>>
>>> >>>>> On Mon, Aug 14, 2017 at 6:32 AM, Ismaël Mejía
>>
>>> >>> wrote:
>>> >>>>>
>>> >>>>>> +1 (non-binding)
>>> >>>>>>
>>> >>>>>> - Validated signatures OK
>>> >>>>>> - mvn clean verify -Prelease on both OpenJDK 1.7 and Oracle
>>JDK 8
>>> >>> with
>>> >>>>>> the docker development images (WIP), both OK
>>> >>>>>> - Run WordCount on local Flink and Spark runners OK
>>> >>>>>>
>>> >>>>>> Everything looks nice, only one minor thing (not blocking at
>>all).
>>> >>> The
>>> >>>>>> proto generated files for python are not cleaned correctly
>and
>>this
>>> >>>>>> causes the validation to complain because the maven rat
>plugin
>>does
>>> >>>>>> not find the apache headers on the files  (this happens if
>you
>>> >>> execute
>>> >>>>>> mvn clean verify -Prelease immediately after the validation).
>>> >>>>>>
>>> >>>>>
>>> >>>>> Ismaël, could you create a JIRA issue for this (to be fixed at
>>a
>>> future
>>> >>>>> release)?
>>> >>>>>
>>> >>>>>
>>> >>>>>>
>>> >>>>>> On Sun, Aug 13, 2017 at 6:52 AM, Jean-Baptiste Onofré <
>>> >>> j...@nanthrax.net
>>> >>>>>
>>> >>>>>> wrote:
>>> >>>>>>> +1 (binding)
>>> >

Re: "Unable to find registrar for hdfs" on Flink cluster

2017-08-29 Thread Jean-Baptiste Onofré

Hi,

Did you add the org.apache.beam:beam-sdks-java-io-hadoop-file-system dependency 
to your pom.xml?


Regards
JB

On 08/29/2017 08:59 AM, P. Ramanjaneya Reddy wrote:

Hi All,

I built a jar file from the Beam quickstart. While running the jar on a Flink
cluster, I got the error below.

Has anybody seen this error?
Could you please help me resolve it?

root1@master:~/NAI/Tools/flink-1.3.0$ *bin/flink run -c
org.apache.beam.examples.WordCount
/home/root1/NAI/Tools/word-count-beam/target/word-count-beam-bundled-0.1.jar
--runner=FlinkRunner
--filesToStage=/home/root1/NAI/Tools/word-count-beam/target/word-count-beam-bundled-0.1.jar
--inputFile=hdfs://master:9000/test/wordcount_input.txt
  --output=hdfs://master:9000/test/wordcount_output919*


This is the output I get:

Caused by: java.lang.IllegalStateException: Unable to find registrar for
hdfs
at
org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:517)
at
org.apache.beam.sdk.io.FileBasedSink.convertToFileResourceIfPossible(FileBasedSink.java:204)
at org.apache.beam.sdk.io.TextIO$Write.to(TextIO.java:296)
at org.apache.beam.examples.WordCount.main(WordCount.java:182)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
... 13 more


Thanks & Regards,
Ramanji.



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
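
The "Unable to find registrar for hdfs" error above is the kind of failure a scheme-keyed lookup produces when no handler for the scheme is on the classpath. The following is a simplified stand-in, not Beam's implementation: the class, its methods, and the handler names are hypothetical, and falling back to "file" for scheme-less paths is an assumption made for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scheme-to-handler registry illustrating the lookup failure; not Beam code. */
final class SchemeRegistrySketch {
  private final Map<String, String> registrars = new HashMap<>();

  /** Registers a handler name for a URI scheme such as "file" or "hdfs". */
  void register(String scheme, String handlerName) {
    registrars.put(scheme, handlerName);
  }

  /** Finds the handler for the scheme of the given path, or fails if none is registered. */
  String lookup(String spec) {
    String scheme = URI.create(spec).getScheme();
    if (scheme == null) {
      // Assumption for this sketch: paths without a scheme are treated as local files.
      scheme = "file";
    }
    String handler = registrars.get(scheme);
    if (handler == null) {
      throw new IllegalStateException("Unable to find registrar for " + scheme);
    }
    return handler;
  }
}
```

In this picture, adding the Hadoop file system dependency corresponds to the `register("hdfs", ...)` call: until something provides a handler for the scheme, any `hdfs://` path fails at lookup time, before any data is read or written.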


Re: kafka docs

2017-08-29 Thread Jean-Baptiste Onofré

Hi Joey,

You should take a look at the Javadoc: it's where you will find the most 
accurate documentation.


https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/kafka/KafkaIO.html

Regards
JB

On 08/29/2017 10:28 AM, Joey Baruch wrote:

Hey all,

As a new user trying to use a KafkaIO source/sink, I couldn't easily find any
documentation.
The documentation page <https://beam.apache.org/documentation/io/built-in/>
(which you get to from header -> documentation -> pipeline I/O -> built-in
I/O transforms) leads to the Kafka class
<https://github.com/apache/beam/tree/master/sdks/java/io/kafka>, but there
are no docs there.

I will add a simple README that points to the class's Javadocs.

regards
Joey Baruch



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: kafka docs

2017-08-29 Thread Jean-Baptiste Onofré

Agree to add the link to javadoc on the I/O list:

https://beam.apache.org/documentation/io/built-in/

Regards
JB

On 08/29/2017 10:28 AM, Joey Baruch wrote:

Hey all,

As a new user trying to use a KafkaIO source/sink, I couldn't easily find any
documentation.
The documentation page <https://beam.apache.org/documentation/io/built-in/>
(which you get to from header -> documentation -> pipeline I/O -> built-in
I/O transforms) leads to the Kafka class
<https://github.com/apache/beam/tree/master/sdks/java/io/kafka>, but there
are no docs there.

I will add a simple README that points to the class's Javadocs.

regards
Joey Baruch



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: new guy

2017-08-29 Thread Jean-Baptiste Onofré

Welcome !

What's your apache id ?

Regards
JB

On 08/29/2017 02:57 PM, Joey Baruch wrote:

Hey everyone,

Apache Beam looks like a pretty exciting new project, and I'd love to
contribute to it.
I'm a relatively fresh developer, but I'm looking to learn by doing.

I would appreciate being added as a contributor on Jira.

Thanks!
Joey



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam 2.2.0 release

2017-08-30 Thread Jean-Baptiste Onofré

Hi Kenn,

thanks !

I think we can target the 2.2.0 release for October. Thoughts?

I also volunteer to manage this next release.

Regards
JB

On 08/31/2017 05:57 AM, Kenneth Knowles wrote:

I went ahead and set up https://s.apache.org/beam-2.2.0-burndown

On Wed, Aug 30, 2017 at 2:27 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:


RedisIO in 2.2.0 is very unlikely. There's still a lot of review remaining
last time I checked on the PR.

On Wed, Aug 30, 2017 at 2:24 PM Vilhelm von Ehrenheim <
vonehrenh...@gmail.com> wrote:


Any chance to get the RedisIO in this release?
[BEAM-1017] Add RedisIO #1687

It's not my PR, but I'll be happy to assist if there is anything I can do to
help.

On 30 Aug 2017 22:46, "Daniel Ribeiro" 
wrote:


It would be great to get a bump on pubsub
<https://github.com/apache/beam/blob/master/pom.xml#L145> dependency. It is
currently very outdated (v1-rev10-1.22.0, which was released over a year ago
<http://repo1.maven.org/maven2/com/google/apis/google-api-services-pubsub/v1-rev10-1.22.0/>
).


On Wed, Aug 30, 2017 at 1:27 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:


Thanks Ismael. I've marked these two issues for fix in 2.2.0. Definitely
agree that at least the first one must be fixed.

Here's the current burndown list
https://issues.apache.org/jira/projects/BEAM/versions/12341044 - we should
clean it up.

On Wed, Aug 30, 2017 at 1:20 PM Ismaël Mejía 

wrote:



The current master has accumulated a good amount of nice features
since 2.1.0 so a new release is welcomed. I have two JIRAs/PR that I
think are important to check/solve before the cut:

BEAM-2516 (this is a regression on the performance of Direct runner on
Java). We had never really defined if a performance regression is
critical to be a blocker. I executed WordCount with the kinglear.txt
(170KB) file in version 2.1.0 vs the current 2.2.0-SNAPSHOT and I
found that the execution time passed from 5s to 126s. So maybe we need
to review this one before the release. I can understand if others
consider this a minor issue because the Direct runner is not supposed
to be used for production, but this performance regression can cause a
bad impression for a casual user starting with Beam.

BEAM-2790 (fix reading from Amazon S3 via HadoopFileSystem). I think
this one is a nice to have. I am not sure that I can tackle it for the
wednesday cut. I’m OOO until the beginning of next week, but maybe
someone else can take a look. In the worst case this is not a release
blocker but definitely a really nice fix to include.


On Wed, Aug 30, 2017 at 8:49 PM, Eugene Kirpichov
 wrote:

I'd like to get the following PRs into 2.2.0:

#3765 <https://github.com/apache/beam/pull/3765> [BEAM-2753
<https://issues.apache.org/jira/browse/BEAM-2753>] Fixes translation of
WriteFiles side inputs (important bugfix for DynamicDestinations in
files)

#3725 <https://github.com/apache/beam/pull/3725> [BEAM-2827
<https://issues.apache.org/jira/browse/BEAM-2827>] Introduces
AvroIO.watchForNewFiles (parity for AvroIO with TextIO in a few important
features)

#3759 <https://github.com/apache/beam/pull/3759> [BEAM-2828
<https://issues.apache.org/jira/browse/BEAM-2828>] Moves Match into
FileIO.match()/matchAll() (to prevent releasing current
Match.filepatterns() into 2.2.0 and then having to keep it under that
name)


On Wed, Aug 30, 2017 at 11:31 AM Mingmin Xu 

wrote:



Glad to see that 2.2.0 is coming. Can we include SQL feature in next
release? We're in the final stage and expect to merge back to master this
week.

On Wed, Aug 30, 2017 at 11:27 AM, Reuven Lax




wrote:


Now that Beam 2.1.0 has finally completed, I think we should cut Beam 2.2.0
soon. I volunteer to coordinate this release.

Are there any pending pull requests that people think should be merged
before we cut 2.2.0? If so, please let me know soon, as I would like to cut
by Wednesday of next week.

Thanks,

Reuven





--

Mingmin















--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam 2.2.0 release

2017-08-30 Thread Jean-Baptiste Onofré

Hi Reuven,

sorry I missed the initial message in the thread.

Thanks for volunteering for this release!

For the target deadline, I think October makes sense. We have some new features in 
progress (M/R runner, Spark 2.x runner, etc.). We can finalize them during 
September, so October is good timing for the 2.2.0 release IMHO.


Regards
JB

On 08/30/2017 08:27 PM, Reuven Lax wrote:

Now that Beam 2.1.0 has finally completed, I think we should cut Beam 2.2.0
soon. I volunteer to coordinate this release.

Are there any pending pull requests that people think should be merged
before we cut 2.2.0? If so, please let me know soon, as I would like to cut
by Wednesday of next week.

Thanks,

Reuven



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam 2.2.0 release

2017-08-30 Thread Jean-Baptiste Onofré

With a 2.2.0 in October, I think we can try to move forward on RedisIO.

I'm now back from vacation and I will resume the work on this IO.

Regards
JB

On 08/30/2017 11:27 PM, Eugene Kirpichov wrote:

RedisIO in 2.2.0 is very unlikely. There's still a lot of review remaining
last time I checked on the PR.

On Wed, Aug 30, 2017 at 2:24 PM Vilhelm von Ehrenheim <
vonehrenh...@gmail.com> wrote:


Any chance to get the RedisIO in this release?
[BEAM-1017] Add RedisIO #1687

It's not my PR, but I'll be happy to assist if there is anything I can do to
help.

On 30 Aug 2017 22:46, "Daniel Ribeiro" 
wrote:


It would be great to get a bump on pubsub
<https://github.com/apache/beam/blob/master/pom.xml#L145> dependency. It is
currently very outdated (v1-rev10-1.22.0, which was released over a year ago
<http://repo1.maven.org/maven2/com/google/apis/google-api-services-pubsub/v1-rev10-1.22.0/>
).


On Wed, Aug 30, 2017 at 1:27 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:


Thanks Ismael. I've marked these two issues for fix in 2.2.0. Definitely
agree that at least the first one must be fixed.

Here's the current burndown list
https://issues.apache.org/jira/projects/BEAM/versions/12341044 - we should
clean it up.

On Wed, Aug 30, 2017 at 1:20 PM Ismaël Mejía 

wrote:



The current master has accumulated a good amount of nice features
since 2.1.0 so a new release is welcomed. I have two JIRAs/PR that I
think are important to check/solve before the cut:

BEAM-2516 (this is a regression on the performance of Direct runner on
Java). We had never really defined if a performance regression is
critical to be a blocker. I executed WordCount with the kinglear.txt
(170KB) file in version 2.1.0 vs the current 2.2.0-SNAPSHOT and I
found that the execution time passed from 5s to 126s. So maybe we need
to review this one before the release. I can understand if others
consider this a minor issue because the Direct runner is not supposed
to be used for production, but this performance regression can cause a
bad impression for a casual user starting with Beam.

BEAM-2790 (fix reading from Amazon S3 via HadoopFileSystem). I think
this one is a nice to have. I am not sure that I can tackle it for the
wednesday cut. I’m OOO until the beginning of next week, but maybe
someone else can take a look. In the worst case this is not a release
blocker but definitely a really nice fix to include.


On Wed, Aug 30, 2017 at 8:49 PM, Eugene Kirpichov
 wrote:

I'd like to get the following PRs into 2.2.0:

#3765 <https://github.com/apache/beam/pull/3765> [BEAM-2753
<https://issues.apache.org/jira/browse/BEAM-2753>] Fixes translation of
WriteFiles side inputs (important bugfix for DynamicDestinations in files)
#3725 <https://github.com/apache/beam/pull/3725> [BEAM-2827
<https://issues.apache.org/jira/browse/BEAM-2827>] Introduces
AvroIO.watchForNewFiles (parity for AvroIO with TextIO in a few important
features)
#3759 <https://github.com/apache/beam/pull/3759> [BEAM-2828
<https://issues.apache.org/jira/browse/BEAM-2828>] Moves Match into
FileIO.match()/matchAll() (to prevent releasing current
Match.filepatterns() into 2.2.0 and then having to keep it under that
name)


On Wed, Aug 30, 2017 at 11:31 AM Mingmin Xu 

wrote:



Glad to see that 2.2.0 is coming. Can we include SQL feature in next
release? We're in the final stage and expect to merge back to master this
week.

On Wed, Aug 30, 2017 at 11:27 AM, Reuven Lax wrote:

Now that Beam 2.1.0 has finally completed, I think we should cut Beam 2.2.0
soon. I volunteer to coordinate this release.

Are there any pending pull requests that people think should be merged
before we cut 2.2.0? If so, please let me know soon, as I would like to cut
by Wednesday of next week.

Thanks,

Reuven

--
Mingmin

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam 2.2.0 release

2017-08-30 Thread Jean-Baptiste Onofré
As we released 2.1.0 a couple of weeks ago, it could seem odd to users to
do a 2.2.0 so soon. If we have a blocking issue, we can do a 2.1.1. If it's new
features, why not target a release in October (2.2.0)?


Thoughts ?

Regards
JB

On 08/31/2017 08:27 AM, Eugene Kirpichov wrote:

I'd suggest to do 2.2.0 as quickly as possible, and target 2.3.0 for
October. I don't see a reason to delay 2.2.0 until October: there's a huge
amount of features worth releasing between when 2.1.0 was cut and the
current HEAD.

On Wed, Aug 30, 2017 at 10:18 PM Jean-Baptiste Onofré 
wrote:


With a 2.2.0 in October, I think we can try to move forward on RedisIO.

I'm now back from vacation and I will resume the work on this IO.

Regards
JB





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Capability Matrix revamp

2017-08-31 Thread Jean-Baptiste Onofré
 inputs"
3. Add rows for non-model things, like portability framework support,
metrics backends, etc
4. Drop rows that are not informative, like Composite transforms, or not
designed
5. Reorganize the windowing section to be just support for merging /
non-merging windowing.
6. Switch to a more distinct color scheme than the solid vs faded colors
currently used.
7. Find a web design to get short descriptions into the foreground to make
it easier to grok.

These are just a few thoughts, and not necessarily compatible with each
other. What do you think?

Kenn


--
Thanks,

Jesse








--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam 2.2.0 release

2017-08-31 Thread Jean-Baptiste Onofré
Fair enough.

That's fine for me.

Regards
JB

On Aug 31, 2017, 19:03, at 19:03, Steve Niemitz  wrote:
>I'll chime in as a user who would love to see 2.2.0 sooner than later,
>specifically for the file IO Eugene mentioned.  We're using the AvroIO
>enhancements extensively, but I am hesitant to run from HEAD in master
>in production.
>
>On Thu, Aug 31, 2017 at 12:42 PM, Eugene Kirpichov <
>kirpic...@google.com.invalid> wrote:
>
>> There are a lot of users including very large production customers who
>> have been asking specifically for the features that are in 2.2.0 (most
>> of them accumulated while 2.1.0 was being iterated on) - mostly I'm
>> referring to the vastly improved file IO - and they have been hesitant
>> to use Beam at HEAD in production. I think the slight unusualness of
>> having a release published soon after the previous release is a small
>> price to pay for helping those users :)
>>
>> On Wed, Aug 30, 2017, 11:30 PM Jean-Baptiste Onofré
>> wrote:
>>
>> > As we released 2.1.0 couple of weeks ago, it could sound weird to the
>> > users to do a 2.2.0 so fast. If we have a blocking issue, we can do a
>> > 2.1.1 If it's new features, why not having a release pace in October
>> > (2.2.0) ?
>> >
>> > Thoughts ?
>> >
>> > Regards
>> > JB

Re: MapReduce Runner needs contributors

2017-09-05 Thread Jean-Baptiste Onofré

Hi Pei,

As already discussed (and as I started the contribution ;)), I will help you with
this runner (I'm working on the Spark 2.x runner in the meantime).


Thanks !

Regards
JB

On 09/06/2017 07:24 AM, Pei HE wrote:

Hi all,
I am in the process of merging the MapReduce Runner into its feature branch,
https://github.com/apache/beam/pull/3705

I would like to call for contributors' help in making it more mature.
Here are the areas that need help:
1. Feature completion
Currently, a few ValidatesRunner tests are excluded, such as
gauge/distribution metrics, stateful/splittable ParDo, and user timers.
2. Performance improvement
For example, https://issues.apache.org/jira/browse/BEAM-2835
3. Production readiness
Try running it on a Hadoop cluster

Thanks and looking forward to the collaboration on MapReduce Runner.
--
Pei



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [PROPOSAL] FileIO.write: a modular replacement for FileBasedSink/WriteFiles

2017-09-06 Thread Jean-Baptiste Onofré

Fantastic.

Big +1 for this.

Regards
JB

On 09/07/2017 03:44 AM, Eugene Kirpichov wrote:

Hi,

Please take a look at the following proposal.

I believe, together with the (already available) FileIO.match() and
FileIO.readMatches() this proposal will empower Beam users to address all
use cases of file-based IO I'm aware of - which makes me quite excited.

http://s.apache.org/fileio-write

*We propose a new API for writing files in Beam: FileIO.write(). It is more
modular and cleaner to code against than FileBasedSink, and aims to
completely replace it.*

*FileIO.write() lets an IO author implement only logic and configuration
specific to a particular file format (e.g. Avro) and automatically get all
format-agnostic features, such as sharding, cleanup, windowed writes,
DynamicDestinations, compression, returning the successfully written
filenames, etc.*

TL;DR:

FileIO.write(FileSink { open(dest), write(input), close() })
   .to(input → dest)
   .withFilenamePolicy(dest → prefix, shard pattern)
   .withEverythingElse() // like in WriteFiles
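
The TL;DR above can be read as a split between a small format-specific sink and a format-agnostic writer that routes inputs to destinations. What follows is only a minimal sketch of that split under stated assumptions, and it does not use the Beam API: Sink, LineSink and writeAll are hypothetical names, destinations are in-memory strings rather than files, and sharding, windowing and compression are left out.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration only: none of these names are Beam classes.
class FileSinkSketch {

  // Format-specific part, mirroring "open(dest), write(input), close()".
  interface Sink<T> {
    void open(Writer dest) throws IOException;
    void write(T input) throws IOException;
    void close() throws IOException;
  }

  // A trivial "format": one element per line.
  static class LineSink implements Sink<String> {
    private Writer out;

    @Override
    public void open(Writer dest) {
      this.out = dest;
    }

    @Override
    public void write(String input) throws IOException {
      out.write(input);
      out.write('\n');
    }

    @Override
    public void close() throws IOException {
      out.flush();
    }
  }

  // Format-agnostic part: maps each input to a destination name and keeps
  // one sink per destination. Returns destination name -> written contents.
  static <T> Map<String, String> writeAll(
      List<T> inputs, Function<T, String> toDestination, Supplier<Sink<T>> sinkFactory) {
    Map<String, StringWriter> files = new TreeMap<>();
    Map<String, Sink<T>> sinks = new TreeMap<>();
    try {
      for (T input : inputs) {
        String dest = toDestination.apply(input);
        Sink<T> sink = sinks.get(dest);
        if (sink == null) {
          StringWriter file = new StringWriter();
          sink = sinkFactory.get();
          sink.open(file);
          files.put(dest, file);
          sinks.put(dest, sink);
        }
        sink.write(input);
      }
      for (Sink<T> sink : sinks.values()) {
        sink.close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Map<String, String> contents = new TreeMap<>();
    for (Map.Entry<String, StringWriter> entry : files.entrySet()) {
      contents.put(entry.getKey(), entry.getValue().toString());
    }
    return contents;
  }

  public static void main(String[] args) {
    // Route each word to a "file" named after its first letter.
    System.out.println(
        writeAll(
            Arrays.asList("apple", "avocado", "banana"),
            word -> word.substring(0, 1) + ".txt",
            LineSink::new));
  }
}
```

In this sketch only LineSink knows anything about the format, so a different format would replace that one class and leave the routing logic alone.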



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam 2.2.0 release

2017-09-06 Thread Jean-Baptiste Onofré

It sounds good to me.

By the way, you will need my help to complete the release process (as you need 
some permissions that you don't have).


Regards
JB

On 09/07/2017 01:00 AM, Reuven Lax wrote:

It sounds like SQL is still not in, and there are a couple of other PRs
that people have requested in 2.2.0. I am mostly out next week, so let's
set September 18 as a target date for cutting the first RC. That should
hopefully give plenty of time to get SQL and the remaining PRs merged into
master.

Reuven

On Thu, Aug 31, 2017 at 3:04 PM, Mingmin Xu  wrote:


Add https://issues.apache.org/jira/browse/BEAM-2833 which is a blocker to
merge DSL_SQL. There may be something wrong in the back-end(maybe
RunnerApi) to handle parametered CustomCoder in TestPipeline.

On Thu, Aug 31, 2017 at 10:38 AM, Jean-Baptiste Onofré 
wrote:


Fair enough.

That's fine for me.

Regards
JB


Re: Merge branch DSL_SQL to master

2017-09-07 Thread Jean-Baptiste Onofré

+1

Great work guys!
Ready to help with the merge and the maintenance!

Regards
JB

On 09/07/2017 08:48 AM, Mingmin Xu wrote:

Hi all,

On behalf of the virtual Beam SQL team[1], I'd like to propose to merge
DSL_SQL branch into master (PR #3782 [2]) and include it in release version
2.2.0, which will give it more visibility to other contributors and users.
The SQL feature satisfies the following criteria outlined in contribution
guide[3].

1. Have at least 2 contributors interested in maintaining it, and 1
committer interested in supporting it

* James and I will continue to work on new features and maintain it;

   Tyler, James and I will support it as committers;

2. Provide both end-user and developer-facing documentation

* A web page[4] is added to describe the usage of SQL DSL and how it works;


3. Have at least a basic level of unit test coverage

* 230 unit/integration tests in total, with 83.4% code coverage;

4. Run all existing applicable integration tests with other Beam components
and create additional tests as appropriate

* Besides the integration tests in package org.apache.beam.sdk.extensions.sql,
there's another example in
org.apache.beam.sdk.extensions.sql.example.BeamSqlExample.

[1]. Special thanks to all contributors/reviewers:

  Tyler Akidau

  Davor Bonaci

  Robert Bradshaw

  Lukasz Cwik

  Tarush Grover

  Kai Jiang

  Kenneth Knowles

  Jingsong Lee

  Ismaël Mejía

  Jean-Baptiste Onofré

  James Xu

  Mingmin Xu

[2]. https://github.com/apache/beam/pull/3782

[3]. https://beam.apache.org/contribute/contribution-guide/#merging-into-master

[4]. https://beam.apache.org/documentation/dsls/sql/

Thanks!

Mingmin



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: PipelineResult

2017-09-09 Thread Jean-Baptiste Onofré

Hi Chaim,

The pipeline result provides the metrics to you. You can periodically poll (with 
a thread) the pipeline result object to get the updated data.
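
The polling idea can be sketched in plain Java as follows. This is an illustration under stated assumptions rather than Beam code: MetricsPollerSketch and pollUntilDone are hypothetical names, and the two suppliers stand in for whatever reads a value out of the pipeline result's metrics and whatever checks whether the pipeline has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical illustration only: the suppliers replace real Beam calls.
class MetricsPollerSketch {

  // Samples readMetric on a background thread every periodMillis until isDone
  // reports true, then takes one final sample and returns all samples.
  static List<Long> pollUntilDone(
      LongSupplier readMetric, BooleanSupplier isDone, long periodMillis) {
    List<Long> samples = new ArrayList<>();
    Thread poller = new Thread(() -> {
      try {
        while (!isDone.getAsBoolean()) {
          samples.add(readMetric.getAsLong());
          Thread.sleep(periodMillis);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      // Final read so the last value is not missed.
      samples.add(readMetric.getAsLong());
    });
    poller.setDaemon(true);
    poller.start();
    try {
      // join() makes the poller's writes to samples visible to the caller.
      poller.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while polling", e);
    }
    return samples;
  }

  public static void main(String[] args) {
    AtomicLong processed = new AtomicLong();
    AtomicBoolean finished = new AtomicBoolean(false);

    // Stand-in for a running pipeline that updates a counter.
    Thread pipeline = new Thread(() -> {
      try {
        for (int i = 0; i < 5; i++) {
          Thread.sleep(20);
          processed.incrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        finished.set(true);
      }
    });
    pipeline.start();

    System.out.println(pollUntilDone(processed::get, finished::get, 10));
  }
}
```

The intermediate samples depend on timing; only the final sample is taken after the completion flag has been observed.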


Regards
JB

On 09/10/2017 07:45 AM, Chaim Turkel wrote:

Hi,
   I am having trouble figuring out what to do with the results. I have
multiple collections running on the pipeline, and since the sink does
not give me the option to get the result, I need to wait for the
pipeline to finish and then poll the results.
From what I can see, my only option is to use the metrics. Is there
another way to pass information from the collections to the results?

chaim



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam 2.2.0 release

2017-09-11 Thread Jean-Baptiste Onofré

Hi,

Any PMC member has the permissions (updating Jira, access to 
reporter.apache.org, etc).


Regards
JB

On 09/11/2017 06:46 PM, Reuven Lax wrote:

Are these permissions that only you have, or does anyone on the PMC have
these permissions? I'm asking so that in the future if you are unavailable,
we know who has these permissions. We should also make sure this is all
documented on the Beam release guide.

Reuven

On Wed, Sep 6, 2017 at 9:25 PM, Jean-Baptiste Onofré 
wrote:


It sounds good to me.

By the way, you will need my help to complete the release process (as you
need some permissions that you don't have).

Regards
JB



Re: PBegin, PDone

2017-09-13 Thread Jean-Baptiste Onofré

Hi,

I don't think it makes sense on a transform (as it expects a PCollection).
However, why not introduce a specific hook for that?


I think you can work around it using a Pipeline Visitor, but it would be at the
runner level.

Regards
JB

On 09/14/2017 08:21 AM, Chaim Turkel wrote:

Hi,
   I have a few scenarios where I would like to have code that is
before the PBegin and after the PDone.
This is usually for monitoring purposes.
It would be nice to be able to transform from PBegin to PBegin, and
PDone to PDone, so that code can be run before and after and not in
the driver program


chaim



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: flaky testSplit in beam-sdks-java-io-elasticsearch-tests-5

2017-09-15 Thread Jean-Baptiste Onofré

Thanks,

In the meantime, maybe we can ignore this test (to avoid impacting the build)?

Regards
JB

On 09/15/2017 02:34 PM, Etienne Chauchot wrote:

Hi all,

It seems that ElasticsearchIOTest.testSplit is flaky in artifact 
beam-sdks-java-io-elasticsearch-tests-5.


I'm investigating this issue.

Etienne



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: flaky testSplit in beam-sdks-java-io-elasticsearch-tests-5

2017-09-18 Thread Jean-Baptiste Onofré

Thanks !

I'm on it.

Regards
JB

On 09/18/2017 11:30 AM, Etienne Chauchot wrote:

Hi,

FYI: here is the fix for ElasticsearchIOTest.testSplit flaky test in artifact 
beam-sdks-java-io-elasticsearch-tests-5.


https://github.com/apache/beam/pull/3860

Best

Etienne


On 15/09/2017 at 15:56, Etienne Chauchot wrote:

Yes, sure, please ignore this test, I'll fix it soon.

Best,

Etienne


On 15/09/2017 at 15:15, Jean-Baptiste Onofré wrote:

Thanks,

In the meantime, maybe we can ignore this test (to avoid impacting the build)?

Regards
JB

On 09/15/2017 02:34 PM, Etienne Chauchot wrote:

Hi all,

It seems that ElasticsearchIOTest.testSplit is flaky in artifact 
beam-sdks-java-io-elasticsearch-tests-5.


I'm investigating this issue.

Etienne









--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: TikaIO concerns

2017-09-19 Thread Jean-Baptiste Onofré

Hi Sergey,

As discussed together during the review, I fully understand the choices you made.

Your plan sounds reasonable. Thanks !

Generally speaking, in order to give visibility and encourage contribution, I 
think it would make sense to accept a PR if it's basically right (even if it's 
not yet perfect) and doesn't break the build.


I would be happy to help on TikaIO as I did during the first review round ;)

Regards
JB

On 09/19/2017 12:41 PM, Sergey Beryozkin wrote:

Hi All

This is my first post to the dev list. I work for Talend, I'm a Beam novice and an
Apache Tika fan, and I thought it would be really great to try to link both
projects together, which led me to open [1], where I typed some early
thoughts, followed by PR [2].


I noticed yesterday I had the robust :-) (but useful and helpful) newer review 
comments from Eugene pending, so I'd like to summarize a bit why I did TikaIO 
(reader) the way I did, and then decide, based on the feedback from the experts, 
what to do next.


Apache Tika Parsers report the text content in chunks, via SaxParser events.
It's not possible with Tika to take a file and read it bit by bit at the
'initiative' of the Beam Reader, line by line; the only way is to handle the
SAXParser callbacks which report the data chunks. Some parsers may report
complete lines, some individual words, and some are able to report the data only
after they have completely parsed the document.

It all depends on the data format.

At the moment TikaIO's TikaReader does not use the Beam threads to parse the
files. Beam threads only collect the data from the internal queue into which the
internal TikaReader's thread puts the data

(note the data chunks are ordered even though the tests might suggest
otherwise).

The reason I did it was because I thought

1) it would make the individual data chunks available faster to the pipeline -
the parser will continue working through the binary/video etc. file while the data
already starts flowing - I agree there should be some test data available
confirming it - but I'm positive at the moment that this approach might yield some
performance gains with large sets. If the file is large, or if it has
embedded attachments/videos to deal with, then it may be more effective not to
have the Beam thread deal with it...


2) As I commented at the end of [2], having an option to concatenate the data 
chunks first before making them available to the pipeline is useful, and I guess 
doing the same in ParDo would introduce some synchronization issues (though not 
exactly sure yet)


One of the valid concerns there is that the reader is polling the internal queue, so,
in theory at least, and perhaps in some rare cases too, we may have a case where
the max polling time has been reached, the parser is still busy, and TikaIO
fails to report all the file data. I think that it can be solved by either 2a)
configuring the max polling time to a very large number which will never be
reached in a practical case, or 2b) simply using a blocking queue without the
time limits - in the worst case, if TikaParser spins and fails to report the end
of the document, then Beam can heal itself if the pipeline blocks.

I propose to follow 2b).
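
Option 2b) can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not TikaIO code: the class QueueHandoffSketch, the method readAll and the END_OF_DOCUMENT sentinel are hypothetical, and the parser thread simply replays a fixed list of chunks instead of handling real SAX callbacks. The reading side uses BlockingQueue.take(), which waits without a time limit, so a slow parser cannot cause chunks to be dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only: none of these names come from TikaIO.
class QueueHandoffSketch {

  // Sentinel marking the end of the document. It is a distinct String object
  // and is compared by identity, so it cannot collide with a real chunk.
  private static final String END_OF_DOCUMENT = new String("<end-of-document>");

  // Hands chunks from a "parser" thread to the calling "reader" thread
  // through a blocking queue and returns them in arrival order.
  static List<String> readAll(List<String> parsedChunks) {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Stand-in for the internal thread that receives the parser callbacks.
    Thread parser = new Thread(() -> {
      try {
        for (String chunk : parsedChunks) {
          queue.put(chunk);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        // Always signal the end so the reader does not wait forever.
        queue.offer(END_OF_DOCUMENT);
      }
    });
    parser.setDaemon(true);
    parser.start();

    List<String> received = new ArrayList<>();
    try {
      while (true) {
        // take() blocks with no timeout (option 2b).
        String chunk = queue.take();
        if (chunk == END_OF_DOCUMENT) {
          break;
        }
        received.add(chunk);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the parser", e);
    }
    return received;
  }

  public static void main(String[] args) {
    System.out.println(readAll(Arrays.asList("Hello", "Tika", "chunks")));
  }
}
```

Option 2a) would correspond to replacing take() with the timed poll(timeout, unit) variant, which returns null when the time limit expires and so requires the caller to decide what a missing chunk means.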


Please let me know what you think.
My plan so far is:
1) start addressing most of Eugene's comments which would require some minor 
TikaIO updates
2) work on removing the TikaSource internal code dealing with File patterns 
which I copied from TextIO at the next stage
3) If needed - mark TikaIO Experimental to give Tika and Beam users some time to
try it with some real complex files and also decide if TikaIO can continue to be
implemented as a BoundedSource/Reader or not


Eugene, all, will it work if I start with 1) ?

Thanks, Sergey

[1] https://issues.apache.org/jira/browse/BEAM-2328
[2] https://github.com/apache/beam/pull/3378


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSSION] using NexMark for Beam

2017-09-19 Thread Jean-Baptiste Onofré
m community :)
>>>>
>>>> Some comments:
>>>>
>>>> - This does not need to have a feature branch since we have been
>>>> working on this in a fork for months now and with the stable API we
>>>> can simply do a traditional PR review. Of course the review is a bit
>>>> bigger so we expect it to take some time, but I hope we can get some
>>>> quick progress once FSR is out.
>>>>
>>>> - We need a hand from the google guys, for the moment we have tested
>>>> all the queries in all the runners, but not in the Dataflow runner
>>>> because we don't have access to it (well we have but not with the
>>>> freedom that you guys have to run the benchmark at will), so if we can
>>>> get some access that would be nice or if this is not possible, it
>>>> would be nice if some of you guys help us test/report any given issue
>>>> on this runner,
>>>>
>>>> - We also have to decide the future of some features, this is probably
>>>> independent of the current PR and part of the evolution of Nexmark on
>>>> Beam:
>>>>
>>>> -- There are still some pending things that can be improved even after
>>>> the review once in master, e.g. we have for the moment only synthetic
>>>> sources but the original version took also data from Pubsub, we have
>>>> to define the correct scope for this and given the case also add other
>>>> sources, e.g. Kafka, HDFS.
>>>>
>>>> -- Query 10 is really oriented to testing Google Runner/IOs specific
>>>> features, so we have to decide what to do with this one, maybe
>>>> mirroring it with Kafka/HDFS to have something equivalent in the
>>>> Apache world.
>>>>
>>>> This is all for now, I am really glad that this is finally happening
>>>> and I hope this soon gets merged.
>>>>
>>>> Ismaël
>>>>
>>>> On Fri, May 12, 2017 at 6:07 PM, Lukasz Cwik
>
>>>> wrote:
>>>>
>>>>> I think these are valuable enough that we should get them into
>>>>>
>>>> apache/master
>>>>
>>>>> On Fri, May 12, 2017 at 4:34 AM, Jean-Baptiste Onofré
>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>>
>>>>>> PR or even a feature branch could work. Up to you.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>>
>>>>>> On 05/12/2017 10:55 AM, Etienne Chauchot wrote:
>>>>>>
>>>>>> Hi guys,
>>>>>>>
>>>>>>> I wanted to let you know that I have just submitted a PR around NexMark.
>>>>>>> This is a port of the NexMark queries to Beam, to be used as integration
>>>>>>> tests.
>>>>>>> This can also be used for A-B testing (no-regression or performance
>>>>>>> comparison
>>>>>>> between 2 versions of the same engine or of the same runner)
>>>>>>>
>>>>>>> This is a continuation of the previous PR (#99) from Mark Shields.
>>>>>>> The code has changed quite a bit: some queries have changed to use new
>>>>>>> Beam APIs
>>>>>>> and there were some big refactorings. More importantly, we can now run
>>>>>>> all the queries in all the runners.
>>>>>>>
>>>>>>> Nevertheless, there are still some open issues in Nexmark
>>>>>>> (https://github.com/iemejia/beam/issues) and in Beam upstream (see
>>>>>>> issue links
>>>>>>> in https://issues.apache.org/jira/browse/BEAM-160)
>>>>>>>
>>>>>>> I wanted to submit the PR before our (Ismaël and I) NexMark talk at
>>>>>>> the
>>>>>>> ApacheCon. The PR is not perfect but it is in a good shape to share
>>>>>>> it.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Etienne
>>>>>>>

Re: Proposal: Unbreak Beam Python 2.1.0 with 2.1.1 bugfix release

2017-09-19 Thread Jean-Baptiste Onofré

+1

Regards
JB

On 09/19/2017 11:05 PM, Charles Chen wrote:

The latest version (2.1.0) of Beam Python (
https://pypi.python.org/pypi/apache-beam) is broken due to a change in the
"six" dependency (BEAM-2964
<https://issues.apache.org/jira/browse/BEAM-2964>).  For instance,
installing "apache-beam" in a clean environment and running "python -m
apache_beam.examples.wordcount" results in a failure.  This issue is fixed
at head with Robert's recent PR (https://github.com/apache/beam/pull/3865).

I propose to cherry-pick this change on top of the 2.1.0 release branch (to
form a new 2.1.1 release branch) and call a vote to release version 2.1.1
only for Beam Python.

Alternatively, to preserve version alignment we could also re-release Beam
Java 2.1.1 with the same code as 2.1.0 modulo the version bump.  Thoughts?

Best,
Charles



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Update on 2.2.0 release

2017-09-19 Thread Jean-Baptiste Onofré

Hi Reuven,

thanks for the update. I will pick up some Jira in my bucket (to verify/double 
check or implement).


Regards
JB

On 09/20/2017 04:23 AM, Reuven Lax wrote:

There are 7 issues left on
https://issues.apache.org/jira/projects/BEAM/versions/12341044 after I
moved some items to 2.3.0 if they appeared to not need to be in 2.2.0 (some
were originally slated for 2.1.0, and there was no apparent progress on
them). Please object if you feel that it's important that any of these be
in 2.2.0

Of the remaining issues, three (BEAM-2956, BEAM-2298, BEAM-2271) appear to
already be fixed; please verify this and close the issue if you are the
reporter. The remaining issues (BEAM-2834, BEAM-2858, BEAM-2870,
BEAM-29540) all have outstanding pull requests. As soon as all of these
pull requests are merged, I will start the release cut.

Reuven



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: TikaIO concerns

2017-09-19 Thread Jean-Baptiste Onofré

Hi Eugene,

fully agree ! My point was more in terms of features: I think it's fair to 
postpone some features of an IO to new PRs.
For example, when we created JmsIO, it only supported TextMessage; support for new 
message types will be added in later improvement PRs.


That's what I meant by "basically good": it's more about feature scope.

Regards
JB

On 09/20/2017 01:18 AM, Eugene Kirpichov wrote:

On Tue, Sep 19, 2017 at 5:13 AM Jean-Baptiste Onofré 
wrote:


Hi Sergey,

as discussed together during the review, I fully understand the choices
you made.

Your plan sounds reasonable. Thanks !

Generally speaking, in order to give visibility and encourage
contribution, I
think it would make sense to accept a PR if it's basically right (even if
it's
not yet perfect) and doesn't break the build.


This is a wider discussion than the current thread, but I don't think I
agree with this approach.

We have followed a much stricter standard in the past, and thanks to that,
Beam currently has (in my opinion) an extremely high-quality library of
IOs, and Beam can take pride in not being one of "those" open-source
projects that advertise everything but guarantee nothing and are
frustrating to work with, because everything is slightly broken in some way
or another.

I can recall at most 1 or 2 cases where a contributor gave up on a PR due
to the amount of issues pointed out during review - and in those cases, the
PR was usually in a state where Beam would not have benefitted from merging
the issue-ridden code anyway. Basically, a thorough review in all cases
I've seen so far has been a good idea in retrospect.

There may be trivial fixups best done by a committer rather than author
(e.g. javadoc typos), but I think nontrivial, high-level issues should be
reviewed rigorously.

People trust Beam (especially Beam IOs) with their data, and at least the
correctness-critical stuff *must* be done right. Beam also generally
promises a stable API, so API mistakes are forever, and can not be fixed
iteratively [this can be addressed by marking in-progress work as
@Experimental] - so APIs must be done right as well. On the other hand,
performance, documentation, lack of support for certain features etc. can
be fixed iteratively - I agree that we shouldn't push too hard on that
during review.

There's also the mentorship aspect: I think it is valuable for new Beam
contributors to get thorough review, especially for their first
contributions, as a kick-start to learning the best practices - they are
going to need them repeatedly in their future contributions. Merging code
without sufficient review gives them the immediate gratification of "having
contributed", but denies the mentorship. Moreover, most contributions are
made by a relatively small number of prolific "serial contributors" (you
being a prime example!) who are responsive to feedback and eager to learn,
so the immediate gratification I think is not very important.

I think the best way to handle code reviews for Beam is to give it our best
as reviewers, especially for first-time contributors; and if it feels like
the amount of required changes is too large for the contributor to handle,
then work with them to prioritize the changes, or start small and decompose
the contribution into more manageable pieces, but each merged piece must be
high-quality.



I would be happy to help on TikaIO as I did during the first review round
;)

Regards
JB

On 09/19/2017 12:41 PM, Sergey Beryozkin wrote:

Hi All

This is my first post to the dev list. I work for Talend, I'm a Beam novice
and an Apache Tika fan, and I thought it would be really great to try to link both
projects together, which led me to opening [1] where I typed some early
thoughts, followed by PR [2].

I noticed yesterday I had the robust :-) (but useful and helpful) newer review
comments from Eugene pending, so I'd like to summarize a bit why I did TikaIO
(reader) the way I did, and then decide, based on the feedback from the experts,
what to do next.

Apache Tika Parsers report the text content in chunks, via SaxParser events.
It's not possible with Tika to take a file and read it bit by bit at the
'initiative' of the Beam Reader, line by line; the only way is to handle the
SAXParser callbacks which report the data chunks. Some parsers may report
complete lines, some individual words, and some are only able to report the data
after they have completely parsed the document.
It all depends on the data format.

At the moment TikaIO's TikaReader does not use the Beam threads to parse the
files; Beam threads only collect the data from the internal queue into which the
internal TikaReader's thread puts the data
(note the data chunks are ordered even though the tests might suggest otherwise).


The reason I did it was because I thought

1) it would make the individual data chunks avai

Re: [VOTE] Release 2.1.1, release candidate #1

2017-09-21 Thread Jean-Baptiste Onofré

+1 (binding)

Tested & checked:
- build
- ASF header/rat
- examples
- samples

Thanks
Regards
JB

On 09/21/2017 10:01 AM, Robert Bradshaw wrote:

Hi everyone,

As discussed earlier in this list [1] we'd like to get a bugfix
release out for beam 2.1. Please review and vote on the release
candidate #1 for the version 2.1.1, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

Artifacts are at [2] and the full diff can be viewed at [3] (two
cherry picks and a version bump).

The vote will be open for at least 36 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Robert


[1] 
http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201709.mbox/browser
[2] https://dist.apache.org/repos/dist/dev/beam/2.1.1/
[3] https://github.com/apache/beam/compare/release-2.1.0...release-2.1.1



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [Proposal] Beam Newsletter

2017-09-21 Thread Jean-Baptiste Onofré

Hi,

It's a great idea. I think the user mailing list is enough and makes more sense.

So, the "Call for updates" with the doc should be sent on the dev mailing list 
and the result (not the google doc but a copy-paste in the mail body) on the 
user mailing list.


Regards
JB

On 09/22/2017 03:31 AM, Griselda Cuevas wrote:

Hi Beam Community,

I have a proposal to start sending *monthly newsletters* to our dev and
user mailing lists. The idea is to summarize what's happening in the
project and keep everyone informed, especially new
members and people interested in specific initiatives/efforts, and to help
visualize the progress in concrete milestones.

This is what I propose:

1. I'll send an email to the dev & user mailing list with a "Call for
updates" with a link to a Google doc like this
<https://docs.google.com/document/d/1C4L8b1It9Ju1JgJaSvSPlAYMlG0Q4v4hbjebkmNsnQ8/edit>
so people can add their updates.
2. I'll edit it to get it ready to share
3. I'll send an email with highlights and the link to the finalized Google
doc

I propose to do this monthly.

What do you guys think?

Gris



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-09-27 Thread Jean-Baptiste Onofré

+1

Regards
JB

On 09/27/2017 06:04 PM, Reuven Lax wrote:

I would support this as well, however we probably should first poll current
users of the Beam API to see if any of them rely on Java 7.

Reuven

On Wed, Sep 27, 2017 at 9:00 AM, Robert Bradshaw <
rober...@google.com.invalid> wrote:


I also think that it's time to seriously consider dropping support for
Java 7.

On Tue, Sep 26, 2017 at 9:14 PM, Daniel Oliveira
 wrote:

Yes, just as Ismaël said it's a compilation blocker right now despite that
(I believe) we don't use the extension that's breaking.

As for other ways to solve this, if there is a way to avoid compiling the
advanced features of AutoValue that might be worth a try. We could also try
to get a release of AutoValue with the fix that works in Java 7. However I
feel that slowly moving over to Java 8 is the most future-proof solution if
it's possible.

On Tue, Sep 26, 2017 at 2:47 PM, Ismaël Mejía  wrote:


The current issue is that compilation fails on master because beam's
parent pom is configured to fail if it finds warnings:

 -Werror

However if you remove that line from the parent pom the compilation passes.


Of course this does not mean that everything is solved for Java 9,
there are some tests that break and other issues because of other
plugins and dependencies (e.g. bytebuddy), but those are not part of
this discussion.

On Tue, Sep 26, 2017 at 11:38 PM, Eugene Kirpichov
 wrote:

AFAIK we don't use any advanced capabilities of AutoValue. Does that mean
this issue is moot? I didn't quite understand from your email whether it is
a compilation blocker for Beam or not.

On Tue, Sep 26, 2017 at 2:32 PM Ismaël Mejía 

wrote:



Great that you are working on this too, Daniel, and thanks for
bringing this subject to the mailing list. I was waiting for my return
to the office next week, but you did it first :)

Eugene for reference (This is the issue on the migration to Java 9),
notice that here the goal is first that beam passes mvn clean install
with pure Java 9 (and also add this to jenkins), not to rewrite
anything to use the new stuff (e.g. modules):
https://issues.apache.org/jira/browse/BEAM-2530

Eugene can you also PTAL at the AutoValue issue, more details on the
issue, this is a warning so I don't know if it is really critical in
particular because we are not using Memoization (do we?).
https://github.com/google/auto/issues/503

https://github.com/google/auto/commit/71514081f2ca6fb4ead2b7f0a25f5d02247b8532


Wouldn't the easiest way be that you guys convince the google.auto
guys to generate that simple fix in a Java 7 compatible way and
'voila' ?

However I agree that moving to Java 8 is an excellent idea and as
Eugene mentions there is less friction now since most projects are
moving; the only pending issue is existing clusters on Java 7 in the
hadoop world, but those are less frequent now. Anyway this discussion
is really important (maybe even worth a vote), because moving to Java
8 will also allow us to move some of the dependencies that we are
keeping in old versions and in general to move forward.

What do the others think ?



On Tue, Sep 26, 2017 at 11:09 PM, Eugene Kirpichov
 wrote:

Very excited to hear that there's work on JDK9 support - is there a public
description of the plans for this work somewhere?

In general, Beam could probably drop Java7 support altogether at some point
soon: Java7 reached end-of-life (i.e. it's not receiving even security
patches) 2 years ago, and all major players in the data processing
ecosystem have dropped Java7 support (Spark, Flink, Hadoop), so I presume
the demand for Java7 support in the data processing industry is low. By the
way: would a Java8 migration be in the scope of your work in general?

However, until we say that Beam requires Java8, what would be the
implications of using a version of AutoValue that can only be compiled with
Java8? Are you saying that this is simply a matter of a compiler bug, and
if we use a Java8 compiler but configured to use source and target versions
of 1.7 and using bootclasspath of rt.jar from 1.7, then the resulting Beam
artifacts will be usable by people who don't have Java8?

On Tue, Sep 26, 2017 at 1:53 PM Daniel Oliveira
 wrote:


So I've been working on JDK 9 support for Beam, and I have a bug in
AutoValue that can be fixed by updating our AutoValue dependency to the
latest. The problem is that AutoValue from 1.5+ seems to be banned for Beam
due to requiring Java 8 compilers. However, it should still be possible to
compile and execute Java 7 code from the Java 8 compiler by building with
the correct arguments. So the fix to this bug would essentially require
Java 8 compilers even for compiling Java 7 code.

Does anyone need to use Java 7 compilers? Because if not I would like to
continue with this fix.
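
The "correct arguments" discussed in this thread (source and target 1.7 plus a Java 7 bootclasspath) could look roughly like the sketch below. It only assembles a javac command line; it does not run the compiler and is not taken from the Beam build. The class and method names and the rt.jar path are hypothetical placeholders, while -source, -target and -bootclasspath are standard options of the Java 8 javac.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the javac arguments a Java 8 compiler would need
// in order to produce class files usable on a Java 7 runtime.
class CrossCompileArgsSketch {
    static List<String> javacArgs(String java7RtJar, List<String> sourceFiles) {
        List<String> args = new ArrayList<>();
        args.add("javac");
        // Accept only Java 7 language features.
        args.add("-source");
        args.add("1.7");
        // Emit Java 7 compatible bytecode.
        args.add("-target");
        args.add("1.7");
        // Compile against the Java 7 class library, so that accidental use of
        // Java 8 only APIs fails at compile time instead of at runtime.
        args.add("-bootclasspath");
        args.add(java7RtJar);
        args.addAll(sourceFiles);
        return args;
    }

    public static void main(String[] args) {
        // The rt.jar location is an assumed example path.
        List<String> cmd = javacArgs("/path/to/jdk7/jre/lib/rt.jar",
                Arrays.asList("Example.java"));
        System.out.println(String.join(" ", cmd));
    }
}
```

Without the bootclasspath setting, -source and -target alone do not stop code from linking against Java 8 library classes, which is why the quoted question mentions rt.jar from 1.7 explicitly.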











--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-09-27 Thread Jean-Baptiste Onofré

Definitely agree

On 09/27/2017 06:00 PM, Robert Bradshaw wrote:

I also think that it's time to seriously consider dropping support for Java 7.

On Tue, Sep 26, 2017 at 9:14 PM, Daniel Oliveira
 wrote:

Yes, just as Ismaël said it's a compilation blocker right now despite that
(I believe) we don't use the extension that's breaking.

As for other ways to solve this, if there is a way to avoid compiling the
advanced features of AutoValue that might be worth a try. We could also try
to get a release of AutoValue with the fix that works in Java 7. However I
feel that slowly moving over to Java 8 is the most future-proof solution if
it's possible.

On Tue, Sep 26, 2017 at 2:47 PM, Ismaël Mejía  wrote:


The current issue is that compilation fails on master because beam's
parent pom is configured to fail if it finds warnings:

 -Werror

However if you remove that line from the parent pom the compilation passes.

Of course this does not mean that everything is solved for Java 9,
there are some tests that break and other issues because of other
plugins and dependencies (e.g. bytebuddy), but those are not part of
this discussion.

On Tue, Sep 26, 2017 at 11:38 PM, Eugene Kirpichov
 wrote:

AFAIK we don't use any advanced capabilities of AutoValue. Does that mean
this issue is moot? I didn't quite understand from your email whether it is
a compilation blocker for Beam or not.

On Tue, Sep 26, 2017 at 2:32 PM Ismaël Mejía  wrote:


Great that you are working on this too, Daniel, and thanks for
bringing this subject to the mailing list. I was waiting for my return
to the office next week, but you did it first :)

Eugene for reference (This is the issue on the migration to Java 9),
notice that here the goal is first that beam passes mvn clean install
with pure Java 9 (and also add this to jenkins), not to rewrite
anything to use the new stuff (e.g. modules):
https://issues.apache.org/jira/browse/BEAM-2530

Eugene can you also PTAL at the AutoValue issue, more details on the
issue, this is a warning so I don't know if it is really critical in
particular because we are not using Memoization (do we?).
https://github.com/google/auto/issues/503

https://github.com/google/auto/commit/71514081f2ca6fb4ead2b7f0a25f5d02247b8532


Wouldn't the easiest way be that you guys convince the google.auto
guys to generate that simple fix in a Java 7 compatible way and
'voila' ?

However I agree that moving to Java 8 is an excellent idea and as
Eugene mentions there is less friction now since most projects are
moving; the only pending issue is existing clusters on Java 7 in the
hadoop world, but those are less frequent now. Anyway this discussion
is really important (maybe even worth a vote), because moving to Java
8 will also allow us to move some of the dependencies that we are
keeping in old versions and in general to move forward.

What do the others think ?



On Tue, Sep 26, 2017 at 11:09 PM, Eugene Kirpichov
 wrote:

Very excited to hear that there's work on JDK9 support - is there a public
description of the plans for this work somewhere?

In general, Beam could probably drop Java7 support altogether at some point
soon: Java7 reached end-of-life (i.e. it's not receiving even security
patches) 2 years ago, and all major players in the data processing
ecosystem have dropped Java7 support (Spark, Flink, Hadoop), so I presume
the demand for Java7 support in the data processing industry is low. By the
way: would a Java8 migration be in the scope of your work in general?

However, until we say that Beam requires Java8, what would be the
implications of using a version of AutoValue that can only be compiled with
Java8? Are you saying that this is simply a matter of a compiler bug, and
if we use a Java8 compiler but configured to use source and target versions
of 1.7 and using bootclasspath of rt.jar from 1.7, then the resulting Beam
artifacts will be usable by people who don't have Java8?

On Tue, Sep 26, 2017 at 1:53 PM Daniel Oliveira
 wrote:


So I've been working on JDK 9 support for Beam, and I have a bug in
AutoValue that can be fixed by updating our AutoValue dependency to the
latest. The problem is that AutoValue from 1.5+ seems to be banned for Beam
due to requiring Java 8 compilers. However, it should still be possible to
compile and execute Java 7 code from the Java 8 compiler by building with
the correct arguments. So the fix to this bug would essentially require
Java 8 compilers even for compiling Java 7 code.

Does anyone need to use Java 7 compilers? Because if not I would like to
continue with this fix.







--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Hi

2017-10-02 Thread Jean-Baptiste Onofré

Done,

welcome aboard !

Regards
JB

On 10/02/2017 11:15 AM, Yonatan Seneor wrote:

Hi
My username  on Apache JIRA is: yseneor
Thanks
Yoni Seneor


On Mon, Oct 2, 2017 at 11:26, Yonatan Seneor <ysen...@gmail.com> wrote:


My name is Yoni Seneor. I am a DevOps engineer at PayPal, I am part of a
team that uses Beam and I would like to contribute to the project.
Please add me to the contributor list on JIRA.

Thanks
Yoni Seneor





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Contributor introduction

2017-10-02 Thread Jean-Baptiste Onofré

Hi Uri,

what's your Jira ID ?

Thanks,
Regards
JB

On 10/02/2017 02:31 PM, Uri Silberstein wrote:

Hi all,

My name is Uri Silberstein and I am part of a PayPal team that works with
Beam.

I would like to contribute to the Project.

Please add me to the contributor list on JIRA, so I can assign to myself a
task that I've just opened.

Thanks,

Uri Silberstein



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: spark-submit forces jackson 2.4.4

2017-10-02 Thread Jean-Baptiste Onofré
Hi

Do you start your pipeline with spark-submit ? If so you can provide the 
packages. You can also create a shaded jar.

I have a similar issue in the spark 2 runner that I worked around by aligning 
the dependencies.
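
Before aligning or shading anything, it can help to check which jar a class is actually loaded from at runtime. The helper below is a minimal sketch using only the JDK; the class name ClassOriginSketch and the method originOf are hypothetical. Run inside the driver with the class name from the stack trace, it should show whether the shaded jar or another copy of the library is the one being used.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class was loaded from.
class ClassOriginSketch {
    static String originOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Typical for classes that ship with the JDK itself.
                return "(JDK runtime / unknown)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this thread; the result
        // depends entirely on the classpath the program is started with.
        System.out.println(originOf("com.fasterxml.jackson.databind.ObjectMapper"));
    }
}
```

If the printed location is a jar from the Spark installation rather than the application jar, the older library version is taking precedence over the one that was packaged.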

Regards
JB

On Oct 2, 2017, 20:04, at 20:04, Jacob Marble  wrote:
>My Beam pipeline runs fine with DirectRunner and DataflowRunner, but fails
>with SparkRunner. That stack trace is after this message.
>
>The exception indicates that
>com.fasterxml.jackson.databind.ObjectMapper.enable doesn't exist.
>ObjectMapper.enable() didn't exist until Jackson 2.5. `mvn dependency:tree
>-Dverbose` shows that spark-core_2.10 (1.6.3) and beam-runners-spark
>(2.1.0) both request versions of Jackson before 2.5.
>
>Since I'm using a local, standalone Spark cluster for development, I have
>to include spark-core_2.10 version 1.6.3 in dependencies.
>
>I have added explicit dependencies to my pom.xml, so that I can be certain
>that the more recent version of Jackson is included in my shaded jar. `mvn
>clean package` confirms this:
>
>[INFO] Including com.fasterxml.jackson.core:jackson-core:jar:2.8.9 in
>the
>shaded jar.
>[INFO] Including
>com.fasterxml.jackson.core:jackson-annotations:jar:2.8.9
>in the shaded jar.
>[INFO] Including com.fasterxml.jackson.core:jackson-databind:jar:2.8.9
>in
>the shaded jar.
>[INFO] Including
>com.fasterxml.jackson.module:jackson-module-scala_2.10:jar:2.8.9 in the
>shaded jar.
>[INFO] Including
>com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.8.9 in the
>shaded jar.
>[INFO] Including
>com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.8.9 in
>the
>shaded jar.
>
>Beyond jar creation, is there anything I can do to ensure that my
>chosen
>version of a dependency is used when Spark runs my pipeline? I can't be
>the
>first to encounter this problem.
>
>Thanks!
>
>Jacob
>
>
>
>Exception in thread "main" java.lang.RuntimeException:
>java.lang.NoSuchMethodError:
>com.fasterxml.jackson.databind.ObjectMapper.enable([Lcom/fasterxml/jackson/core/JsonParser$Feature;)Lcom/fasterxml/jackson/databind/ObjectMapper;
>at
>org.apache.beam.runners.spark.SparkPipelineResult.runtimeExceptionFrom(SparkPipelineResult.java:55)
>at
>org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:72)
>at
>org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
>at
>org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:87)
>at com.kochava.beam.jobs.ExampleS3.main(ExampleS3.java:46)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at
>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>at
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:498)
>at
>org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>at
>org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>Caused by: java.lang.NoSuchMethodError:
>com.fasterxml.jackson.databind.ObjectMapper.enable([Lcom/fasterxml/jackson/core/JsonParser$Feature;)Lcom/fasterxml/jackson/databind/ObjectMapper;
>at
>com.amazonaws.partitions.PartitionsLoader.(PartitionsLoader.java:54)
>at
>com.amazonaws.regions.RegionMetadataFactory.create(RegionMetadataFactory.java:30)
>at com.amazonaws.regions.RegionUtils.initialize(RegionUtils.java:64)
>at
>com.amazonaws.regions.RegionUtils.getRegionMetadata(RegionUtils.java:52)
>at com.amazonaws.regions.RegionUtils.getRegion(RegionUtils.java:105)
>at
>com.amazonaws.client.builder.AwsClientBuilder.withRegion(AwsClientBuilder.java:239)
>at com.kochava.beam.s3.S3Util.(S3Util.java:103)
>at com.kochava.beam.s3.S3Util.(S3Util.java:53)
>at com.kochava.beam.s3.S3Util$S3UtilFactory.create(S3Util.java:81)
>at com.kochava.beam.s3.S3Util$S3UtilFactory.create(S3Util.java:55)


[DISCUSS] Switch to gitbox

2017-10-06 Thread Jean-Baptiste Onofré

Hi guys,

We use Apache gitbox for the website and it works fine (as long as you have linked 
your Apache and GitHub accounts with 2FA enabled).

What do you think about moving to gitbox for the codebase itself ?

It could speed up the review and merge for the PRs.

Thoughts ?

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



Re: Update on 2.2.0 release

2017-10-08 Thread Jean-Baptiste Onofré

It sounds good to me.

Regards
JB

On 10/09/2017 12:38 AM, Reuven Lax wrote:

At this point it's taken over three weeks to resolve release-blocking
issues, and there are still some remaining. Monday morning I will go ahead
and cut an initial release candidate, so we get a snapshot to base on. The
remaining blocking issues will either be cherry picked into an additional
release candidate or punted to 2.3.0, depending on severity.

Reuven

On Wed, Sep 20, 2017 at 5:34 AM, Jean-Baptiste Onofré 
wrote:


Hi Reuven,

thanks for the update. I will pick up some Jira in my bucket (to
verify/double check or implement).

Regards
JB


On 09/20/2017 04:23 AM, Reuven Lax wrote:


There are 7 issues left on
https://issues.apache.org/jira/projects/BEAM/versions/12341044 after I
moved some items to 2.3.0 if they appeared to not need to be in 2.2.0
(some
were originally slated for 2.1.0, and there was no apparent progress on
them). Please object if you feel that it's important that any of these be
in 2.2.0

Of the remaining issues, three (BEAM-2956, BEAM-2298, BEAM-2271) appear to
already be fixed; please verify this and close the issue if you are the
reporter. The remaining issues (BEAM-2834, BEAM-2858, BEAM-2870,
BEAM-29540) all have outstanding pull requests. As soon as all of these
pull requests are merged, I will start the release cut.

Reuven



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Update on 2.2.0 release

2017-10-10 Thread Jean-Baptiste Onofré

Hi Reuven,

Good one !

I plan to cherry pick RedisIO on the 2.2.0 release branch as soon as I have merged it on 
master. No objection ?


Regards
JB

On 10/10/2017 12:53 AM, Reuven Lax wrote:

I've now cut a release branch for 2.2.0.

On Mon, Oct 9, 2017 at 2:01 AM, Jean-Baptiste Onofré 
wrote:


It sounds good to me.

Regards
JB


On 10/09/2017 12:38 AM, Reuven Lax wrote:


At this point it's taken over three weeks to resolve release-blocking
issues, and there are still some remaining. Monday morning I will go ahead
and cut an initial release candidate, so we get a snapshot to base on. The
remaining blocking issues will either be cherry picked into an additional
release candidate or punted to 2.3.0, depending on severity.

Reuven

On Wed, Sep 20, 2017 at 5:34 AM, Jean-Baptiste Onofré 
wrote:

Hi Reuven,


thanks for the update. I will pick up some Jira in my bucket (to
verify/double check or implement).

Regards
JB


On 09/20/2017 04:23 AM, Reuven Lax wrote:

There are 7 issues left on
https://issues.apache.org/jira/projects/BEAM/versions/12341044 after I
moved some items to 2.3.0 if they appeared to not need to be in 2.2.0 (some
were originally slated for 2.1.0, and there was no apparent progress on
them). Please object if you feel that it's important that any of these be
in 2.2.0.

Of the remaining issues, three (BEAM-2956, BEAM-2298, BEAM-2271) appear to
already be fixed; please verify this and close the issue if you are the
reporter. The remaining issues (BEAM-2834, BEAM-2858, BEAM-2870,
BEAM-29540) all have outstanding pull requests. As soon as all of these
pull requests are merged, I will start the release cut.

Reuven


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


[VOTE] Migrate to gitbox

2017-10-10 Thread Jean-Baptiste Onofré

Hi all,

following the discussion, here's the formal vote to migrate to gitbox:

[ ] +1, Approve to migrate to gitbox
[ ] -1, Do not migrate (please provide specific comments)

The vote will be open for at least 36 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Switch to gitbox

2017-10-10 Thread Jean-Baptiste Onofré

Agree,

However, I will start a formal vote for the record. As soon as the vote passes, I
will proceed with the next steps (with INFRA).


Regards
JB

On 10/10/2017 12:31 AM, Kenneth Knowles wrote:

Sounds like wild agreement - JB will you be the one to proceed with the
process?

On Mon, Oct 9, 2017 at 3:24 PM, Thomas Weise  wrote:


+1

It will enable proper work with PRs in the github interface (like
requesting reviewers, merging and closing inactive PRs, after github and
ASF IDs are linked)


On Mon, Oct 9, 2017 at 2:52 PM, Ismaël Mejía  wrote:


+1

On Oct 9, 2017 6:52 PM, "Thomas Groh"  wrote:


+1.

I do love myself a forcing function for passing tests.

On Mon, Oct 9, 2017 at 7:51 AM, Aljoscha Krettek 
wrote:


+1


On 6. Oct 2017, at 18:57, Kenneth Knowles wrote:

Sounds great. If I recall correctly, it means we could also use assignment /
review requests to pass pull requests around, instead of "R: foo" comments.

On Fri, Oct 6, 2017 at 9:30 AM, Tyler Akidau wrote:

+1

On Fri, Oct 6, 2017 at 8:54 AM Reuven Lax wrote:

+1

On Oct 6, 2017 4:51 PM, "Lukasz Cwik" wrote:

I think it's a great idea and find that the mergebot works well on the
website. Since gitbox enforces that the precommit checks pass, it would also
be a good forcing function for the community to maintain reliably passing
tests.

On Fri, Oct 6, 2017 at 4:58 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi guys,

We use Apache gitbox for the website and it works fine (as soon as you
linked your Apache and github with 2FA enabled).

What do you think about moving to gitbox for the codebase itself ?

It could speed up the review and merge for the PRs.

Thoughts ?

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: CouchDbIO connector in beam io

2017-10-12 Thread Jean-Baptiste Onofré

I second Ismaël there: the more IOs/connectors we have in Beam, the better.

So, after checking if CouchbaseIO can help, we can move forward, create a Jira
and start a PR.


Regards
JB

On 10/12/2017 05:10 PM, Ismaël Mejía wrote:

This is an interesting one please go ahead and create the JIRA.

Maybe it is a good idea that you ping Seshadri and the guys who were
interested in implementing CouchbaseIO
https://issues.apache.org/jira/browse/BEAM-1893

I have no idea whether the APIs of CouchDb and Couchbase are similar, but
if they are it would be nice to share some code. Also it would be nice
if the IO API is similar to other document-oriented stores like
Mongo. We have similar data stores like Bigtable/HBase or
Elasticsearch/Solr sharing some of their 'style'.

Don't hesitate to ping me (us) or contact via slack if you have
questions or need some help.

Ismaël

On Thu, Oct 12, 2017 at 11:46 AM, tarush grover  wrote:

I do not have a specific use case, but CouchDB has been used for real-time
event storage and can be used for analytics, so I thought providing a
connector for it might be helpful.

I have not contacted the Apache CouchDB team yet, but I am now thinking
about it; thanks for the input!!

Regards,
Tarush

On Fri, Oct 6, 2017 at 3:08 AM, Chamikara Jayalath 
wrote:


CouchDB sounds interesting. Could you expand a bit more on potential
use-cases ? Also did you get any input from Apache CouchDB team ?

Thanks,
Cham

On Thu, Oct 5, 2017 at 12:49 AM tarush grover 
wrote:


Hi All,

I wanted to get input from community members regarding having CouchDB IO
connectors in our Beam IO module.

Regards,
Tarush





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: 100 open pull requests

2017-10-12 Thread Jean-Baptiste Onofré
Agree

I will start to update/review some PRs.

Regards
JB

On Oct 12, 2017, 21:26, at 21:26, Reuven Lax  wrote:
>In the past, Ahmet and I spent some time each week reviewing and
>pinging
>pull requests. This did not happen the past few weeks due to some
>vacations
>and travel. I do think pinging is effective for many of the PRs at
>least.
>
>On Thu, Oct 12, 2017 at 1:20 PM, Lukasz Cwik 
>wrote:
>
>> My experience is that it takes a good amount of time to review PRs
>and a
>> good portion of my time spent contributing to this project is by
>reviewing
>> PRs.
>> I currently have 3 out of 10 PRs that are older then 2 weeks so in my
>> experience pinging people to about progress has been pretty
>effective.
>> Out of those older PRs, 2 of those PRs I have heard back from the
>authors
>> and that they would attempt to get back to it soon.
>>
>> On Wed, Oct 11, 2017 at 8:09 PM, Kenneth Knowles 
>wrote:
>>
>> > Hi all,
>> >
>> > We have hit 100 open pull requests today*. It is an arbitrary
>number,
>> but a
>> > good excuse to note the upward trend. In part, I think it is simply
>> having
>> > more changes happening, which is cool. But it is also due to review
>> > latency. Sorting by "last updated" the first two pages range from
>6+
>> months
>> > to 16 days ago.
>> >
>> > We may, first of all, need a sweep to close stalled / no-go PRs.
>> >
>> > After that, having a triage process where someone drops in on PRs
>and
>> asks
>> > "any update?" has not been terrifically helpful in my experience
>(and
>> also
>> > obscures how stale PRs are) but is perhaps the most active measure
>we've
>> > taken in the past.
>> >
>> > Gitbox will probably make it easier to see who is requested to
>review a
>> PR
>> > and whether it is waiting on the reviewer or the author. That may
>help.
>> >
>> > Any other thoughts?
>> >
>> > Kenn
>> >
>> > *I'm part of the problem; 16 of them contain the phrase "R:
>@kennknowles"
>> > and I also have ~4 outgoing PRs that have stalled
>> >
>>


Re: 100 open pull requests

2017-10-13 Thread Jean-Baptiste Onofré

Agree, it makes sense only if the author doesn't reply after asking for an 
update.

Regards
JB

On 10/12/2017 11:32 PM, Ahmet Altay wrote:

We had a previous discussion about closing stale PRs. This might be a good
time to force close some of those, especially if the authors are not active.

Ahmet

On Thu, Oct 12, 2017 at 1:43 PM, Jean-Baptiste Onofré 
wrote:


Agree

I will start to update/review some PRs.

Regards
JB

On Oct 12, 2017, 21:26, at 21:26, Reuven Lax 
wrote:

In the past, Ahmet and I spent some time each week reviewing and pinging
pull requests. This did not happen the past few weeks due to some vacations
and travel. I do think pinging is effective for many of the PRs at least.

On Thu, Oct 12, 2017 at 1:20 PM, Lukasz Cwik wrote:

My experience is that it takes a good amount of time to review PRs and a
good portion of my time spent contributing to this project is by reviewing
PRs.
I currently have 3 out of 10 PRs that are older than 2 weeks so in my
experience pinging people about progress has been pretty effective.
Out of those older PRs, 2 of those PRs I have heard back from the authors
and that they would attempt to get back to it soon.

On Wed, Oct 11, 2017 at 8:09 PM, Kenneth Knowles wrote:

Hi all,

We have hit 100 open pull requests today*. It is an arbitrary number, but a
good excuse to note the upward trend. In part, I think it is simply having
more changes happening, which is cool. But it is also due to review
latency. Sorting by "last updated" the first two pages range from 6+ months
to 16 days ago.

We may, first of all, need a sweep to close stalled / no-go PRs.

After that, having a triage process where someone drops in on PRs and asks
"any update?" has not been terrifically helpful in my experience (and also
obscures how stale PRs are) but is perhaps the most active measure we've
taken in the past.

Gitbox will probably make it easier to see who is requested to review a PR
and whether it is waiting on the reviewer or the author. That may help.

Any other thoughts?

Kenn

*I'm part of the problem; 16 of them contain the phrase "R: @kennknowles"
and I also have ~4 outgoing PRs that have stalled

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: JIRA component for IOs?

2017-10-13 Thread Jean-Baptiste Onofré
Hi

The component to use is actually sdk-java-extensions.

Regards
JB

On Oct 13, 2017, 20:34, at 20:34, Valentyn Tymofieiev 
 wrote:
>With IOs being a significant part of BEAM codebase&development efforts,
>do
>we want to have a separate component in JIRA dedicated to IOs, perhaps
>broken down into subcomponents?
>
>I wanted to report an IO-related issue
> and the closest
>component
>appears to be sdk-java-extensions. Perhaps we want to have a place in
>Jira
>IOs can call home?


Re: MongoDbIO

2017-10-16 Thread Jean-Baptiste Onofré

Hi Chaim,

So, you mean you call MongoDbIO.write() with an empty PCollection (no element in
the collection) ?

The write is basically a DoFn, so it won't do anything if the PCollection is
empty.

Regards
JB

On 10/16/2017 01:59 PM, Chaim Turkel wrote:

Hi,
   In the case that there are no records to read, is there a way to get
called so that i can write the status?

chaim



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: MongoDbIO

2017-10-16 Thread Jean-Baptiste Onofré

You can always add your own ParDo(DoFn) where you write the status.

Regards
JB

On 10/16/2017 04:24 PM, Chaim Turkel wrote:

that is the problem, i want to write a status that i tried and that
there were no records

On Mon, Oct 16, 2017 at 3:51 PM, Jean-Baptiste Onofré  wrote:

Hi Chaim,

So, you mean you call MongoDbIO.write() with an empty PCollection (no
element in the collection) ?

The write is basically a DoFn so, it won't do anything if the PCollection is
empty.

Regards
JB


On 10/16/2017 01:59 PM, Chaim Turkel wrote:


Hi,
In the case that there are no records to read, is there a way to get
called so that i can write the status?

chaim



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-10-16 Thread Jean-Baptiste Onofré

+1 to vote on @user.

Anyway, I'm not sure it requires a formal vote. As Java 7 is deprecated, it should be
an implicit "decision".


Regards
JB

On 10/16/2017 04:35 PM, Ismaël Mejía wrote:

Any progress on this? What is the proposed way to validate if users
are still interested in Java 7? A vote on user or something different?


On Wed, Sep 27, 2017 at 7:59 PM, Kenneth Knowles  
wrote:

Agree with polling Beam users as well as each runner's community in
aggregate.

On Wed, Sep 27, 2017 at 9:44 AM, Jean-Baptiste Onofré 
wrote:


Definitely agree


On 09/27/2017 06:00 PM, Robert Bradshaw wrote:


I also think that it's time to seriously consider dropping support for
Java 7.

On Tue, Sep 26, 2017 at 9:14 PM, Daniel Oliveira
 wrote:


Yes, just as Ismaël said it's a compilation blocker right now despite that
(I believe) we don't use the extension that's breaking.

As for other ways to solve this, if there is a way to avoid compiling the
advanced features of AutoValue that might be worth a try. We could also try
to get a release of AutoValue with the fix that works in Java 7. However I
feel that slowly moving over to Java 8 is the most future-proof solution if
it's possible.

On Tue, Sep 26, 2017 at 2:47 PM, Ismaël Mejía  wrote:

The current issue is that compilation fails on master because beam's
parent pom is configured to fail if it finds warnings:

  -Werror

However if you remove that line from the parent pom the compilation
passes.

Of course this does not mean that everything is solved for Java 9,
there are some tests that break and other issues because of other
plugins and dependencies (e.g. bytebuddy), but those are not part of
this discussion.

On Tue, Sep 26, 2017 at 11:38 PM, Eugene Kirpichov
 wrote:


AFAIK we don't use any advanced capabilities of AutoValue. Does that mean
this issue is moot? I didn't quite understand from your email whether it is
a compilation blocker for Beam or not.

On Tue, Sep 26, 2017 at 2:32 PM Ismaël Mejía 
wrote:

Great that you are also working on this Daniel, and thanks for
bringing this subject to the mailing list. I was waiting for my return
to the office next week, but you did it first :)

Eugene for reference (This is the issue on the migration to Java 9),
notice that here the goal is first that beam passes mvn clean install
with pure Java 9 (and also add this to jenkins), not to rewrite
anything to use the new stuff (e.g. modules):
https://issues.apache.org/jira/browse/BEAM-2530

Eugene can you also PTAL at the AutoValue issue, more details on the
issue, this is a warning so I don't know if it is really critical in
particular because we are not using Memoization (do we?).
https://github.com/google/auto/issues/503

https://github.com/google/auto/commit/71514081f2ca6fb4ead2b7f0a25f5d02247b8532





Wouldn't the easiest way be that you guys convince the google.auto
guys to generate that simple fix in a Java 7 compatible way and
'voila' ?

However I agree that moving to Java 8 is an excellent idea and as
Eugene mentions there is less friction now since most projects are
moving, only pending issue are existing clusters on java 7 in the
hadoop world, but those are less frequent now. Anyway this discussion
is really important (maybe even worth a vote). Because moving to Java
8 will allow us also to move some of the dependencies that we are
keeping in old versions and in general to move forward.

What do the others think ?



On Tue, Sep 26, 2017 at 11:09 PM, Eugene Kirpichov
 wrote:


Very excited to hear that there's work on JDK9 support - is there a public
description of the plans for this work somewhere?

In general, Beam could probably drop Java7 support altogether at some point
soon: Java7 has reached end-of-life (i.e. it's not receiving even security
patches) 2 years ago, and all major players in the data processing
ecosystem have dropped Java7 support (Spark, Flink, Hadoop), so I presume
the demand for Java7 support in the data processing industry is low. By the
way: would a Java8 migration be in the scope of your work in general?

However, until we say that Beam requires Java8, what would be the
implications of using a version of AutoValue that can only be compiled with
Java8? Are you saying that this is simply a matter of a compiler bug, and
if we use a Java8 compiler but configured to use source and target versions
of 1.7 and using bootclasspath of rt.jar from 1.7, then the resulting Beam
artifacts will be usable by people who don't have Java8?

On Tue, Sep 26, 2017 at 1:53 PM Daniel Oliveira
 wrote:

So I've been working on JDK 9 support for Beam, and I have a bug in
AutoValue that can be fixed by updating our AutoValue dependency to the
latest. The problem is that AutoValue from 1.5+ seems to be banned for Beam
due to requiring Java 8 compilers. However, it should still be possible to

Re: MongoDbIO

2017-10-16 Thread Jean-Baptiste Onofré

I didn't mean on the read, I meant between the read and write.

Basically, your pipeline could look like:

pipeline.apply(MongoDbIO.read()...)
.apply(ParDo.of(new DoFn() { // check PCollection and set the status }))
.apply(MongoDbIO.write()...)

Regards
JB

On 10/16/2017 09:42 PM, Chaim Turkel wrote:

how do i add a ParDo on the MongoDbIO.read() if there are no records read?

On Mon, Oct 16, 2017 at 6:53 PM, Jean-Baptiste Onofré  wrote:

You can always add your own ParDo(DoFn) where you write the status.

Regards
JB


On 10/16/2017 04:24 PM, Chaim Turkel wrote:


that is the problem, i want to write a status that i tried and that
there were no records

On Mon, Oct 16, 2017 at 3:51 PM, Jean-Baptiste Onofré 
wrote:


Hi Chaim,

So, you mean you call MongoDbIO.write() with an empty PCollection (no
element in the collection) ?

The write is basically a DoFn so, it won't do anything if the PCollection
is empty.

Regards
JB


On 10/16/2017 01:59 PM, Chaim Turkel wrote:



Hi,
 In the case that there are no records to read, is there a way to get
called so that i can write the status?

chaim



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: MongoDbIO

2017-10-17 Thread Jean-Baptiste Onofré
Good point but I was thinking more about a PTransform with Create.of() if 
there's no data.


Anyway, I can add a DoFn support in the write to update a status (not sure if it 
really makes sense).
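
A minimal plain-Java sketch of the point raised in this thread. It is not Beam
code and not the MongoDbIO implementation: it only illustrates that a
per-element callback (the role a DoFn plays) is never invoked for an empty
input, whereas a global aggregate still yields exactly one value that a status
step could consume. The class and method names (EmptyInputStatus, perElement,
countGlobally, statusFor) are hypothetical. In Beam itself, a global combine
such as Count.globally() may play the aggregate role, assuming the default
global windowing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Plain-Java analogy (not Beam code) of per-element processing versus a global aggregate. */
public class EmptyInputStatus {

    /** Mimics a ParDo: the callback runs once per element, so never for an empty input. */
    public static <T> int perElement(List<T> input, Consumer<T> fn) {
        int calls = 0;
        for (T element : input) {
            fn.accept(element);
            calls++;
        }
        return calls;
    }

    /** Mimics a global count: always yields exactly one value, 0 for an empty input. */
    public static <T> long countGlobally(List<T> input) {
        return input.size();
    }

    /** Derives a status line from the single aggregate value. */
    public static String statusFor(long count) {
        return count == 0 ? "COMPLETE: no records" : "COMPLETE: " + count + " records";
    }

    public static void main(String[] args) {
        List<String> noRecords = Collections.emptyList();
        List<String> statuses = new ArrayList<>();

        // Per-element hook: never invoked here, so no status gets recorded.
        int calls = perElement(noRecords, r -> statuses.add("saw " + r));
        System.out.println("per-element calls: " + calls + ", statuses: " + statuses);

        // Aggregate hook: one value exists even for empty input, so a status can be written.
        System.out.println(statusFor(countGlobally(noRecords)));
    }
}
```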


Regards
JB

On 10/17/2017 01:40 PM, Chaim Turkel wrote:

but if there is no data then
.apply(ParDo.of(new DoFn() { // check PCollection and set the status }))

will not be called

On Tue, Oct 17, 2017 at 8:33 AM, Jean-Baptiste Onofré  wrote:

I didn't mean on the read, I meant between the read and write.

Basically, your pipeline could look like:

pipeline.apply(MongoDbIO.read()...)
 .apply(ParDo.of(new DoFn() { // check PCollection and set the status
}))
 .apply(MongoDbIO.write()...)

Regards
JB


On 10/16/2017 09:42 PM, Chaim Turkel wrote:


how do i add a ParDo on the MongoDbIO.read() if there are no records read?

On Mon, Oct 16, 2017 at 6:53 PM, Jean-Baptiste Onofré 
wrote:


You can always add your own ParDo(DoFn) where you write the status.

Regards
JB


On 10/16/2017 04:24 PM, Chaim Turkel wrote:



that is the problem, i want to write a status that i tried and that
there were no records

On Mon, Oct 16, 2017 at 3:51 PM, Jean-Baptiste Onofré 
wrote:



Hi Chaim,

So, you mean you call MongoDbIO.write() with an empty PCollection (no
element in the collection) ?

The write is basically a DoFn so, it won't do anything if the
PCollection is empty.

Regards
JB


On 10/16/2017 01:59 PM, Chaim Turkel wrote:

Hi,
In the case that there are no records to read, is there a way to get
called so that i can write the status?

chaim



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: MongoDbIO

2017-10-17 Thread Jean-Baptiste Onofré

Let me take a quick look and create a Jira if required.

Thanks for the idea !

Regards
JB

On 10/17/2017 01:54 PM, Chaim Turkel wrote:

I am fine if you can show me how to do the Create.of() on the MongoDbIO.read().
It would be nice also to have the status also on the MongoDbIO.write.

again this is the reactive streams pattern that there always is a
complete or error path

chaim

On Tue, Oct 17, 2017 at 2:51 PM, Jean-Baptiste Onofré  wrote:

Good point but I was thinking more about a PTransform with Create.of() if
there's no data.

Anyway, I can add a DoFn support in the write to update a status (not sure
if it really makes sense).

Regards
JB


On 10/17/2017 01:40 PM, Chaim Turkel wrote:


but if there is no data then
.apply(ParDo.of(new DoFn() { // check PCollection and set the status }))

will not be called

On Tue, Oct 17, 2017 at 8:33 AM, Jean-Baptiste Onofré 
wrote:


I didn't mean on the read, I meant between the read and write.

Basically, your pipeline could look like:

pipeline.apply(MongoDbIO.read()...)
  .apply(ParDo.of(new DoFn() { // check PCollection and set the
status
}))
  .apply(MongoDbIO.write()...)

Regards
JB


On 10/16/2017 09:42 PM, Chaim Turkel wrote:



how do i add a ParDo on the MongoDbIO.read() if there are no records
read?

On Mon, Oct 16, 2017 at 6:53 PM, Jean-Baptiste Onofré 
wrote:



You can always add your own ParDo(DoFn) where you write the status.

Regards
JB


On 10/16/2017 04:24 PM, Chaim Turkel wrote:




that is the problem, i want to write a status that i tried and that
there were no records

On Mon, Oct 16, 2017 at 3:51 PM, Jean-Baptiste Onofré

wrote:




Hi Chaim,

So, you mean you call MongoDbIO.write() with an empty PCollection (no
element in the collection) ?

The write is basically a DoFn so, it won't do anything if the
PCollection is empty.

Regards
JB


On 10/16/2017 01:59 PM, Chaim Turkel wrote:

Hi,
In the case that there are no records to read, is there a way to get
called so that i can write the status?

chaim



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-10-17 Thread Jean-Baptiste Onofré

Agree, I would target this for Beam 3.0.0.

Regards
JB

On 10/17/2017 06:43 PM, Reuven Lax wrote:

Should this be considered a backwards-incompatible change? If so, do we
need to wait until Beam 3.0.0 is released?

On Tue, Oct 17, 2017 at 9:11 AM, Ismaël Mejía  wrote:


I am bringing the subject to the user mailing list to get some
feedback because it makes sense anyway to discuss this there. But I
also agree with Kenneth about the fact that runner authors must weigh
in on this subject.


On Tue, Oct 17, 2017 at 5:24 PM, Kenneth Knowles 
wrote:

+1 to having runner maintainers weigh in as proxies. Added a few in case
they haven't followed this thread.

On Mon, Oct 16, 2017 at 11:38 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:


Agreed that polling Dataflow users makes sense, though I think they are
very unlikely to have concerns, because unlike Spark/Flink users they
wouldn't have a "cluster" that they need to migrate to a new JVM: they'd
only need to recompile their pipelines with JDK 8.

On Mon, Oct 16, 2017 at 11:21 PM Reuven Lax 
wrote:


I think the Flink and Spark runner maintainers can weigh in here; given
that both of those systems are moving to Java 8, I doubt they will have
concerns. Same is true for the Dataflow runner - we should probably poll
Dataflow users to make sure this is not a problem for any of them.

On Mon, Oct 16, 2017 at 11:05 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:


Reuven - do you mean e.g. a poll on the Flink mailing list asking whether
there are Flink users who use Beam with Java 7? Or just people who use
Flink with Java 7? (the latter question I'd assume was settled by the poll
about making Flink itself Java8-only?)

On Mon, Oct 16, 2017 at 10:32 PM Reuven Lax wrote:

I don't know if a vote in @user is sufficient, as it's unfortunately not
representative of all Beam users. I think the runner communities should be
polled as well (though I suspect the answer will be the same, that we can
go ahead and deprecate Java 7 support).

On Mon, Oct 16, 2017 at 4:50 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:


Yeah, a vote on user@ sounds like a good idea. Ismaël, would you be
interested in driving this process, since you're already working on Java9
support and hence you have a good understanding of what's involved in a JDK
version migration for a large project?

As due diligence, we can look at how the other data processing systems
handled dropping Java7.

Flink:
JIRA https://issues.apache.org/jira/browse/FLINK-7242
Discussion
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/POLL-Who-still-uses-Java-7-with-Flink-td12216.html
They also did a Twitter poll in addition to the mailing list poll, which
seems like a good idea.
Note that Flink had a number of strong reasons for dropping Java7 that do
not apply in Beam.

Spark:
JIRA https://issues.apache.org/jira/browse/SPARK-19493
Discussion
http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-ending-support-for-Java-7-in-Spark-2-0-td16808.html
(I couldn't find a formal poll of the user list rather than developer
list)

Hadoop:
Hadoop 3.0 is Java8-only, but I couldn't quickly find a discussion of where
that decision was made.
https://lists.apache.org/thread.html/e5c8085ada2cca47027b63f5439839731a392335770386e10895be06@1444091751@%3Cmapreduce-dev.hadoop.apache.org%3E
might be it.

Kafka is considering it:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
and quotes a number of other open-source projects that have switched
http://markmail.org/message/l7s276y3xkga2eqf

So basically these projects all did a mailing list poll, and one also did a
Twitter poll.

Beam has the advantage of being a relatively young project with perhaps a
smaller base of users entrenched in using old versions of Java; moreover,
Java version would matter only for the smaller subset of users who use Beam
Spark/Flink/Apex/.. runners (as opposed to Cloud Dataflow), which is likely
an even more "early adopter"-ish group of users, as these runners generally
receive less support.

It may be a good idea to have at least 1 release pass between announcing
the intention to drop Java7 and actually dropping it (e.g. if we decided it
now, then 2.4 would drop Java7). Also, we could start by switching tests to
compile/run with java8 (Maven allows this). This is, I think, pretty much
safe to do immediately.

On Mon, Oct 16, 2017 at 7:35 AM Ismaël Mejía wrote:

Any progress on this? What is the proposed way to validate if users
are still interested in Java 7? A vote on user or something different?

On Wed, Sep 27, 2017 at 7:59 PM, Kenneth Knowles wrote:

Agree with polling Beam users as well as each runner's community in
aggregate.

On Wed, Sep 27, 20

Re: [Proposal] Beam website navigation update

2017-10-19 Thread Jean-Baptiste Onofré

+1

It looks great.

Regards
JB

On 10/16/2017 11:23 PM, Melissa Pashniak wrote:

Hello Beam folks,

I've received some good feedback that the Beam website can be difficult to
navigate, due to such things as:

- The top pulldown menus are too long and you can't select things at the
end, and this will only get worse as we create more pages
- The programming guide as a single page is too long and difficult to
navigate
- Navigation issues on mobile devices

In order to tackle these issues, we (myself and David Perez) have put
together a staged proposal [1] for an improved navigation story. It
includes:

- A new left nav that contains the lists of items previously in the top
pulldown. It's easy to change what items are displayed, you can nest
arbitrary items, etc. The left nav is scrollable on small windows, so you
can always reach everything. Users can now jump between programming guide
sections easily.
- On the right is a list of the sections within a page, so you can quickly
jump to where you want. The list is autogenerated from the sections in the
page.
- Mobile improvements such as: you can now get to the top nav items from
the home page.

This PR [2] would just be the first step. Once merged, it opens up other
improvements:

- Break apart the major sections of the programming guide into separate
pages. This allows for the addition of more details and code samples for
each section, without making an already-too-large page even larger. With
the left nav, jumping between these pages will be easy.
- A proper overview landing page for the SDKs section, that lists the
available SDKs, any differences, etc.
- Tweaking left nav items to bubble up previously hard-to-find content (for
example, the Python type safety and Python dependencies pages)

Please check it out and see what you think -- we'd love to get your
feedback.

Thanks!

[1] http://apache-beam-website-pull-requests.storage.googleapis.com/332/index.html
[2] https://github.com/apache/beam-site/pull/332



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [Proposal] Sharing Neville's post and upcoming meetups in the Twitter handle

2017-10-20 Thread Jean-Baptiste Onofré

+1 for Neville's post.

And no problem to promote the coming meetups.

Regards
JB

On 10/20/2017 10:08 PM, Griselda Cuevas wrote:

Hi everyone - What do you think about sharing Neville's blogpost[1] about
the road to Scio on the Apache Beam Twitter account? I think it'd be good
to share some content since the last time we were active was 9/27.

Also - could you help promote some of the upcoming Meetups? I made the
following tweets:

11/1 - San Francisco Cloud Mafia
Tweet:
Come join the SF Cloud Mafia to learn about stream & batch processing with
#ApacheBeam on Nov. 1st.
https://www.meetup.com/San-Francisco-Cloud-Mafia/events/244180581/

11/22 - Stockholm Apache Beam Meetup
Tweet:
Stockholm is ready for its first #ApacheBeam meetup on Nov. 22nd. Join if
you're around! https://www.meetup.com/Apache-Beam-Stockholm/

[1] https://labs.spotify.com/2017/10/16/big-data-processing-at-spotify-the-road-to-scio-part-1/

Thanks!
G



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Migrate to gitbox

2017-10-22 Thread Jean-Baptiste Onofré

+1 (binding)

Regards
JB

On 10/10/2017 09:42 AM, Jean-Baptiste Onofré wrote:

Hi all,

following the discussion, here's the formal vote to migrate to gitbox:

[ ] +1, Approve to migrate to gitbox
[ ] -1, Do not migrate (please provide specific comments)

The vote will be open for at least 36 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


[RESULT][VOTE] Migrate to gitbox

2017-10-22 Thread Jean-Baptiste Onofré

Hi all,

this vote passed with only +1 votes.

I will request INFRA to move the repositories to gitbox.

Thanks all for your vote !

Regards
JB

On 10/10/2017 09:42 AM, Jean-Baptiste Onofré wrote:

Hi all,

following the discussion, here's the formal vote to migrate to gitbox:

[ ] +1, Approve to migrate to gitbox
[ ] -1, Do not migrate (please provide specific comments)

The vote will be open for at least 36 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Update on 2.2.0 release

2017-10-23 Thread Jean-Baptiste Onofré

Hi guys,

can we move forward on 2.2.0 release ?

Only two Jira are still open for 2.2.0:

BEAM-3011 - Pin Runner harness container image in Python SDK
BEAM-2271 - Release guide or pom.xml needs update to avoid releasing Python binary artifacts


Both are related to Python SDK.
Can we have an update about this ?
Are they blockers for the 2.2.0 release ?

@Reuven: I think we are good to do a 2.2.0.RC1. Thoughts ?

Thanks !
Regards
JB

On 10/10/2017 12:53 AM, Reuven Lax wrote:

I've now cut a release branch for 2.2.0.

On Mon, Oct 9, 2017 at 2:01 AM, Jean-Baptiste Onofré wrote:


It sounds good to me.

Regards
JB


On 10/09/2017 12:38 AM, Reuven Lax wrote:


At this point it's taken over three weeks to resolve release-blocking
issues, and there are still some remaining. Monday morning I will go ahead
and cut an initial release candidate, so we get a snapshot to base on. The
remaining blocking issues will either be cherry picked into an additional
release candidate or punted to 2.3.0, depending on severity.

Reuven

On Wed, Sep 20, 2017 at 5:34 AM, Jean-Baptiste Onofré wrote:

Hi Reuven,


thanks for the update. I will pick up some Jira in my bucket (to
verify/double check or implement).

Regards
JB


On 09/20/2017 04:23 AM, Reuven Lax wrote:

There are 7 issues left on
https://issues.apache.org/jira/projects/BEAM/versions/12341044 after I
moved some items to 2.3.0 if they appeared to not need to be in 2.2.0 (some
were originally slated for 2.1.0, and there was no apparent progress on
them). Please object if you feel that it's important that any of these be
in 2.2.0.

Of the remaining issues, three (BEAM-2956, BEAM-2298, BEAM-2271) appear to
already be fixed; please verify this and close the issue if you are the
reporter. The remaining issues (BEAM-2834, BEAM-2858, BEAM-2870,
BEAM-29540) all have outstanding pull requests. As soon as all of these
pull requests are merged, I will start the release cut.

Reuven


--

Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com







Re: Is mergebot down?

2017-10-25 Thread Jean-Baptiste Onofré

Hi,

I have some issues with mergebot as well.

By the way, I switched Apache Karaf to gitbox and we have a "merge pull
request" button. Pretty convenient ;)


Regards
JB

On 10/26/2017 12:14 AM, Eugene Kirpichov wrote:

Hello,

On this PR
https://github.com/apache/beam-site/pull/334#issuecomment-338551253 the
mergebot seems to not activate by "@asfgit merge". Jason, could you take a
look?



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


JB offline for 5 days

2017-10-25 Thread Jean-Baptiste Onofré

Hi guys,

Just to let you know that I will be offline for the next 5 days (vacation).

See you end of next week !

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

