Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Jean-Baptiste Onofré

By the way, this step is in the "Release Guide".

But you are right, it means the release manager needs "permission" on the Jira
or has to ask someone to change the version state.


Regards
JB

On 03/16/2017 02:42 AM, Ahmet Altay wrote:

JB,

0.6.0 is flagged as released now, thank you for catching this. As a side
note, I did not have enough permissions to do this and asked Davor to do it.
I will add this to the release notes.

Ahmet

On Wed, Mar 15, 2017 at 7:16 AM, Jesse Anderson 
wrote:


Excellent!

On Wed, Mar 15, 2017, 6:13 AM Jean-Baptiste Onofré 
wrote:


Hi Ahmet,

it seems Jira is not up to date: 0.6.0 version is not flagged as
"Released".

Can you fix that please ?

Thanks !
Regards
JB

On 03/15/2017 05:22 AM, Ahmet Altay wrote:

I'm happy to announce that we have unanimously approved this release.

There are 7 approving votes, 4 of which are binding:
* Aljoscha Krettek
* Davor Bonaci
* Ismaël Mejía
* Jean-Baptiste Onofré
* Robert Bradshaw
* Ted Yu
* Tibor Kiss

There are no disapproving votes.

Thanks everyone!

Ahmet



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


--
Thanks,

Jesse





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam spark 2.x runner status

2017-03-15 Thread Jean-Baptiste Onofré

Hi guys,

sorry, due to the time zone shift, I'm answering a bit late ;)

I think we can have the same runner dealing with the two major Spark versions by
introducing some adapters. For instance, in CarbonData, we created some adapters
to work with Spark 1.5, Spark 1.6 and Spark 2.1. The dependencies come from
Maven profiles. Of course, it's easier there as it's more "user" code.
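
To make the adapter idea concrete, here is a minimal sketch of the pattern, assuming a Maven profile swaps the Spark dependency on the classpath; all names (SparkCompat, Spark2Compat, CounterHandle) are hypothetical, not taken from CarbonData or Beam:

    // Hypothetical version-neutral facade the runner would code against.
    public interface SparkCompat {
      // Register a named long counter and return a handle the translator can use.
      CounterHandle longCounter(org.apache.spark.api.java.JavaSparkContext jsc, String name);

      interface CounterHandle extends java.io.Serializable {
        void add(long value);
      }
    }

    // Spark 2 flavor, living in a module activated by e.g. a -Pspark2 profile.
    public class Spark2Compat implements SparkCompat {
      public CounterHandle longCounter(org.apache.spark.api.java.JavaSparkContext jsc, String name) {
        // Spark 2 registers AccumulatorV2-based accumulators on the SparkContext.
        final org.apache.spark.util.LongAccumulator acc = jsc.sc().longAccumulator(name);
        return new CounterHandle() {
          public void add(long value) { acc.add(value); }
        };
      }
    }

A Spark 1 implementation of the same interface would wrap the old Accumulator API, so the translation code itself never touches a version-specific class.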


My proposal is just that it's worth a try ;)

I just created a branch to experiment a bit and have more details.

Regards
JB

On 03/16/2017 02:31 AM, Amit Sela wrote:

I answered inline to Abbass' comment, but I think he hit something - how
about we have a branch with those adaptations ? same RDD implementation,
but depending on the latest 2.x version with the minimal changes required.
I'd be happy to do that, or guide anyone who wants to (I did most of it on
my branch for Spark 2 anyway) but since it's a branch and not on master (I
don't believe it "deserves" a place on master), it would always be a bit
behind since we would have to rebase and merge once in a while.

How does that sound ?

On Wed, Mar 15, 2017 at 7:49 PM amarouni  wrote:


+1 for Spark runners based on different APIs RDD/Dataset and keeping the
Spark versions as a deployment dependency.

The RDD API is stable & mature enough so it makes sense to have it on
master; the Dataset API still has some work to do and, from our own
experience, it has only just reached performance comparable to the RDD
API's. The community is clearly heading in the Dataset API direction but
the RDD API is still a viable option for most use cases.

Just one quick question: today on master, can we swap Spark 1.x for Spark
2.x and compile and use the Spark Runner ?


Good question!
I think this is the root cause of this problem - Spark 2 not only
introduced a new API, but also broke a few existing ones: context is now
session, Accumulators are AccumulatorV2 (that's what I recall right now).
I don't think it's too hard to adapt those, and anyone who wants to can
see how I did it on my branch:
https://github.com/amitsela/beam/commit/8a1cf889d14d2b47e9e35bae742d78a290cbbdc9
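
For context on what that breakage looks like in practice, below is a rough, illustrative sketch (not code from the branch above) of the accumulator change: Spark 1's Accumulator/AccumulatorParam pair becomes a subclass of AccumulatorV2 in Spark 2.

    import org.apache.spark.util.AccumulatorV2;

    // A minimal Spark 2 accumulator, equivalent in spirit to a Spark 1
    // Accumulator<Long>. Illustrative only.
    public class LongSum extends AccumulatorV2<Long, Long> {
      private long sum = 0L;

      @Override public boolean isZero() { return sum == 0L; }
      @Override public AccumulatorV2<Long, Long> copy() {
        LongSum copy = new LongSum();
        copy.sum = this.sum;
        return copy;
      }
      @Override public void reset() { sum = 0L; }
      @Override public void add(Long v) { sum += v; }
      @Override public void merge(AccumulatorV2<Long, Long> other) { sum += other.value(); }
      @Override public Long value() { return sum; }
    }

Registration changes too: where Spark 1 used jsc.accumulator(...), Spark 2 registers the instance with jsc.sc().register(new LongSum(), "my-counter").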




Thanks,

Abbass,


On 15/03/2017 17:57, Amit Sela wrote:

So you're suggesting we copy-paste the current runner and adapt whatever is
necessary so it runs with Spark 2 ?
This also means any bug-fix / improvement would have to be maintained in
two runners, and I wouldn't wanna do that.

I don't like to think in terms of Spark1/2 but in terms of RDD/Dataset API.

Since the RDD API is mature, it should be the runner in master (not
preventing another runner once Dataset API is mature enough) and the
version (1.6.3 or 2.x) should be determined by the common installation.

That's why I believe we still need to leave things as they are, but start
working on the Dataset API runner.
Otherwise, we'll have the current runner, another RDD API runner with Spark
2, and a third one for the Dataset API. I don't want to maintain all of
them. It's a mess.

On Wed, Mar 15, 2017 at 6:39 PM Ismaël Mejía  wrote:


However, I do feel that we should use the Dataset API, starting with batch
support first. WDYT ?

Well, this is the exact current status quo, and it will take us some
time to have something as complete as what we have with the spark 1
runner for the spark 2.

The other proposal has two advantages:

One is that we can leverage on the existing implementation (with the
needed adjustments) to run Beam pipelines on Spark 2, in the end final
users don’t care so much if pipelines are translated via RDD/DStream
or Dataset, they just want to know that with Beam they can run their
code in their favorite data processing framework.

The other advantage is that we can base the work on the latest spark
version and advance simultaneously in translators for both APIs, and
once we consider that the DataSet is mature enough we can stop
maintaining the RDD one and make it the official one.

The only missing piece is backporting new developments on the RDD
based translator from the spark 2 version into the spark 1, but maybe
this won’t be so hard if we consider what you said, that at this point
we are getting closer to having streaming right (of course you are the
most appropriate person to decide if we are in sufficiently good shape
to make this call, so backporting things won’t be so hard).

Finally I agree with you, I would prefer a nice and full featured
translator based on the Structured Streaming API but the question is
how much time this will take to be in shape and the impact on final
users who are already requesting this. This is the reason why I think
the more conservative approach (keeping around the RDD translator) and
moving incrementally makes sense.

On Wed, Mar 15, 2017 at 4:52 PM, Amit Sela  wrote:

I feel that as we're getting closer to supporting streaming with Spark 1
runner, and having Structured Streaming advance in Spark 2, we could start
work on Spark 2 runner in a separate branch.


Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Jean-Baptiste Onofré

Thanks !

Regards
JB

On 03/16/2017 02:42 AM, Ahmet Altay wrote:

JB,

0.6.0 is flagged as released now, thank you for catching this. As a side
note, I did not have enough permissions to do this and asked Davor to do it.
I will add this to the release notes.

Ahmet

On Wed, Mar 15, 2017 at 7:16 AM, Jesse Anderson 
wrote:


Excellent!

On Wed, Mar 15, 2017, 6:13 AM Jean-Baptiste Onofré 
wrote:


Hi Ahmet,

it seems Jira is not up to date: 0.6.0 version is not flagged as
"Released".

Can you fix that please ?

Thanks !
Regards
JB

On 03/15/2017 05:22 AM, Ahmet Altay wrote:

I'm happy to announce that we have unanimously approved this release.

There are 7 approving votes, 4 of which are binding:
* Aljoscha Krettek
* Davor Bonaci
* Ismaël Mejía
* Jean-Baptiste Onofré
* Robert Bradshaw
* Ted Yu
* Tibor Kiss

There are no disapproving votes.

Thanks everyone!

Ahmet



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


--
Thanks,

Jesse





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Docker image dependencies

2017-03-15 Thread Stephen Sisk
Thanks for the discussion! In general, I agree with the sentiments
expressed here. I updated
https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit#heading=h.hlirex1vus1a
to reflect this discussion. (The plan is still that I will put that on the
website.)

Apache Docker Repository - are you talking about
https://hub.docker.com/u/apache/ ? If not, can you point me at more info? I
can't find any info about this on the publicly visible apache-infra
mailing lists, and the apache infra website doesn't seem
to mention a docker repository.



> However the current Beam Elasticsearch IO does not support Elasticsearch
> 5, and elastic does not have an image for version 2, so in this particular
> case following the priority order we should use the official docker image
> (2) for the tests (assuming that both require the same version). Do you
> agree with this ?

Yup, that makes sense to me.



> How do we deal with IOs that require more than one base image? This is a
> common scenario for projects that depend on Zookeeper.

Is there a reason not to just run a kubernetes ReplicationController+Service
for these cases? k8 can easily provide a hostname that pods can rely on to
reach the zookeeper instance. It also uses text config - see
https://github.com/apache/beam/tree/master/sdks/java/io/jdbc/src/test/resources/kubernetes,
and it sets up the connections/name service between the hosts - if other tests
wanted to rely on postgres, they could just connect to host "postgres" and
postgres would be there.
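
As a hedged illustration of that point (the hostname "postgres" comes from the Kubernetes Service, while the database name and credentials below are made up), a test would only need plain JDBC, assuming the postgres driver is on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class PostgresSmokeTest {
      public static void main(String[] args) throws Exception {
        // "postgres" resolves through the Kubernetes Service's DNS entry.
        try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://postgres:5432/testdb", "user", "password")) {
          System.out.println("connected: " + !conn.isClosed());
        }
      }
    }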

Basically - I'm trying to keep the number of tools to a minimum while still
providing good support for the functionality we need. Does docker-compose
provide something beyond the functionality that k8 does? I'm not familiar
with docker-compose, but looking at
https://docs.docker.com/compose/overview/#compose-documentation it doesn't
seem to provide anything that k8 doesn't already.


S

On Wed, Mar 15, 2017 at 7:10 AM Ismaël Mejía  wrote:

Hi, Thanks for bringing this subject to the mailing list.

+1
We definitely need a consensus on this, and I agree with your proposal and
JB’s comments modulo certain clarifications:

I think we shall go in this priority order if the version of the image we
want is available:

1. Image provided by the creator of the data source/sink (if they
officially maintain it; this is the case of Elasticsearch for example) or
by the Apache projects (if they provide one), as JB mentions.
2. Official docker images (because they have security fixes and
guaranteed maintenance).
3. Non-official docker images or images from other providers that have good
maintainers, e.g. quay.io.

It makes sense to use the same image for all the tests, and to use the
fixed versions supported by the respective IO to avoid possible issues
during testing between different versions/naming of env variables, etc.

The Elasticsearch case is a 'good' example because it shows all the current
issues:

We should not use one elasticsearch image (elk) for some tests and a
different one for the others (the quay one), and if we resolve by priority
we would take the image provided by the creator (1) for both cases.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
However the current Beam Elasticsearch IO does not support Elasticsearch 5,
and elastic does not have an image for version 2, so in this particular
case following the priority order we should use the official docker image
(2) for the tests (assuming that both require the same version). Do you
agree with this ?


Thinking about the ELK image I came up with a new question: how do we deal
with IOs that require more than one base image? This is a common scenario
for projects that depend on Zookeeper, e.g. Kafka/Solr. Usually people
coordinate those with a docker-compose file that creates an artificial
network to connect the Zookeeper image and the Kafka/Solr one, just by
executing the 'docker-compose up' command. Will we adopt this for such
cases ?

I know that Kubernetes does this too, but the docker-compose format is
quite easy and textual, and it is usually ready with the docker
installation; additionally, the docker-compose files can easily be
translated with kompose into Kubernetes resources.

Ismaël

On Wed, Mar 15, 2017 at 3:17 AM, Jean-Baptiste Onofré 
wrote:

> Hi Stephen,
>
> 1. About the docker repositories, we now have official Docker repo at
> Apache. So, for the Apache projects, I would recommend the Apache official
> repo. Anyway, generally speaking, I would recommend the official repo
(from
> the projects).
>
> 2. To avoid "unpredictable" breaking changes, I would pin to particular
> versions, and explicitly update if needed.
>
> 3. It's better that docker images are under a unique responsibility scope,
> as different IOs can use the same resources, so they should use the same
> provided docker image.
>
> By the way, I also have a docker coming for RedisIO ;)
>
> Regards
> 

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Ahmet Altay
JB,

0.6.0 is flagged as released now, thank you for catching this. As a side
note, I did not have enough permissions to do this and asked Davor to do it.
I will add this to the release notes.

Ahmet

On Wed, Mar 15, 2017 at 7:16 AM, Jesse Anderson 
wrote:

> Excellent!
>
> On Wed, Mar 15, 2017, 6:13 AM Jean-Baptiste Onofré 
> wrote:
>
> > Hi Ahmet,
> >
> > it seems Jira is not up to date: 0.6.0 version is not flagged as
> > "Released".
> >
> > Can you fix that please ?
> >
> > Thanks !
> > Regards
> > JB
> >
> > On 03/15/2017 05:22 AM, Ahmet Altay wrote:
> > > I'm happy to announce that we have unanimously approved this release.
> > >
> > > There are 7 approving votes, 4 of which are binding:
> > > * Aljoscha Krettek
> > > * Davor Bonaci
> > > * Ismaël Mejía
> > > * Jean-Baptiste Onofré
> > > * Robert Bradshaw
> > > * Ted Yu
> > > * Tibor Kiss
> > >
> > > There are no disapproving votes.
> > >
> > > Thanks everyone!
> > >
> > > Ahmet
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
> --
> Thanks,
>
> Jesse
>


Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
I answered inline to Abbass' comment, but I think he hit something - how
about we have a branch with those adaptations ? same RDD implementation,
but depending on the latest 2.x version with the minimal changes required.
I'd be happy to do that, or guide anyone who wants to (I did most of it on
my branch for Spark 2 anyway) but since it's a branch and not on master (I
don't believe it "deserves" a place on master), it would always be a bit
behind since we would have to rebase and merge once in a while.

How does that sound ?

On Wed, Mar 15, 2017 at 7:49 PM amarouni  wrote:

> +1 for Spark runners based on different APIs RDD/Dataset and keeping the
> Spark versions as a deployment dependency.
>
> The RDD API is stable & mature enough so it makes sense to have it on
> master; the Dataset API still has some work to do and, from our own
> experience, it has only just reached performance comparable to the RDD
> API's. The community is clearly heading in the Dataset API direction but
> the RDD API is still a viable option for most use cases.
>
> Just one quick question: today on master, can we swap Spark 1.x for Spark
> 2.x and compile and use the Spark Runner ?
>
Good question!
I think this is the root cause of this problem - Spark 2 not only
introduced a new API, but also broke a few existing ones: context is now
session, Accumulators are AccumulatorV2 (that's what I recall right now).
I don't think it's too hard to adapt those, and anyone who wants to can
see how I did it on my branch:
https://github.com/amitsela/beam/commit/8a1cf889d14d2b47e9e35bae742d78a290cbbdc9


>
> Thanks,
>
> Abbass,
>
>
> On 15/03/2017 17:57, Amit Sela wrote:
> > So you're suggesting we copy-paste the current runner and adapt whatever
> is
> > necessary so it runs with Spark 2 ?
> > This also means any bug-fix / improvement would have to be maintained in
> > two runners, and I wouldn't wanna do that.
> >
> > I don't like to think in terms of Spark1/2 but in terms of RDD/Dataset
> API.
> > Since the RDD API is mature, it should be the runner in master (not
> > preventing another runner once Dataset API is mature enough) and the
> > version (1.6.3 or 2.x) should be determined by the common installation.
> >
> > That's why I believe we still need to leave things as they are, but start
> > working on the Dataset API runner.
> > Otherwise, we'll have the current runner, another RDD API runner with
> Spark
> > 2, and a third one for the Dataset API. I don't want to maintain all of
> > them. It's a mess.
> >
> > On Wed, Mar 15, 2017 at 6:39 PM Ismaël Mejía  wrote:
> >
> >>> However, I do feel that we should use the Dataset API, starting with
> >> batch
> >>> support first. WDYT ?
> >> Well, this is the exact current status quo, and it will take us some
> >> time to have something as complete as what we have with the spark 1
> >> runner for the spark 2.
> >>
> >> The other proposal has two advantages:
> >>
> >> One is that we can leverage on the existing implementation (with the
> >> needed adjustments) to run Beam pipelines on Spark 2, in the end final
> >> users don’t care so much if pipelines are translated via RDD/DStream
> >> or Dataset, they just want to know that with Beam they can run their
> >> code in their favorite data processing framework.
> >>
> >> The other advantage is that we can base the work on the latest spark
> >> version and advance simultaneously in translators for both APIs, and
> >> once we consider that the DataSet is mature enough we can stop
> >> maintaining the RDD one and make it the official one.
> >>
> >> The only missing piece is backporting new developments on the RDD
> >> based translator from the spark 2 version into the spark 1, but maybe
> >> this won’t be so hard if we consider what you said, that at this point
> >> we are getting closer to having streaming right (of course you are the
> >> most appropriate person to decide if we are in sufficiently good shape
> >> to make this call, so backporting things won’t be so hard).
> >>
> >> Finally I agree with you, I would prefer a nice and full featured
> >> translator based on the Structured Streaming API but the question is
> >> how much time this will take to be in shape and the impact on final
> >> users who are already requesting this. This is the reason why I think
> >> the more conservative approach (keeping around the RDD translator) and
> >> moving incrementally makes sense.
> >>
> >> On Wed, Mar 15, 2017 at 4:52 PM, Amit Sela 
> wrote:
> >>> I feel that as we're getting closer to supporting streaming with Spark
> 1
> >>> runner, and having Structured Streaming advance in Spark 2, we could
> >> start
> >>> work on Spark 2 runner in a separate branch.
> >>>
> >>> However, I do feel that we should use the Dataset API, starting with
> >> batch
> >>> support first. WDYT ?
> >>>
> >>> On Wed, Mar 15, 2017 at 5:47 PM Ismaël Mejía 
> wrote:
> >>>
> > So you propose to have the Spark 2 branch a 

Re: Beam spark 2.x runner status

2017-03-15 Thread amarouni
+1 for Spark runners based on different APIs RDD/Dataset and keeping the
Spark versions as a deployment dependency.

The RDD API is stable & mature enough so it makes sense to have it on
master; the Dataset API still has some work to do and, from our own
experience, it has only just reached performance comparable to the RDD
API's. The community is clearly heading in the Dataset API direction but
the RDD API is still a viable option for most use cases.

Just one quick question: today on master, can we swap Spark 1.x for Spark
2.x and compile and use the Spark Runner ?

Thanks,

Abbass,


On 15/03/2017 17:57, Amit Sela wrote:
> So you're suggesting we copy-paste the current runner and adapt whatever is
> necessary so it runs with Spark 2 ?
> This also means any bug-fix / improvement would have to be maintained in
> two runners, and I wouldn't wanna do that.
>
> I don't like to think in terms of Spark1/2 but in terms of RDD/Dataset API.
> Since the RDD API is mature, it should be the runner in master (not
> preventing another runner once Dataset API is mature enough) and the
> version (1.6.3 or 2.x) should be determined by the common installation.
>
> That's why I believe we still need to leave things as they are, but start
> working on the Dataset API runner.
> Otherwise, we'll have the current runner, another RDD API runner with Spark
> 2, and a third one for the Dataset API. I don't want to maintain all of
> them. It's a mess.
>
> On Wed, Mar 15, 2017 at 6:39 PM Ismaël Mejía  wrote:
>
>>> However, I do feel that we should use the Dataset API, starting with
>> batch
>>> support first. WDYT ?
>> Well, this is the exact current status quo, and it will take us some
>> time to have something as complete as what we have with the spark 1
>> runner for the spark 2.
>>
>> The other proposal has two advantages:
>>
>> One is that we can leverage on the existing implementation (with the
>> needed adjustments) to run Beam pipelines on Spark 2, in the end final
>> users don’t care so much if pipelines are translated via RDD/DStream
>> or Dataset, they just want to know that with Beam they can run their
>> code in their favorite data processing framework.
>>
>> The other advantage is that we can base the work on the latest spark
>> version and advance simultaneously in translators for both APIs, and
>> once we consider that the DataSet is mature enough we can stop
>> maintaining the RDD one and make it the official one.
>>
>> The only missing piece is backporting new developments on the RDD
>> based translator from the spark 2 version into the spark 1, but maybe
>> this won’t be so hard if we consider what you said, that at this point
>> we are getting closer to having streaming right (of course you are the
>> most appropriate person to decide if we are in sufficiently good shape
>> to make this call, so backporting things won’t be so hard).
>>
>> Finally I agree with you, I would prefer a nice and full featured
>> translator based on the Structured Streaming API but the question is
>> how much time this will take to be in shape and the impact on final
>> users who are already requesting this. This is the reason why I think
>> the more conservative approach (keeping around the RDD translator) and
>> moving incrementally makes sense.
>>
>> On Wed, Mar 15, 2017 at 4:52 PM, Amit Sela  wrote:
>>> I feel that as we're getting closer to supporting streaming with Spark 1
>>> runner, and having Structured Streaming advance in Spark 2, we could
>> start
>>> work on Spark 2 runner in a separate branch.
>>>
>>> However, I do feel that we should use the Dataset API, starting with
>> batch
>>> support first. WDYT ?
>>>
>>> On Wed, Mar 15, 2017 at 5:47 PM Ismaël Mejía  wrote:
>>>
> So you propose to have the Spark 2 branch a clone of the current one
>> with
> adaptations around Context->Session, Accumulator->AccumulatorV2 etc.
 while
> still using the RDD API ?
 Yes this is exactly what I have in mind.

> I think that having another Spark runner is great if it has value,
> otherwise, let's just bump the version.
 There is value because most people are already starting to move to
 spark 2 and all Big Data distribution providers support it now, as
 well as the Cloud-based distributions (Dataproc and EMR), unlike the
 last time we had this discussion.

> We could think of starting to migrate the Spark 1 runner to Spark 2 and
> follow with Dataset API support feature-by-feature as it advances, but I
> think most Spark installations today still run 1.X, or am I wrong ?
 No, you are right, that’s why I didn’t even mention removing the
 spark 1 runner, I know that having to support things for both versions
 can add additional work for us, but maybe the best approach would be
 to continue the work only in the spark 2 runner (both refining the RDD
 based translator and starting to create the Dataset one there that

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-15 Thread Amit Sela
Great! so we'll use the hangout you added here, see you then.

On Wed, Mar 15, 2017 at 7:22 PM Eugene Kirpichov
 wrote:

> Amit - 8am is fine with me, let's do that.
>
> On Wed, Mar 15, 2017 at 6:00 AM Jean-Baptiste Onofré 
> wrote:
>
> > Hi,
> >
> > Anyway, I hope it will result in some notes on the mailing list, as it
> > could be helpful.
> >
> > I'm not against a video call to move forward, but, from my community
> > perspective, we should always provide minute notes on the mailing list.
> >
> > Unfortunately, next Friday, I will still be in China, so it won't be
> > possible to join (even if I would have liked to participate :().
> >
> > Regards
> > JB
> >
> > On 03/15/2017 07:45 PM, Amit Sela wrote:
> > > I have dinner at 9am... which doesn't sound like a real thing if you
> > > forget about timezones :)
> > > How about 8am ? or something later like 12pm mid-day ?
> > > Apex can take the 9am time slot ;-)
> > >
> > > On Wed, Mar 15, 2017 at 4:28 AM Eugene Kirpichov
> > >  wrote:
> > >
> > >> Hi! Please feel free to join this call, but I think we'd be mostly
> > >> discussing how to do it in the Spark runner in particular; so we'll
> > >> probably need another call for Apex anyway.
> > >>
> > >> On Tue, Mar 14, 2017 at 6:54 PM Thomas Weise  wrote:
> > >>
> > >>> Hi Eugene,
> > >>>
> > >>> This would work for me also. Please let me know if you want to keep
> the
> > >>> Apex related discussion separate or want me to join this call.
> > >>>
> > >>> Thanks,
> > >>> Thomas
> > >>>
> > >>>
> > >>> On Tue, Mar 14, 2017 at 1:56 PM, Eugene Kirpichov <
> > >>> kirpic...@google.com.invalid> wrote:
> > >>>
> >  Sure, Friday morning sounds good. How about 9am Friday PST, at
> > >> videocall
> > >>> by
> >  link
> https://hangouts.google.com/hangouts/_/google.com/splittabledofn
> > >> ?
> > 
> >  On Mon, Mar 13, 2017 at 10:30 PM Amit Sela 
> > >> wrote:
> > 
> > > PST mornings are better, because they are evening/nights for me.
> > >> Friday
> > > would work-out best for me.
> > >
> > > On Mon, Mar 13, 2017 at 11:46 PM Eugene Kirpichov
> > >  wrote:
> > >
> > >> Awesome!!!
> > >>
> > >> Amit - remind me your time zone? JB, do you want to join?
> > >> I'm free this week all afternoons (say after 2pm) in Pacific Time,
> > >>> and
> > >> mornings of Wed & Fri. We'll probably need half an hour to an
> hour.
> > >>
> > >> On Mon, Mar 13, 2017 at 1:29 PM Aljoscha Krettek <
> > >>> aljos...@apache.org>
> > >> wrote:
> > >>
> > >>> I whipped up a quick version for Flink that seems to work:
> > >>> https://github.com/apache/beam/pull/2235
> > >>>
> > >>> There are still two failing tests, as described in the PR.
> > >>>
> > >>> On Mon, Mar 13, 2017, at 20:10, Amit Sela wrote:
> >  +1 for a video call. I think it should be pretty straight
> > >> forward
> >  for
> > >> the
> >  Spark runner after the work on read from UnboundedSource and
> > >>> after
> >  GroupAlsoByWindow, but from my experience such a call could
> > >> move
> > >>> us
> >  forward
> >  fast enough.
> > 
> >  On Mon, Mar 13, 2017, 20:37 Eugene Kirpichov <
> > >>> kirpic...@google.com
> > >
> >  wrote:
> > 
> > > Hi all,
> > >
> > > Let us continue working on this. I am back from various
> > >> travels
> >  and
> > >> am
> > > eager to help.
> > >
> > > Amit, JB - would you like to perhaps have a videocall to hash
> >  this
> > >> out
> > >>> for
> > > the Spark runner?
> > >
> > > Aljoscha - are the necessary Flink changes done / or is the
> > >>> need
> > > for
> > >>> them
> > > obviated by using the (existing) runner-facing state/timer
> > >>> APIs?
> > >>> Should we
> > > have a videocall too?
> > >
> > > Thomas - what do you think about getting this into Apex
> > >> runner?
> > >
> > > (I think videocalls will allow to make rapid progress, but
> > >> it's
> > >>> probably a
> > > better idea to keep them separate since they'll involve a lot
> > >>> of
> > > runner-specific details)
> > >
> > > PS - The completion of this in Dataflow streaming runner is
> > > currently
> > > waiting only on having a small service-side change
> > >> implemented
> >  and
> > >>> rolled
> > > out for termination of streaming jobs.
> > >
> > > On Wed, Feb 8, 2017 at 10:55 AM Kenneth Knowles <
> > >>> k...@google.com>
> > >>> wrote:
> > >
> > > I recommend proceeding with the runner-facing state & timer
> > >>> APIs;
> > >> they
> > >>> are
> > > lower-level and more 

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-15 Thread Eugene Kirpichov
Amit - 8am is fine with me, let's do that.

On Wed, Mar 15, 2017 at 6:00 AM Jean-Baptiste Onofré 
wrote:

> Hi,
>
> Anyway, I hope it will result in some notes on the mailing list, as it
> could be helpful.
>
> I'm not against a video call to move forward, but, from my community
> perspective, we should always provide minute notes on the mailing list.
>
> Unfortunately, next Friday, I will still be in China, so it won't be
> possible to join (even if I would have liked to participate :().
>
> Regards
> JB
>
> On 03/15/2017 07:45 PM, Amit Sela wrote:
> > I have dinner at 9am... which doesn't sound like a real thing if you
> > forget about timezones :)
> > How about 8am ? or something later like 12pm mid-day ?
> > Apex can take the 9am time slot ;-)
> >
> > On Wed, Mar 15, 2017 at 4:28 AM Eugene Kirpichov
> >  wrote:
> >
> >> Hi! Please feel free to join this call, but I think we'd be mostly
> >> discussing how to do it in the Spark runner in particular; so we'll
> >> probably need another call for Apex anyway.
> >>
> >> On Tue, Mar 14, 2017 at 6:54 PM Thomas Weise  wrote:
> >>
> >>> Hi Eugene,
> >>>
> >>> This would work for me also. Please let me know if you want to keep the
> >>> Apex related discussion separate or want me to join this call.
> >>>
> >>> Thanks,
> >>> Thomas
> >>>
> >>>
> >>> On Tue, Mar 14, 2017 at 1:56 PM, Eugene Kirpichov <
> >>> kirpic...@google.com.invalid> wrote:
> >>>
>  Sure, Friday morning sounds good. How about 9am Friday PST, at
> >> videocall
> >>> by
>  link https://hangouts.google.com/hangouts/_/google.com/splittabledofn
> >> ?
> 
>  On Mon, Mar 13, 2017 at 10:30 PM Amit Sela 
> >> wrote:
> 
> > PST mornings are better, because they are evening/nights for me.
> >> Friday
> > would work-out best for me.
> >
> > On Mon, Mar 13, 2017 at 11:46 PM Eugene Kirpichov
> >  wrote:
> >
> >> Awesome!!!
> >>
> >> Amit - remind me your time zone? JB, do you want to join?
> >> I'm free this week all afternoons (say after 2pm) in Pacific Time,
> >>> and
> >> mornings of Wed & Fri. We'll probably need half an hour to an hour.
> >>
> >> On Mon, Mar 13, 2017 at 1:29 PM Aljoscha Krettek <
> >>> aljos...@apache.org>
> >> wrote:
> >>
> >>> I whipped up a quick version for Flink that seems to work:
> >>> https://github.com/apache/beam/pull/2235
> >>>
> >>> There are still two failing tests, as described in the PR.
> >>>
> >>> On Mon, Mar 13, 2017, at 20:10, Amit Sela wrote:
>  +1 for a video call. I think it should be pretty straight
> >> forward
>  for
> >> the
>  Spark runner after the work on read from UnboundedSource and
> >>> after
>  GroupAlsoByWindow, but from my experience such a call could
> >> move
> >>> us
>  forward
>  fast enough.
> 
>  On Mon, Mar 13, 2017, 20:37 Eugene Kirpichov <
> >>> kirpic...@google.com
> >
>  wrote:
> 
> > Hi all,
> >
> > Let us continue working on this. I am back from various
> >> travels
>  and
> >> am
> > eager to help.
> >
> > Amit, JB - would you like to perhaps have a videocall to hash
>  this
> >> out
> >>> for
> > the Spark runner?
> >
> > Aljoscha - are the necessary Flink changes done / or is the
> >>> need
> > for
> >>> them
> > obviated by using the (existing) runner-facing state/timer
> >>> APIs?
> >>> Should we
> > have a videocall too?
> >
> > Thomas - what do you think about getting this into Apex
> >> runner?
> >
> > (I think videocalls will allow to make rapid progress, but
> >> it's
> >>> probably a
> > better idea to keep them separate since they'll involve a lot
> >>> of
> > runner-specific details)
> >
> > PS - The completion of this in Dataflow streaming runner is
> > currently
> > waiting only on having a small service-side change
> >> implemented
>  and
> >>> rolled
> > out for termination of streaming jobs.
> >
> > On Wed, Feb 8, 2017 at 10:55 AM Kenneth Knowles <
> >>> k...@google.com>
> >>> wrote:
> >
> > I recommend proceeding with the runner-facing state & timer
> >>> APIs;
> >> they
> >>> are
> > lower-level and more appropriate for this. All runners
> >> provide
>  them
> >> or
> >>> use
> > runners/core implementations, as they are needed for
> >>> triggering.
> >
> > On Wed, Feb 8, 2017 at 10:34 AM, Eugene Kirpichov <
> >>> kirpic...@google.com>
> > wrote:
> >
> > Thanks Aljoscha!
> >
> > Minor note: I'm not familiar with what level of support for
>  timers
> >>> Flink
> > 

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
So you're suggesting we copy-paste the current runner and adapt whatever is
necessary so it runs with Spark 2 ?
This also means any bug-fix / improvement would have to be maintained in
two runners, and I wouldn't wanna do that.

I don't like to think in terms of Spark1/2 but in terms of RDD/Dataset API.
Since the RDD API is mature, it should be the runner in master (not
preventing another runner once Dataset API is mature enough) and the
version (1.6.3 or 2.x) should be determined by the common installation.

That's why I believe we still need to leave things as they are, but start
working on the Dataset API runner.
Otherwise, we'll have the current runner, another RDD API runner with Spark
2, and a third one for the Dataset API. I don't want to maintain all of
them. It's a mess.

On Wed, Mar 15, 2017 at 6:39 PM Ismaël Mejía  wrote:

> > However, I do feel that we should use the Dataset API, starting with
> batch
> > support first. WDYT ?
>
> Well, this is the exact current status quo, and it will take us some
> time to have something as complete as what we have with the spark 1
> runner for the spark 2.
>
> The other proposal has two advantages:
>
> One is that we can leverage on the existing implementation (with the
> needed adjustments) to run Beam pipelines on Spark 2, in the end final
> users don’t care so much if pipelines are translated via RDD/DStream
> or Dataset, they just want to know that with Beam they can run their
> code in their favorite data processing framework.
>
> The other advantage is that we can base the work on the latest spark
> version and advance simultaneously in translators for both APIs, and
> once we consider that the DataSet is mature enough we can stop
> maintaining the RDD one and make it the official one.
>
> The only missing piece is backporting new developments on the RDD
> based translator from the spark 2 version into the spark 1, but maybe
> this won’t be so hard if we consider what you said, that at this point
> we are getting closer to having streaming right (of course you are the
> most appropriate person to decide if we are in sufficiently good shape
> to make this call, so backporting things won’t be so hard).
>
> Finally I agree with you, I would prefer a nice and full featured
> translator based on the Structured Streaming API but the question is
> how much time this will take to be in shape and the impact on final
> users who are already requesting this. This is the reason why I think
> the more conservative approach (keeping around the RDD translator) and
> moving incrementally makes sense.
>
> On Wed, Mar 15, 2017 at 4:52 PM, Amit Sela  wrote:
> > I feel that as we're getting closer to supporting streaming with Spark 1
> > runner, and having Structured Streaming advance in Spark 2, we could
> start
> > work on Spark 2 runner in a separate branch.
> >
> > However, I do feel that we should use the Dataset API, starting with
> batch
> > support first. WDYT ?
> >
> > On Wed, Mar 15, 2017 at 5:47 PM Ismaël Mejía  wrote:
> >
> >> > So you propose to have the Spark 2 branch a clone of the current one
> with
> >> > adaptations around Context->Session, Accumulator->AccumulatorV2 etc.
> >> while
> >> > still using the RDD API ?
> >>
> >> Yes this is exactly what I have in mind.
> >>
> >> > I think that having another Spark runner is great if it has value,
> >> > otherwise, let's just bump the version.
> >>
> >> There is value because most people are already starting to move to
> >> spark 2 and all Big Data distribution providers support it now, as
> >> well as the Cloud-based distributions (Dataproc and EMR), unlike the
> >> last time we had this discussion.
> >>
> >> > We could think of starting to migrate the Spark 1 runner to Spark 2 and
> >> > follow with Dataset API support feature-by-feature as it advances, but I
> >> > think most Spark installations today still run 1.X, or am I wrong ?
> >>
> >> No, you are right, that’s why I didn’t even mention removing the
> >> spark 1 runner, I know that having to support things for both versions
> >> can add additional work for us, but maybe the best approach would be
> >> to continue the work only in the spark 2 runner (both refining the RDD
> >> based translator and starting to create the Dataset one there that
> >> co-exist until the DataSet API is mature enough) and keep the spark 1
> >> runner only for bug-fixes for the users who are still using it (like
> >> this we don’t have to keep backporting stuff). Do you see any other
> >> particular issue?
> >>
> >> Ismaël
> >>
> >> On Wed, Mar 15, 2017 at 3:39 PM, Amit Sela 
> wrote:
> >> > So you propose to have the Spark 2 branch a clone of the current one
> with
> >> > adaptations around Context->Session, Accumulator->AccumulatorV2 etc.
> >> while
> >> > still using the RDD API ?
> >> >
> >> > I think that having another Spark runner is great if it has value,
> >> > otherwise, let's just 

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
> However, I do feel that we should use the Dataset API, starting with batch
> support first. WDYT ?

Well, this is the exact current status quo, and it will take us some
time to have something as complete as what we have with the spark 1
runner for the spark 2.

The other proposal has two advantages:

One is that we can leverage on the existing implementation (with the
needed adjustments) to run Beam pipelines on Spark 2, in the end final
users don’t care so much if pipelines are translated via RDD/DStream
or Dataset, they just want to know that with Beam they can run their
code in their favorite data processing framework.

The other advantage is that we can base the work on the latest spark
version and advance simultaneously in translators for both APIs, and
once we consider that the DataSet is mature enough we can stop
maintaining the RDD one and make it the official one.

The only missing piece is backporting new developments on the RDD
based translator from the spark 2 version into the spark 1, but maybe
this won’t be so hard if we consider what you said, that at this point
we are getting closer to having streaming right (of course you are the
most appropriate person to decide if we are in sufficiently good shape
to make this call, so backporting things won’t be so hard).

Finally I agree with you, I would prefer a nice and full featured
translator based on the Structured Streaming API but the question is
how much time this will take to be in shape and the impact on final
users who are already requesting this. This is the reason why I think
the more conservative approach (keeping around the RDD translator) and
moving incrementally makes sense.

On Wed, Mar 15, 2017 at 4:52 PM, Amit Sela  wrote:
> I feel that as we're getting closer to supporting streaming with Spark 1
> runner, and having Structured Streaming advance in Spark 2, we could start
> work on Spark 2 runner in a separate branch.
>
> However, I do feel that we should use the Dataset API, starting with batch
> support first. WDYT ?
>
> On Wed, Mar 15, 2017 at 5:47 PM Ismaël Mejía  wrote:
>
>> > So you propose to have the Spark 2 branch a clone of the current one with
>> > adaptations around Context->Session, Accumulator->AccumulatorV2 etc.
>> while
>> > still using the RDD API ?
>>
>> Yes this is exactly what I have in mind.
>>
>> > I think that having another Spark runner is great if it has value,
>> > otherwise, let's just bump the version.
>>
>> There is value because most people are already starting to move to
>> spark 2 and all Big Data distribution providers support it now, as
>> well as the Cloud-based distributions (Dataproc and EMR), unlike the
>> last time we had this discussion.
>>
>> > We could think of starting to migrate the Spark 1 runner to Spark 2 and
>> > follow with Dataset API support feature-by-feature as it advances, but I
>> > think most Spark installations today still run 1.X, or am I wrong ?
>>
>> No, you are right, that’s why I didn’t even mention removing the
>> spark 1 runner, I know that having to support things for both versions
>> can add additional work for us, but maybe the best approach would be
>> to continue the work only in the spark 2 runner (both refining the RDD
>> based translator and starting to create the Dataset one there that
>> co-exist until the DataSet API is mature enough) and keep the spark 1
>> runner only for bug-fixes for the users who are still using it (like
>> this we don’t have to keep backporting stuff). Do you see any other
>> particular issue?
>>
>> Ismaël
>>
>> On Wed, Mar 15, 2017 at 3:39 PM, Amit Sela  wrote:
>> > So you propose to have the Spark 2 branch a clone of the current one with
>> > adaptations around Context->Session, Accumulator->AccumulatorV2 etc.
>> while
>> > still using the RDD API ?
>> >
>> > I think that having another Spark runner is great if it has value,
>> > otherwise, let's just bump the version.
>> > My idea of having another runner for Spark was not to support more
>> versions
>> > - we should always support the most popular version in terms of
>> > compatibility - the idea was to try and make Beam work with Structured
>> > Streaming, which is still not fully mature so that's why we're not
>> heavily
>> > investing there.
>> >
>> > We could think of starting to migrate the Spark 1 runner to Spark 2 and
>> > follow with Dataset API support feature-by-feature as it advances, but I
>> > think most Spark installations today still run 1.X, or am I wrong ?
>> >
>> > On Wed, Mar 15, 2017 at 4:26 PM Ismaël Mejía  wrote:
>> >
>> >> BIG +1 JB,
>> >>
>> >> If we can just jump the version number with minor changes staying as
>> >> close as possible to the current implementation for spark 1 we can go
>> >> faster and offer in principle the exact same support but for version
>> >> 2.
>> >>
>> >> I know that the advanced streaming stuff based on the DataSet API
>> >> won't be there but with 

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
I feel that as we're getting closer to supporting streaming with Spark 1
runner, and having Structured Streaming advance in Spark 2, we could start
work on Spark 2 runner in a separate branch.

However, I do feel that we should use the Dataset API, starting with batch
support first. WDYT ?

On Wed, Mar 15, 2017 at 5:47 PM Ismaël Mejía  wrote:

> > So you propose to have the Spark 2 branch a clone of the current one with
> > adaptations around Context->Session, Accumulator->AccumulatorV2 etc.
> while
> > still using the RDD API ?
>
> Yes this is exactly what I have in mind.
>
> > I think that having another Spark runner is great if it has value,
> > otherwise, let's just bump the version.
>
> There is value because most people are already starting to move to
> spark 2 and all Big Data distribution providers support it now, as
> well as the Cloud-based distributions (Dataproc and EMR), unlike the
> last time we had this discussion.
>
> > We could think of starting to migrate the Spark 1 runner to Spark 2 and
> > follow with Dataset API support feature-by-feature as it advances, but I
> > think most Spark installations today still run 1.X, or am I wrong ?
>
> No, you are right, that’s why I didn’t even mention removing the
> spark 1 runner, I know that having to support things for both versions
> can add additional work for us, but maybe the best approach would be
> to continue the work only in the spark 2 runner (both refining the RDD
> based translator and starting to create the Dataset one there that
> co-exist until the DataSet API is mature enough) and keep the spark 1
> runner only for bug-fixes for the users who are still using it (like
> this we don’t have to keep backporting stuff). Do you see any other
> particular issue?
>
> Ismaël
>
> On Wed, Mar 15, 2017 at 3:39 PM, Amit Sela  wrote:
> > So you propose to have the Spark 2 branch a clone of the current one with
> > adaptations around Context->Session, Accumulator->AccumulatorV2 etc.
> while
> > still using the RDD API ?
> >
> > I think that having another Spark runner is great if it has value,
> > otherwise, let's just bump the version.
> > My idea of having another runner for Spark was not to support more
> versions
> > - we should always support the most popular version in terms of
> > compatibility - the idea was to try and make Beam work with Structured
> > Streaming, which is still not fully mature so that's why we're not
> heavily
> > investing there.
> >
> > We could think of starting to migrate the Spark 1 runner to Spark 2 and
> > follow with Dataset API support feature-by-feature as it advances, but I
> > think most Spark installations today still run 1.X, or am I wrong ?
> >
> > On Wed, Mar 15, 2017 at 4:26 PM Ismaël Mejía  wrote:
> >
> >> BIG +1 JB,
> >>
> >> If we can just jump the version number with minor changes staying as
> >> close as possible to the current implementation for spark 1 we can go
> >> faster and offer in principle the exact same support but for version
> >> 2.
> >>
> >> I know that the advanced streaming stuff based on the DataSet API
> >> won't be there but with this common canvas the community can iterate
> >> to create a DataSet based translator at the same time. In particular I
> >> consider the most important thing is that the spark 2 branch should
> >> not live for long time, this should be merged into master really fast
> >> for the benefit of everybody.
> >>
> >> Ismaël
> >>
> >>
> >> On Wed, Mar 15, 2017 at 1:57 PM, Jean-Baptiste Onofré 
> >> wrote:
> >> > Hi Amit,
> >> >
> >> > What do you think of the following:
> >> >
> >> > - in the mean time that you reintroduce the Spark 2 branch, what about
> >> > "extending" the version in the current Spark runner ? Still using
> >> > RDD/DStream, I think we can support Spark 2.x even if we don't yet
> >> leverage
> >> > the new provided features.
> >> >
> >> > Thoughts ?
> >> >
> >> > Regards
> >> > JB
> >> >
> >> >
> >> > On 03/15/2017 07:39 PM, Amit Sela wrote:
> >> >>
> >> >> Hi Cody,
> >> >>
> >> >> I will re-introduce this branch soon as part of the work on BEAM-913
> >> >> .
> >> >> For now, and from previous experience with the mentioned branch,
> batch
> >> >> implementation should be straight-forward.
> >> >> Only issue is with streaming support - in the current runner (Spark
> 1.x)
> >> >> we
> >> >> have experimental support for windows/triggers and we're working
> towards
> >> >> full streaming support.
> >> >> With Spark 2.x, there is no "general-purpose" stateful operator for
> the
> >> >> Dataset API, so I was waiting to see if the new operator
> >> >>  planned for next
> version
> >> >> could
> >> >> help with that.
> >> >>
> >> >> To summarize, I will introduce a skeleton for the Spark 2 runner with
> >> >> batch
> >> >> support as soon as I can as a separate branch.
> >> >>

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
> So you propose to have the Spark 2 branch a clone of the current one with
> adaptations around Context->Session, Accumulator->AccumulatorV2 etc. while
> still using the RDD API ?

Yes this is exactly what I have in mind.

> I think that having another Spark runner is great if it has value,
> otherwise, let's just bump the version.

There is value because most people are already starting to move to
spark 2 and all Big Data distribution providers support it now, as
well as the Cloud-based distributions (Dataproc and EMR), unlike the
last time we had this discussion.

> We could think of starting to migrate the Spark 1 runner to Spark 2 and
> follow with Dataset API support feature-by-feature as it advances, but I
> think most Spark installations today still run 1.X, or am I wrong ?

No, you are right, that’s why I didn’t even mention removing the
spark 1 runner, I know that having to support things for both versions
can add additional work for us, but maybe the best approach would be
to continue the work only in the spark 2 runner (both refining the RDD
based translator and starting to create the Dataset one there that
co-exist until the DataSet API is mature enough) and keep the spark 1
runner only for bug-fixes for the users who are still using it (like
this we don’t have to keep backporting stuff). Do you see any other
particular issue?

Ismaël

On Wed, Mar 15, 2017 at 3:39 PM, Amit Sela  wrote:
> So you propose to have the Spark 2 branch a clone of the current one with
> adaptations around Context->Session, Accumulator->AccumulatorV2 etc. while
> still using the RDD API ?
>
> I think that having another Spark runner is great if it has value,
> otherwise, let's just bump the version.
> My idea of having another runner for Spark was not to support more versions
> - we should always support the most popular version in terms of
> compatibility - the idea was to try and make Beam work with Structured
> Streaming, which is still not fully mature so that's why we're not heavily
> investing there.
>
> We could think of starting to migrate the Spark 1 runner to Spark 2 and
> follow with Dataset API support feature-by-feature as it advances, but I
> think most Spark installations today still run 1.X, or am I wrong ?
>
> On Wed, Mar 15, 2017 at 4:26 PM Ismaël Mejía  wrote:
>
>> BIG +1 JB,
>>
>> If we can just jump the version number with minor changes staying as
>> close as possible to the current implementation for spark 1 we can go
>> faster and offer in principle the exact same support but for version
>> 2.
>>
>> I know that the advanced streaming stuff based on the DataSet API
>> won't be there but with this common canvas the community can iterate
>> to create a DataSet based translator at the same time. In particular I
>> consider the most important thing is that the spark 2 branch should
>> not live for long time, this should be merged into master really fast
>> for the benefit of everybody.
>>
>> Ismaël
>>
>>
>> On Wed, Mar 15, 2017 at 1:57 PM, Jean-Baptiste Onofré 
>> wrote:
>> > Hi Amit,
>> >
>> > What do you think of the following:
>> >
>> > - in the mean time that you reintroduce the Spark 2 branch, what about
>> > "extending" the version in the current Spark runner ? Still using
>> > RDD/DStream, I think we can support Spark 2.x even if we don't yet
>> leverage
>> > the new provided features.
>> >
>> > Thoughts ?
>> >
>> > Regards
>> > JB
>> >
>> >
>> > On 03/15/2017 07:39 PM, Amit Sela wrote:
>> >>
>> >> Hi Cody,
>> >>
>> >> I will re-introduce this branch soon as part of the work on BEAM-913
>> >> .
>> >> For now, and from previous experience with the mentioned branch, batch
>> >> implementation should be straight-forward.
>> >> Only issue is with streaming support - in the current runner (Spark 1.x)
>> >> we
>> >> have experimental support for windows/triggers and we're working towards
>> >> full streaming support.
>> >> With Spark 2.x, there is no "general-purpose" stateful operator for the
>> >> Dataset API, so I was waiting to see if the new operator
>> >>  planned for next version
>> >> could
>> >> help with that.
>> >>
>> >> To summarize, I will introduce a skeleton for the Spark 2 runner with
>> >> batch
>> >> support as soon as I can as a separate branch.
>> >>
>> >> Thanks,
>> >> Amit
>> >>
>> >> On Wed, Mar 15, 2017 at 9:07 AM Cody Innowhere 
>> >> wrote:
>> >>
>> >>> Hi guys,
>> >>> Is there anybody who's currently working on a Spark 2.x runner? An old
>> >>> PR for a spark 2.x runner was closed a few days ago, so I wonder what's
>> >>> the status now, and is there a roadmap for this?
>> >>> Thanks~
>> >>>
>> >>
>> >
>> > --
>> > Jean-Baptiste Onofré
>> > jbono...@apache.org
>> > http://blog.nanthrax.net
>> > Talend - http://www.talend.com
>>


Re: Performance Testing Next Steps

2017-03-15 Thread Ismaël Mejía
Excellent proposal, and sorry to jump into this discussion so late; this
was in my to-read list for almost two weeks, and I finally got the time
to read the document. I have two minor comments:

I have the impression that the strict separation of Providers (the
data-processing systems) and Resources (the concrete Data Stores)
makes sense for the general case, but is lacking if what we want to
test are things in the Hadoop ecosystem where the data stores commonly
co-exist in the same group of machines with the data-processing
systems (the Providers), e.g. HDFS, HBase + YARN. This is important, for
example, to verify that data locality works correctly. Have you
considered such a case?

Another thing I noticed is that in the list of runners supporting PKB
the Direct Runner is not included; is there any particular reason for
this? I think that even if performance is not the main goal of the
direct runner, it would be nice to have it there too to catch any
performance regressions. Or is it because it is already ready for it?
What do you think?

Thanks,
Ismaël

On Thu, Mar 2, 2017 at 11:49 PM, Amit Sela  wrote:
> Looks great, and I'll be sure to follow this. Ping me if I can assist in
> any way!
>
> On Fri, Mar 3, 2017 at 12:09 AM Ahmet Altay 
> wrote:
>
>> Sounds great, thank you!
>>
>> On Thu, Mar 2, 2017 at 1:41 PM, Jason Kuster > .invalid
>> > wrote:
>>
>> > D'oh, my bad Ahmet. I've opened BEAM-1610, which handles support for
>> Python
>> > in PKB against the Dataflow runner. Once the Fn API progresses some more
>> we
>> > can add some work items for the other runners too. Let's chat about this
>> > more, maybe next week?
>> >
>> > On Thu, Mar 2, 2017 at 1:31 PM, Ahmet Altay 
>> > wrote:
>> >
>> > > Thank you Jason, this is great.
>> > >
>> > > Which one of these issues fall into the land of sdk-py?
>> > >
>> > > Ahmet
>> > >
>> > > On Thu, Mar 2, 2017 at 12:34 PM, Jason Kuster <
>> > > jasonkus...@google.com.invalid> wrote:
>> > >
>> > > > Glad to hear the excitement. :)
>> > > >
>> > > > Filed BEAM-1595 - 1609 to track work items. Some of these fall under
>> > > runner
>> > > > components, please feel free to reach out to me if you have any
>> > questions
>> > > > about how to accomplish these.
>> > > >
>> > > > Best,
>> > > >
>> > > > Jason
>> > > >
>> > > > On Wed, Mar 1, 2017 at 5:50 AM, Aljoscha Krettek <
>> aljos...@apache.org>
>> > > > wrote:
>> > > >
>> > > > > Thanks for writing this and taking care of this, Jason!
>> > > > >
>> > > > > I'm afraid I also cannot add anything except that I'm excited to
>> see
>> > > some
>> > > > > results from this.
>> > > > >
>> > > > > On Wed, 1 Mar 2017 at 03:28 Kenneth Knowles > >
>> > > > > wrote:
>> > > > >
>> > > > > Just got a chance to look this over. I don't have anything to add,
>> > but
>> > > > I'm
>> > > > > pretty excited to follow this project. Have the JIRAs been filed
>> > since
>> > > > you
>> > > > > shared the doc?
>> > > > >
>> > > > > On Wed, Feb 22, 2017 at 10:38 AM, Jason Kuster <
>> > > > > jasonkus...@google.com.invalid> wrote:
>> > > > >
>> > > > > > Hey all, just wanted to pop this up again for people -- if anyone
>> > has
>> > > > > > thoughts on performance testing please feel welcome to chime in.
>> :)
>> > > > > >
>> > > > > > On Fri, Feb 17, 2017 at 4:03 PM, Jason Kuster <
>> > > jasonkus...@google.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi all,
>> > > > > > >
>> > > > > > > I've written up a doc on next steps for getting performance
>> > testing
>> > > > up
>> > > > > > and
>> > > > > > > running for Beam. I'd love to hear from people -- there's a
>> fair
>> > > > amount
>> > > > > > of
>> > > > > > > work encapsulated in here, but the end result is that we have a
>> > > > > > performance
>> > > > > > > testing system which we can use for benchmarking all aspects of
>> > > Beam,
>> > > > > > which
>> > > > > > > would be really exciting. Looking forward to your thoughts.
>> > > > > > >
>> > > > > > > https://docs.google.com/document/d/
>> > 1PsjGPSN6FuorEEPrKEP3u3m16tyOz
>> > > > > > > ph5FnL2DhaRDz0/edit?ts=58a78e73
>> > > > > > >
>> > > > > > > Best,
>> > > > > > >
>> > > > > > > Jason
>> > > > > > >
>> > > > > > > --
>> > > > > > > ---
>> > > > > > > Jason Kuster
>> > > > > > > Apache Beam / Google Cloud Dataflow
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > ---
>> > > > > > Jason Kuster
>> > > > > > Apache Beam / Google Cloud Dataflow
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > ---
>> > > > Jason Kuster
>> > > > Apache Beam / Google Cloud Dataflow
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > ---
>> > Jason Kuster
>> > Apache Beam / Google Cloud Dataflow
>> >
>>


Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
So you propose to make the Spark 2 branch a clone of the current one, with
adaptations around Context->Session, Accumulator->AccumulatorV2, etc., while
still using the RDD API ?
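
Something like the following, I guess (a rough, hypothetical sketch against
the Spark 2.x APIs; the SparkAdapters class and its method names are made up
for illustration, this is not Beam code):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.util.LongAccumulator;

public final class SparkAdapters {
  private SparkAdapters() {}

  // Spark 2.x makes SparkSession the entry point, but the RDD-based
  // translation code can keep working against a JavaSparkContext
  // derived from it.
  public static JavaSparkContext createContext(SparkConf conf) {
    SparkSession session = SparkSession.builder().config(conf).getOrCreate();
    return new JavaSparkContext(session.sparkContext());
  }

  // Accumulators: sc.accumulator(0L) from 1.x becomes a registered
  // AccumulatorV2 in 2.x; wrapping the registration keeps call sites stable.
  public static LongAccumulator createLongAccumulator(JavaSparkContext jsc,
      String name) {
    LongAccumulator acc = new LongAccumulator();
    jsc.sc().register(acc, name);
    return acc;
  }
}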

I think that having another Spark runner is great if it has value,
otherwise, let's just bump the version.
My idea of having another runner for Spark was not to support more versions
- we should always support the most popular version in terms of
compatibility - the idea was to try and make Beam work with Structured
Streaming, which is still not fully mature so that's why we're not heavily
investing there.

We could think of starting to migrate the Spark 1 runner to Spark 2 and
follow with Dataset API support feature-by-feature as it advances, but I
think most Spark installations today still run 1.x, or am I wrong ?

On Wed, Mar 15, 2017 at 4:26 PM Ismaël Mejía  wrote:

> BIG +1 JB,
>
> If we can just jump the version number with minor changes staying as
> close as possible to the current implementation for spark 1 we can go
> faster and offer in principle the exact same support but for version
> 2.
>
> I know that the advanced streaming stuff based on the DataSet API
> won't be there but with this common canvas the community can iterate
> to create a DataSet based translator at the same time. In particular I
> consider the most important thing is that the spark 2 branch should
> not live for a long time; this should be merged into master really fast
> for the benefit of everybody.
>
> Ismaël
>
>
> On Wed, Mar 15, 2017 at 1:57 PM, Jean-Baptiste Onofré 
> wrote:
> > Hi Amit,
> >
> > What do you think of the following:
> >
> > - in the meantime, while you reintroduce the Spark 2 branch, what about
> > "extending" the version in the current Spark runner ? Still using
> > RDD/DStream, I think we can support Spark 2.x even if we don't yet
> leverage
> > the new provided features.
> >
> > Thoughts ?
> >
> > Regards
> > JB
> >
> >
> > On 03/15/2017 07:39 PM, Amit Sela wrote:
> >>
> >> Hi Cody,
> >>
> >> I will re-introduce this branch soon as part of the work on BEAM-913
> >> .
> >> For now, and from previous experience with the mentioned branch, batch
> >> implementation should be straight-forward.
> >> Only issue is with streaming support - in the current runner (Spark 1.x)
> >> we
> >> have experimental support for windows/triggers and we're working towards
> >> full streaming support.
> >> With Spark 2.x, there is no "general-purpose" stateful operator for the
> >> Dataset API, so I was waiting to see if the new operator
> >>  planned for next version
> >> could
> >> help with that.
> >>
> >> To summarize, I will introduce a skeleton for the Spark 2 runner with
> >> batch
> >> support as soon as I can as a separate branch.
> >>
> >> Thanks,
> >> Amit
> >>
> >> On Wed, Mar 15, 2017 at 9:07 AM Cody Innowhere 
> >> wrote:
> >>
> >>> Hi guys,
> >>> Is there anybody who's currently working on Spark 2.x runner? An old
> >>> PR for spark 2.x runner was closed a few days ago, so I wonder what's
> >>> the status now, and is there a roadmap for this?
> >>> Thanks~
> >>>
> >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>


Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
BIG +1 JB,

If we can just jump the version number with minor changes staying as
close as possible to the current implementation for spark 1 we can go
faster and offer in principle the exact same support but for version
2.

I know that the advanced streaming stuff based on the DataSet API
won't be there but with this common canvas the community can iterate
to create a DataSet based translator at the same time. In particular I
consider the most important thing is that the spark 2 branch should
not live for a long time; this should be merged into master really fast
for the benefit of everybody.

Ismaël


On Wed, Mar 15, 2017 at 1:57 PM, Jean-Baptiste Onofré  wrote:
> Hi Amit,
>
> What do you think of the following:
>
> - in the meantime, while you reintroduce the Spark 2 branch, what about
> "extending" the version in the current Spark runner ? Still using
> RDD/DStream, I think we can support Spark 2.x even if we don't yet leverage
> the new provided features.
>
> Thoughts ?
>
> Regards
> JB
>
>
> On 03/15/2017 07:39 PM, Amit Sela wrote:
>>
>> Hi Cody,
>>
>> I will re-introduce this branch soon as part of the work on BEAM-913
>> .
>> For now, and from previous experience with the mentioned branch, batch
>> implementation should be straight-forward.
>> Only issue is with streaming support - in the current runner (Spark 1.x)
>> we
>> have experimental support for windows/triggers and we're working towards
>> full streaming support.
>> With Spark 2.x, there is no "general-purpose" stateful operator for the
>> Dataset API, so I was waiting to see if the new operator
>>  planned for next version
>> could
>> help with that.
>>
>> To summarize, I will introduce a skeleton for the Spark 2 runner with
>> batch
>> support as soon as I can as a separate branch.
>>
>> Thanks,
>> Amit
>>
>> On Wed, Mar 15, 2017 at 9:07 AM Cody Innowhere 
>> wrote:
>>
>>> Hi guys,
>>> Is there anybody who's currently working on Spark 2.x runner? An old PR
>>> for
>>> spark 2.x runner was closed a few days ago, so I wonder what's the status
>>> now, and is there a roadmap for this?
>>> Thanks~
>>>
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Jesse Anderson
Excellent!

On Wed, Mar 15, 2017, 6:13 AM Jean-Baptiste Onofré  wrote:

> Hi Ahmet,
>
> it seems Jira is not up to date: 0.6.0 version is not flagged as
> "Released".
>
> Can you fix that please ?
>
> Thanks !
> Regards
> JB
>
> On 03/15/2017 05:22 AM, Ahmet Altay wrote:
> > I'm happy to announce that we have unanimously approved this release.
> >
> > There are 7 approving votes, 4 of which are binding:
> > * Aljoscha Krettek
> > * Davor Bonaci
> > * Ismaël Mejía
> > * Jean-Baptiste Onofré
> > * Robert Bradshaw
> > * Ted Yu
> > * Tibor Kiss
> >
> > There are no disapproving votes.
> >
> > Thanks everyone!
> >
> > Ahmet
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
-- 
Thanks,

Jesse


Re: Docker image dependencies

2017-03-15 Thread Ismaël Mejía
Hi, Thanks for bringing this subject to the mailing list.

+1
We definitely need a consensus on this, and I agree with your proposal and
JB’s comments modulo certain clarifications:

I think we shall go in this priority order if the version of the image we
want is available:

1. Image provided by the creator of the data source/sink (if they
officially maintain it), as is the case for Elasticsearch, or by the
Apache projects (if they provide one), as JB mentions.
2. Official docker images (because they have security fixes and
guaranteed maintenance).
3. Non-official docker images, or images from other providers that have
good maintainers, e.g. quay.io.

It makes sense to use the same image for all the tests, and to use the
fixed versions supported by the respective IO, to avoid possible issues
during testing between different versions, naming of env variables, etc.

The Elasticsearch case is a 'good' example because it shows all the current
issues:

We should not use one elasticsearch image (elk) for some tests and a
different one for the other (the quay one), and if we resolve by priority
we would take the image provided by the creator (1) for both cases.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
However, the current Beam Elasticsearch IO does not support Elasticsearch 5,
and elastic does not have an image for version 2, so in this particular
case, following the priority order, we should use the official docker image
(2) for the tests (assuming that both require the same version).
Do you agree with this ?


Thinking about the ELK image I came up with a new question: how do we deal
with IOs that require more than one base image? This is a common scenario
for projects that depend on Zookeeper, e.g. Kafka/Solr. Usually people
coordinate those with a docker-compose file that creates an artificial
network to connect the Zookeeper image and the Kafka/Solr one, just
executing the 'docker-compose up' command. Will we adopt this for such
cases ?
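
For the Kafka case, a minimal docker-compose sketch could look like the
following (image names, tags and ports are illustrative, not a tested
setup; wurstmeister/kafka is a commonly used non-official image, so it
would fall under (3) above):

version: '2'
services:
  zookeeper:
    image: zookeeper:3.4                 # official image, pinned as proposed
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka:0.10.2.0   # non-official image, also pinned
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_HOST_NAME: localhost

A single 'docker-compose up' then starts both containers on a shared
network where the kafka service resolves zookeeper by name.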

I know that Kubernetes does this too, but the docker-compose format is
quite easy and textual, and it is usually ready with the docker
installation; additionally, the docker-compose files can easily be
translated with kompose into Kubernetes resources.

Ismaël

On Wed, Mar 15, 2017 at 3:17 AM, Jean-Baptiste Onofré 
wrote:

> Hi Stephen,
>
> 1. About the docker repositories, we now have an official Docker repo at
> Apache. So, for the Apache projects, I would recommend the Apache official
> repo. Anyway, generally speaking, I would recommend the official repo (from
> the projects).
>
> 2. To avoid "unpredictable" breaking changes, I would pin to particular
> versions, and explicitly update if needed.
>
> 3. It's better that docker images are under a unique responsibility scope
> as different IOs can use the same resources, so they should use the same
> provided docker.
>
> By the way, I also have a docker coming for RedisIO ;)
>
> Regards
> JB
>
>
> On 03/15/2017 08:01 AM, Stephen Sisk wrote:
>
>> hi!
>>
>> as part of doing the work to enable IO ITs, we decided we want to use
>> docker. As part of that, we need to run docker images and they'll probably
>> be pulled from a docker repository.
>>
>> Questions:
>> * What docker repositories (and users on docker hub) do we as a group
>> allow
>> for images we'll run for hosted data stores?
>>  -> My proposal is we should only use repositories/images that are
>> regularly updated and that have someone saying that the images we depend
>> on
>> are secure. In the set of images currently linked to by checked in code/in
>> PR code, quay.io and official docker images seem fine. They both have
>> security scans (for what that's worth) and generally seem okay.
>>
>> * Do we pin to particular docker images or allow our version to float?
>>  -> I have seen docker images change in an insecure way (e.g. switching
>> the name of the password parameter, meaning that the data store was
>> secure when set up, and became insecure because no password was set
>> after the image update), so I'd prefer to pin to particular versions,
>> and update on a periodic basis.
>>
>> I'm relatively new to docker best practices, so I'm open to suggestions on
>> this.
>>
>> Current ITs with docker images:
>> * Jdbc - https://hub.docker.com/_/postgres/  (official image)
>> * Elasticsearch - https://hub.docker.com/r/sebp/elk/ (semi-official
>> looking
>> image)
>> * (PR in-flight
>> > ff9aebc9e99a3f324c9cf75a9R52>)
>> HadoopInputFormat's elasticsearch and cassandra tests -
>> https://hub.docker.com/_/cassandra/ and
>> https://quay.io/repository/pires/docker-elasticsearch-kubernetes?tag=5.2.2=tags
>> (official image, and image from quay.io, which provides security audits
>> of
>> their images)
>>
>> The more I think about it, the less I'm excited about the sebp/elk image -
>> I'm sure it's fine, but I'd prefer using images from a source 

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Jean-Baptiste Onofré

Hi Ahmet,

it seems Jira is not up to date: 0.6.0 version is not flagged as "Released".

Can you fix that please ?

Thanks !
Regards
JB

On 03/15/2017 05:22 AM, Ahmet Altay wrote:

I'm happy to announce that we have unanimously approved this release.

There are 7 approving votes, 4 of which are binding:
* Aljoscha Krettek
* Davor Bonaci
* Ismaël Mejía
* Jean-Baptiste Onofré
* Robert Bradshaw
* Ted Yu
* Tibor Kiss

There are no disapproving votes.

Thanks everyone!

Ahmet



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam spark 2.x runner status

2017-03-15 Thread Jean-Baptiste Onofré

Hi Amit,

What do you think of the following:

- in the meantime, while you reintroduce the Spark 2 branch, what about
"extending" the version in the current Spark runner ? Still using RDD/DStream, I 
think we can support Spark 2.x even if we don't yet leverage the new provided 
features.
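
For example, the Spark version could be selected at build time with Maven
profiles; this is a purely hypothetical pom sketch (the property name and
version numbers are illustrative, not the actual Beam build):

<profiles>
  <profile>
    <id>spark1</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <properties>
      <spark.version>1.6.3</spark.version>
    </properties>
  </profile>
  <profile>
    <id>spark2</id>
    <properties>
      <spark.version>2.1.0</spark.version>
    </properties>
  </profile>
</profiles>

Building with -Pspark2 would then compile the same RDD/DStream-based
runner against the Spark 2.x dependencies.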


Thoughts ?

Regards
JB

On 03/15/2017 07:39 PM, Amit Sela wrote:

Hi Cody,

I will re-introduce this branch soon as part of the work on BEAM-913
.
For now, and from previous experience with the mentioned branch, batch
implementation should be straight-forward.
Only issue is with streaming support - in the current runner (Spark 1.x) we
have experimental support for windows/triggers and we're working towards
full streaming support.
With Spark 2.x, there is no "general-purpose" stateful operator for the
Dataset API, so I was waiting to see if the new operator
 planned for next version could
help with that.

To summarize, I will introduce a skeleton for the Spark 2 runner with batch
support as soon as I can as a separate branch.

Thanks,
Amit

On Wed, Mar 15, 2017 at 9:07 AM Cody Innowhere  wrote:


Hi guys,
Is there anybody who's currently working on Spark 2.x runner? An old PR for
spark 2.x runner was closed a few days ago, so I wonder what's the status
now, and is there a roadmap for this?
Thanks~





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
Hi Cody,

I will re-introduce this branch soon as part of the work on BEAM-913
.
For now, and from previous experience with the mentioned branch, batch
implementation should be straight-forward.
Only issue is with streaming support - in the current runner (Spark 1.x) we
have experimental support for windows/triggers and we're working towards
full streaming support.
With Spark 2.x, there is no "general-purpose" stateful operator for the
Dataset API, so I was waiting to see if the new operator
 planned for next version could
help with that.

To summarize, I will introduce a skeleton for the Spark 2 runner with batch
support as soon as I can as a separate branch.

Thanks,
Amit

On Wed, Mar 15, 2017 at 9:07 AM Cody Innowhere  wrote:

> Hi guys,
> Is there anybody who's currently working on Spark 2.x runner? An old PR for
> spark 2.x runner was closed a few days ago, so I wonder what's the status
> now, and is there a roadmap for this?
> Thanks~
>


Re: Style: how much testing for transform builder classes?

2017-03-15 Thread Ismaël Mejía
+1 to Vikas' point: maybe the right place to enforce correct
construction is in the validate method, and like this we reduce the
test boilerplate and only test validate; but I wonder if this totally
covers both cases (the buildsCorrectly and
buildsCorrectlyInDifferentOrder ones).

I answer Eugene's question here, even though he is aware of it now since
he commented on the PR, so that everyone understands the case.

The case is pretty simple: when you extend an IO and add a new
configuration parameter, suppose we have withFoo(String foo) and we
want to add withBar(String bar). In some cases the implementation, or
even worse the combination of those, is not built correctly, so the
only way to guarantee that this works is to have code that tests the
complete parameter combination, or tests that at least assert that the
object is built correctly.

This is something that can happen both with or without AutoValue
because the with method is hand-written and the natural tendency with
boilerplate methods like this is to copy/paste, so we can end up doing
silly things like:

private Read(String foo, String bar) { … }

public Read withBar(String bar) {
  return new Read(foo, null); // bug: the bar argument is silently dropped
}

in this case the reference to bar is not stored or assigned (this is
similar to the case of the DatastoreIO PR), and AutoValue may seem to
solve this issue but you can also end up with this situation if you
copy paste the withFoo method and just change the method name:

public Read withBar(String foo) { // bug: copy/pasted from withFoo
  return builder().setFoo(foo).build(); // sets foo instead of bar
}

Of course both seem silly, but both can happen, and the tests at least
help to discover those; if Vikas' proposition covers the
testsBuildCorrectly and testsBuildCorrectlyInDifferentOrder kind of
tests, I think it is OK to get rid of those.
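
For reference, the kind of test I mean is tiny; a sketch in JUnit, where
Read, create()/withFoo/withBar and the getters are the hypothetical IO
from above:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class ReadTest {
  @Test
  public void testBuildsCorrectlyInDifferentOrder() {
    // Both buggy withBar variants above leave bar unset, so the
    // second assertion fails and the copy/paste mistake is caught.
    Read read = Read.create().withBar("bar").withFoo("foo");
    assertEquals("foo", read.getFoo());
    assertEquals("bar", read.getBar());
  }
}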

On Wed, Mar 15, 2017 at 1:05 AM, vikas rk  wrote:
> Yes, what I meant is: Necessary tests are ones that blocks users if not
> present. Trivial or non-trivial shouldn't be the issue in such cases.
>
> Some of the boilerplate code and tests is because IO PTransforms are
> returned to the user before they are fully constructed and actual
> validation happens in the validate method rather than at construction. I
> understand that the reasoning here is that we want to support constructing
> them with options in any order and using Builder pattern can be confusing.
>
> If validate method is where all the validation happens, then we should able
> to eliminate some redundant checks and tests during construction time like
> in *withOption* methods here
> 
>  and here
> 
> as
> these are also checked in the validate method.
>
>
>
>
>
>
>
>
>
>
>
> -Vikas
>
>
>
> On 14 March 2017 at 15:40, Eugene Kirpichov 
> wrote:
>
>> Thanks all. Looks like people are on board with the general direction
>> though it remains to refine it to concrete guidelines to go into style
>> guide.
>>
>> Ismaël, can you give more details about the situation you described in the
>> first paragraph? Is it perhaps that really a RunnableOnService test was
>> missing (and perhaps still is), rather than a builder test?
>>
>> Vikas, regarding trivial tests and user waiting for a work-around: in the
>> situation I described, they don't really need a workaround - they specified
>> an invalid value and have been minorly inconvenienced because the error
>> they got about it was not very readable, so fixing their value took them a
>> little longer than it could have, but they fixed it and their work is not
>> blocked. I think Robert's arguments about the cost of trivial tests apply.
>>
>> I agree that the author should be at liberty to choose which validation to
>> unit-test and which to skip as trivial, so documentation on this topic
>> should be in the form of guidelines, high-quality example code (i.e. clean
>> up the unit tests of IOs bundled with Beam SDK), and informal knowledge in
>> the heads of readers of this thread, rather than hard rules.
>>
>> On Tue, Mar 14, 2017 at 8:07 AM Ismaël Mejía  wrote:
>>
>> > +0.5
>> >
>> > I used to think that some of those tests were not worth, for example
>> > testBuildRead and
>> > testBuildReadAlt. However the reality is that these tests allowed me to
>> > find bugs both during the development of HBaseIO and just yesterday when
>> I
>> > tried to test the write support for the emulator with DataStoreIO (that
>> > lacked a parameter in testBuildWrite and didn’t have a testBuildWriteAlt
>> > and broke in that case too), so I now believe they are not necessarily
>> > useless.
>> >
>> > I agree with the idea of trying to test the most important things first
>> and
>> > as Kenneth said trying to 

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Ismaël Mejía
Thanks Ahmet for dealing with the release. I just tried the pip install
apache-beam and the wordcount example, and as you said it feels awesome to
see this working so easily now. Congrats to everyone working on the python
SDK !
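
For anyone who wants to try it, this is all it takes (the wordcount module
path comes from the Python SDK examples; the input/output paths here are
just illustrative):

# Install the released SDK from PyPI and run the bundled example
pip install apache-beam
python -m apache_beam.examples.wordcount \
  --input /path/to/some/text --output /tmp/counts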


On Wed, Mar 15, 2017 at 8:17 AM, Ahmet Altay 
wrote:

> This release is now complete. Thanks to everyone who has helped make this
> release possible!
>
> Before sending a note to users@, I would like to make a pass over the
> website and simplify things now that we have an official python release. I
> did the first 'pip install apache-beam' today and it felt amazing!
>
> Ahmet
>
>
> On Tue, Mar 14, 2017 at 2:22 PM, Ahmet Altay  wrote:
>
> > I'm happy to announce that we have unanimously approved this release.
> >
> > There are 7 approving votes, 4 of which are binding:
> > * Aljoscha Krettek
> > * Davor Bonaci
> > * Ismaël Mejía
> > * Jean-Baptiste Onofré
> > * Robert Bradshaw
> > * Ted Yu
> > * Tibor Kiss
> >
> > There are no disapproving votes.
> >
> > Thanks everyone!
> >
> > Ahmet
> >
>


Jenkins build is still unstable: beam_Release_NightlySnapshot #357

2017-03-15 Thread Apache Jenkins Server
See 




Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Ahmet Altay
This release is now complete. Thanks to everyone who has helped make this
release possible!

Before sending a note to users@, I would like to make a pass over the
website and simplify things now that we have an official python release. I
did the first 'pip install apache-beam' today and it felt amazing!

Ahmet


On Tue, Mar 14, 2017 at 2:22 PM, Ahmet Altay  wrote:

> I'm happy to announce that we have unanimously approved this release.
>
> There are 7 approving votes, 4 of which are binding:
> * Aljoscha Krettek
> * Davor Bonaci
> * Ismaël Mejía
> * Jean-Baptiste Onofré
> * Robert Bradshaw
> * Ted Yu
> * Tibor Kiss
>
> There are no disapproving votes.
>
> Thanks everyone!
>
> Ahmet
>