Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-03 Thread Jean-Baptiste Onofré
Hi guys,

just to let you know that the build is now OK. I'm completing the Jira
triage this morning (my time) and will then cut the release branch (starting
the release process). I will validate the release guide in the meantime.

Thanks,
Regards
JB

On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
> Hi guys,
> 
> Apache Beam 2.4.0 has been released on March 20th.
> 
> According to our release cycle (roughly 6 weeks), we should think about 2.5.0.
> 
> I volunteer to tackle this release.
> 
> I'm proposing the following items:
> 
> 1. We start the Jira triage now, up to Tuesday
> 2. I would like to cut the release on Tuesday night (Europe time)
> 2bis. I think it's wiser to still use Maven for this release. Do you think we
> will be ready to try a release with Gradle?
> 
> After this release, I would like a discussion about:
> 1. Gradle release (if we release 2.5.0 with Maven)
> 2. Isolate the release cycle per Beam part. I think it would be interesting to
> have different release cycles: SDKs, DSLs, Runners, IOs. That's another
> discussion; I will start a thread about that.
> 
> Thoughts?
> 
> Regards
> JB
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Go SDK Example

2018-06-03 Thread Kenneth Knowles
Hi James,

Welcome!

Have you subscribed to dev@beam.apache.org? I am including that list here,
since that is the most active list for discussing contributions. I've also
included Henning explicitly. He is the best person to answer.

I found your JIRA account and set up permissions so you can be assigned
issues.

Kenn

On Sun, Jun 3, 2018 at 12:35 PM James Wilson  wrote:

> Hi All,
>
> This is the first time I am trying to contribute to a large open source
> project.  I was going to tackle BEAM-4292, "Add streaming word count
> example", for the Go SDK.  Do I assign it to myself or just complete the
> task and create a PR?  I read through the contributing page on the
> Apache Beam site, but it didn’t go into how to tackle your first task.  Any
> help would be appreciated.
>
> Best,
> James


Re: Beam SQL Improvements

2018-06-03 Thread Reuven Lax
Just an update: Romain and I chatted on Slack, and I think I understand his
concern. The concern wasn't specifically about schemas, but rather about
having a generic way to register per-ParDo state that has worker lifetime. As
evidence that such a thing is needed: in many cases, static variables are used
to simulate it. Static variables, however, have downsides - if two pipelines
are run on the same JVM (this happens often with unit tests, and there's
nothing that prevents a runner from doing so in a production environment),
these static variables will interfere with each other.
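
For illustration, here is a minimal sketch of that static-variable workaround
(ExpensiveClient is a hypothetical class) and of where the interference comes
from:

    import org.apache.beam.sdk.transforms.DoFn;

    class EnrichFn extends DoFn<String, String> {
      // Static: shared by every EnrichFn instance loaded in this JVM,
      // including instances belonging to a *different* pipeline.
      private static ExpensiveClient client;

      @Setup
      public void setup() {
        synchronized (EnrichFn.class) {
          if (client == null) {
            client = new ExpensiveClient(/* configured for pipeline A */);
          }
        }
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        // A second pipeline in the same JVM silently reuses pipeline A's
        // client and its configuration - the interference described above.
        c.output(client.enrich(c.element()));
      }
    }

A per-ParDo registration with worker lifetime would give the same caching
behavior without the cross-pipeline sharing.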

On Thu, May 24, 2018 at 12:30 AM Reuven Lax  wrote:

> Romain, maybe it would be useful for us to find some time on slack. I'd
> like to understand your concerns. Also keep in mind that I'm tagging all
> these classes as Experimental for now, so we can definitely change these
> interfaces around if we decide they are not the best ones.
>
> Reuven
>
> On Tue, May 22, 2018 at 11:35 PM Romain Manni-Bucau 
> wrote:
>
>> Why not extend ProcessContext to add the new remapped output? But it
>> looks good (the part I don't like is that creating a new context each time a
>> new feature is added hurts users. What about when Beam adds some
>> reactive support? A ReactiveOutputReceiver?)
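
For context, a minimal sketch of the parameter-injection style under
discussion, assuming the @Element/OutputReceiver signatures land as proposed:
new capabilities arrive as parameters the method declares, rather than as new
methods on a context object:

    import org.apache.beam.sdk.transforms.DoFn;

    class UppercaseFn extends DoFn<String, String> {
      @ProcessElement
      public void processElement(@Element String word, OutputReceiver<String> out) {
        // The runner injects only what the method asks for.
        out.output(word.toUpperCase());
      }
    }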
>>
>> Pipeline sounds like the wrong storage: once distributed, the instances are
>> serialized, which kind of breaks the lifecycle of the original instance, and
>> you have no real release/close hook on them anymore, right? Not sure we can
>> do better than DoFn/source-embedded instances today.
>>
>>
>>
>>
>> On Wed, May 23, 2018 at 08:02, Romain Manni-Bucau  wrote:
>>
>>>
>>>
>>> On Wed, May 23, 2018 at 07:55, Jean-Baptiste Onofré  wrote:
>>>
 Hi,

 IMHO, it would be better to have an explicit transform/IO as converter.

 It would be easier for users.

 Another option would be to use a "TypeConverter/SchemaConverter" map, as
 we do in Camel: Beam could check the source/destination "type" and look
 in the map for an available converter. This map can be stored as
 part of the pipeline (as we do for filesystem registration).
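
A minimal sketch of what such a converter map could look like, loosely modeled
on Camel's TypeConverter registry (ConverterRegistry and its methods are
hypothetical names, not an existing Beam API):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.beam.sdk.transforms.SerializableFunction;
    import org.apache.beam.sdk.values.KV;

    class ConverterRegistry {
      // Keyed by (source type, target type).
      private final Map<KV<Class<?>, Class<?>>, SerializableFunction<?, ?>> converters =
          new HashMap<>();

      <A, B> void register(Class<A> from, Class<B> to, SerializableFunction<A, B> fn) {
        converters.put(KV.<Class<?>, Class<?>>of(from, to), fn);
      }

      @SuppressWarnings("unchecked")
      <A, B> SerializableFunction<A, B> lookup(Class<A> from, Class<B> to) {
        // Returns null when nothing is registered, so the pipeline can
        // fail fast at construction time instead of at runtime.
        return (SerializableFunction<A, B>) converters.get(KV.<Class<?>, Class<?>>of(from, to));
      }
    }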

>>>
>>>
>>> It works in Camel because it is not strongly typed, doesn't it? So it can
>>> require a new Beam pipeline API.
>>>
>>> +1 for the explicit transform; if added to the pipeline API like a coder, it
>>> wouldn't break the fluent API:
>>>
>>> p.apply(io).setOutputType(Foo.class)
>>>
>>> Coders can be a workaround since they own the type, but since the
>>> PCollection is the real owner, it is surely saner this way, no?
>>>
>>> Also, it probably needs to ensure all converters are present before running
>>> the pipeline; no implicit environment converter support is probably a good
>>> way to start, to avoid late surprises.
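
An explicit conversion can already be spelled today as a plain transform; a
sketch, where rows is the Row output of the SQL step, and Foo and fooFromRow
are hypothetical placeholders for the user type and its converter:

    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;
    import org.apache.beam.sdk.values.TypeDescriptor;

    // The conversion is visible in the pipeline graph, and construction
    // fails if no converter function is supplied.
    PCollection<Foo> foos =
        rows.apply("ConvertRowToFoo",
            MapElements.into(TypeDescriptor.of(Foo.class))
                .via((Row row) -> fooFromRow(row)));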
>>>
>>>
>>>
 My $0.01

 Regards
 JB

 On 23/05/2018 07:51, Romain Manni-Bucau wrote:
 > How does it work on the pipeline side?
 > Do you generate these "virtual" IOs at build time to enable the fluent
 > API to work without erasing generics?
 >
 > ex: SQL(row)->BigQuery(native) will not compile, so we need a
 > SQL(row)->BigQuery(row)
 >
 > Side note unrelated to Row: if you add another registry, maybe a pretask
 > is to ensure Beam has a kind of singleton/context, to avoid duplicating
 > it or not tracking it properly. These kinds of converters will in general
 > need a global close and not only a per-record one:
 > converter.init(); converter.convert(row); converter.destroy();
 > otherwise it easily leaks. This is why it can require some way to not
 > recreate it. A quick fix, if you are in ByteBuddy already, can be to add
 > it to setup/teardown probably; being more global would be nicer but is
 > more challenging.
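
The setup/teardown quick fix mentioned above could look roughly like this
(Converter and Foo are hypothetical; note that @Teardown is only a best-effort
hook, so it mitigates rather than fully solves the leak):

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.Row;

    class ConvertingFn extends DoFn<Row, Foo> {
      private transient Converter converter;

      @Setup
      public void setup() {
        converter = new Converter();
        converter.init();      // once per DoFn instance, not once per record
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        c.output(converter.convert(c.element()));
      }

      @Teardown
      public void teardown() {
        converter.destroy();   // the global close being asked for
      }
    }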
 >
 > Romain Manni-Bucau
 > @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
 > <https://www.packtpub.com/application-development/java-ee-8-high-performance>
 >
 >
 > On Wed, May 23, 2018 at 07:22, Reuven Lax wrote:
 >
 > No - the only modules we need to add to core are the ones we choose
 > to add. For example, I will probably add a registration for
 > TableRow/TableSchema (GCP BigQuery) so these can work seamlessly
 > with schemas. However, I will add that to the GCP module, so only
 > someone depending on that module needs to pull in that dependency.
 > The Java ServiceLoader framework can be used by these modules to
 > register schemas for their types (we already do something similar
 > for FileSystem and for coders as well).
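
For reference, a sketch of the ServiceLoader pattern being described here
(SchemaProvider and SchemaRegistry are illustrative names, not the actual
Beam interfaces):

    import java.util.ServiceLoader;
    import org.apache.beam.sdk.schemas.Schema;

    interface SchemaProvider {
      boolean supports(Class<?> type);
      Schema schemaFor(Class<?> type);
    }

    class SchemaRegistry {
      // A module opts in by listing its implementation class name in
      // META-INF/services/<package>.SchemaProvider; core then discovers
      // every provider on the classpath without a compile-time dependency.
      static SchemaProvider providerFor(Class<?> type) {
        for (SchemaProvider provider : ServiceLoader.load(SchemaProvider.class)) {
          if (provider.supports(type)) {
            return provider;
          }
        }
        return null;  // no module registered a provider for this type
      }
    }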
 >
 > BTW, right now the conversion back and forth between Row objects I'm
 > doing in the ByteBuddy