Re: Python 3: final step

2018-10-10 Thread Manu Zhang
Does anyone know how to set up the Python version on Jenkins? It’s Python 3.5.2
now.

Thanks,
Manu Zhang
On Oct 5, 2018, 9:24 AM +0800, Valentyn Tymofieiev wrote:
> I have put together a guide [1] to help get started with investigating Python 
> 3-related test failures that may be helpful for new folks joining the effort.
>
> Comments and improvements welcome!
>
> Thanks,
> Valentyn
>
> [1] 
> https://docs.google.com/document/d/1s1BJVCY65LB_SYK1SU1u7NbZiFANoq-nEYaEvzRbYlA
>
>
> > On Thu, Oct 4, 2018 at 11:26 AM Valentyn Tymofieiev  
> > wrote:
> > > I agree there is some overlap between JIRAs that track individual 
> > > failures and module-level JIRAs. We originally wanted to do the 
> > > conversion on a module-by-module basis, however we learned that test 
> > > failures in some modules require changes in other modules, and it may be 
> > > a little easier to slice the problem if we focus on classes of failures.
> > >
> > > Module-level JIRAs can still be useful for tracking the end result: tox
> > > suites cover all tests in the module in the Py3 environment, and there are
> > > no disabled tests in the module that don't have individual JIRAs tracking
> > > them.
> > >
> > > I suggest that folks who are working on module-level JIRAs assign to 
> > > themselves the JIRAs that track individual failures if/when they are 
> > > actively addressing them. This way, unassigned problem-specific JIRAs can 
> > > use help from the community.
> > >
> > > Thanks,
> > > Valentyn
> > >
> > >
> > > > On Wed, Oct 3, 2018 at 8:14 PM Manu Zhang  
> > > > wrote:
> > > > > Thanks Valentyn. Note that some test failures are already covered by
> > > > > “Finish Python 3 porting for *** module” issues, e.g.
> > > > > https://issues.apache.org/jira/browse/BEAM-5315.
> > > > >
> > > > > Manu
> > > > > On Oct 3, 2018, 4:18 PM +0800, Valentyn Tymofieiev
> > > > > wrote:
> > > > > > Hi Rakesh and Manu,
> > > > > >
> > > > > > Thanks to both of you for offering help (in different threads).
> > > > > > It's great to see more and more people getting involved in
> > > > > > making Beam Python 3 compatible!
> > > > > >
> > > > > > There are a few PRs in flight, and several people in the community
> > > > > > are actively working on Python 3 support now. I would be happy to
> > > > > > coordinate the work so that we don't step on each other's toes and
> > > > > > avoid duplication of effort.
> > > > > >
> > > > > > I recently looked at unit tests that are still failing in the Python 3
> > > > > > environment and filed a few issues (in the range BEAM-5615 -
> > > > > > BEAM-5629) to track similar classes of errors. You can also find
> > > > > > them on the Kanban board [1].
> > > > > > In particular, BEAM-5620 and BEAM-5627 should be easy issues to get
> > > > > > started with.
> > > > > >
> > > > > > There are multiple ways you can help:
> > > > > > - Helping to root-cause errors. Even a comment on why a test is failing
> > > > > > and a suggestion on how to fix it will be helpful for others when you
> > > > > > don't have time to do the fix yourself.
> > > > > > - Helping with code reviews.
> > > > > > - Reporting new issues (as subtasks of BEAM-1251), deduplicating or
> > > > > > splitting the existing issues. We probably don't want to file a
> > > > > > Jira for each of the 250+ currently failing tests at this point, but it
> > > > > > may make sense to track errors that occur repeatedly and share a
> > > > > > root cause.
> > > > > > - Fixing the issues. Feel free to assign an issue to yourself if
> > > > > > you have a fix in mind and plan to actively work on it. Due to the
> > > > > > nature of the problem, two issues may occasionally share a
> > > > > > root cause, or fixing one issue may be a prerequisite for
> > > > > > fixing another, so sync to master often to make sure the
> > > > > > issue you are working on is not already fixed.
> > > > > >
> > > > > > I'll also keep an eye on the PRs and will try to keep the list of 
> > > > > > open issues up to date.
> > > > > >
> > > > > > Thanks,
> > > > > > Valentyn
> > > > > >
> > > > > > [1]: 
> > > > > > https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail
> > > > > >
> > > > > >
> > > > > > > On Tue, Oct 2, 2018 at 9:38 AM Pablo Estrada  
> > > > > > > wrote:
> > > > > > > > Very cool : ) I'm also available to review / merge if you need 
> > > > > > > > help from my side.
> > > > > > > > Best
> > > > > > > > -P.
> > > > > > > >
> > > > > > > > > On Tue, Oct 2, 2018 at 7:45 AM Rakesh Kumar 
> > > > > > > > >  wrote:
> > > > > > > > > > Hi Rob,
> > > > > > > > > >
> > > > > > > > > > I am Rakesh Kumar, and I am using the Beam SDK for one of my
> > > > > > > > > > projects at Lyft. I have been working closely with Thomas Weise. I
> > > > > > > > > > have already met a couple of Python SDK developers in
> > > > > > > > > > person.
> > > > > > > > > > I am interested in helping migrate to Python 3. You can assign
> > > > > > > > > > me PRs for review. I am also more than happy

BEAM-2953, Timeseries library

2018-10-10 Thread rarokni
RE: Pull Request : https://github.com/apache/beam/pull/6540

I have been doing some work on a generalized set of timeseries transforms, with
the goal of abstracting away, for the user, the process of dealing with some of
the common problems that arise when working with timeseries in Beam batch/stream
mode. I would love to get feedback, comments, ideas and, I hope, after things
flesh out more, collaborators! Of course it will not cover every issue in the
timeseries problem space, but from many interactions and discussions over the
last couple of years, I feel it has the potential to help with a large enough
set of use cases to make it a worthwhile endeavor.

Primary goals:
- Remove as much "boilerplate" as possible from common timeseries
  pre-processing tasks.
- Deal with a couple of the harder problems with timeseries when processed as
  a stream in a distributed system. Some example use cases (which we use the
  State API and timers to solve):
  - IoT: A device sends signals when something changes, but sends nothing when
    there has been no update, to save battery. The absence of data downstream
    does not mean that there is no information; it has just not been observed.
    (Of course, the IoT device could have gone boom... but in the absence of
    new data, the last known value is assumed until some TTL is reached.)
  - Finance: Ticks in FX data come with Ask and Bid prices as they change; if
    no Ask or Bid price is seen, the last known value is assumed.
- Provide some common sinks as reference, for example output of TensorFlow
  SequenceExamples to storage systems. The initial sinks in the pull requests
  are based on Google Cloud sinks, but I hope these will be expanded to other
  platforms with the help of some of the good folks on this thread!

To make this problem tractable, some fundamental assumptions have been
made.

The raw timeseries data will be translated to a common representation. The
first pass of this is below. The user's main 'coding task' will be to convert
their objects to:

Single property
https://github.com/rezarokni/beam/blob/timeseries/sdks/java/extensions/timeseries/src/main/proto/TimeSeriesData.proto#L66

Multivariate: 
https://github.com/rezarokni/beam/blob/timeseries/sdks/java/extensions/timeseries/src/main/proto/TimeSeriesData.proto#L75

The primary utility of this library is for stream processing. While it will
work fine in batch mode, there are many established tools for dealing with
timeseries data that has already landed in a data store.
This library is not intended as a data analytics tool; although its output
has the potential to be very useful within analytics tools, that is a side
benefit.

It would be great to get feedback, and if you are interested in helping more
directly, please ping me.

Cheers

Reza



Re: Beam Samza Runner status update

2018-10-10 Thread Jesse Anderson
Interesting

On Wed, Oct 10, 2018, 3:49 PM Kenneth Knowles  wrote:

> Welcome, Hai!
>
> On Wed, Oct 10, 2018 at 3:46 PM Hai Lu  wrote:
>
>> Hi, all
>>
>> This is Hai from LinkedIn. As Xinyu mentioned, I have been working on
>> portable API for Samza runner and made some solid progress. It's been a
>> very smooth process (although not effortless for sure) and I'm really
>> grateful for the great platform that you all have built. I'm very
>> impressed. Bravo!
>>
>> Excited to work with everyone on Beam. Do expect more questions from me
>> down the road.
>>
>> Thanks,
>> Hai
>>
>> On Wed, Oct 10, 2018 at 12:36 PM Kenneth Knowles  wrote:
>>
>>> Clarification: Thomas Groh wrote the fuser, not me!
>>>
>>> Thanks for sharing all this. Really cool.
>>>
>>> Kenn
>>>
>>> On Wed, Oct 10, 2018 at 11:17 AM Rui Wang  wrote:
>>>
>>>> Thanks for sharing! It's so exciting to hear that Beam is being used on
>>>> Samza in production @LinkedIn! Your feedback will be helpful to the Beam
>>>> community!
>>>>
>>>> Besides, Beam supports SQL right now, and hopefully the Beam community can
>>>> also receive feedback on BeamSQL in the future.
>>>>
>>>> -Rui
>>>>
>>>> On Wed, Oct 10, 2018 at 11:10 AM Jean-Baptiste Onofré
>>>> wrote:
>>>>
>>>>> Thanks for sharing and congrats on this great work!
>>>>>
>>>>> Regards
>>>>> JB
>>>>> On Oct 10, 2018, at 20:23, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>>>>>>
>>>>>> Hi, All,
>>>>>>
>>>>>> It's been over four months since we added the Samza Runner to Beam,
>>>>>> and we've been making a lot of progress since then. Here I would like to
>>>>>> update you all and share some really good news from here at LinkedIn:
>>>>>>
>>>>>> 1) First Beam job in production @LinkedIn!
>>>>>> After a few rounds of testing and benchmarking, we finally rolled out
>>>>>> our first Beam job here! The job uses quite a few features, such as event
>>>>>> time, fixed/session windowing, early triggering, and stateful processing.
>>>>>> Our first customer is very happy, and they highly praise the easy-to-use
>>>>>> Beam API as well as the powerful processing model. Due to our limited
>>>>>> resources here, we put our full trust in the work you are doing, and we
>>>>>> didn't run into any surprises. We see extreme attention to detail and no
>>>>>> compromise on user experience everywhere in the code base. We would
>>>>>> like to thank everyone in the Beam community for contributing to such an
>>>>>> amazing framework!
>>>>>>
>>>>>> 2) A portable Samza Runner prototype
>>>>>> We have also started work on making the Samza Runner portable. So far
>>>>>> we have just gotten the Python word count example working on the portable
>>>>>> Samza Runner. Please look out for the PR for this very soon :). Again, this
>>>>>> work would not be possible without the great Beam portability framework
>>>>>> and the developers behind it, like Luke and Ahmet, just to name a few. The
>>>>>> ReferenceRunner has been extremely useful to us in figuring out what's
>>>>>> needed and how it works. Kudos to Thomas Groh, Ben Sidhom and all the
>>>>>> others who made this available to us. And to Kenn, your fuse work rocks.
>>>>>>
>>>>>> 3) More contributors in Samza Runner
>>>>>> The runner has been Chris's and my personal project for a while, but that
>>>>>> is no longer the case. We got Hai Lu and Boris Shkolnik from the Samza
>>>>>> team to contribute. Hai has been focusing on the portability work mentioned
>>>>>> in #2, and Boris will work mostly on supporting our use cases. We will send
>>>>>> more emails discussing our use cases, like the "Update state after firing"
>>>>>> email I sent out earlier.
>>>>>>
>>>>>> Finally, a shout-out to our very own Chris Pettitt. Without you, none
>>>>>> of the above would have happened!
>>>>>>
>>>>>> Thanks,
>>>>>> Xinyu
>>>>>>
>


Re: Log output from Dataflow tests

2018-10-10 Thread Ankur Goenka
Hi Max, I don't have edit privileges for the project, so I can't modify users.

On Wed, Oct 10, 2018 at 9:02 AM Maximilian Michels  wrote:

> Thank you Scott! Ismael also sent me the logs and I could fix the error.
>
> It seems we have granted read-only access to project members in the
> past. I just checked back with Ankur, he might be able to grant access
> for my GCP account.
>
> -Max
>
> On 10.10.18 17:26, Scott Wegner wrote:
> > I'm not sure how apache-beam-testing permissions are managed; Kenn,
> > could we grant read-access for contributors who need it for testing?
> >
> > Here are two logs from the job that seem relevant:
> >
> > 2018-10-08 14:44:45.381 PDT
> > Parsing unknown args:
> > [u'--dataflowJobId=2018-10-08_14_41_03-9578125971484804239',
> > u'--autoscalingAlgorithm=NONE',
> > u'--direct_runner_use_stacked_bundle',
> > u'--maxNumWorkers=0',
> > u'--style=scrambled',
> > u'--sleep_secs=20',
> > u'--pipeline_type_check',
> > u'--gcpTempLocation=gs://temp-storage-for-end-to-end-tests/temp-it/beamapp-jenkins-1008214058-522436.1539034858.522554',
> > u'--numWorkers=1',
> > u'--beam_plugins=apache_beam.io.filesystem.FileSystem',
> > u'--beam_plugins=apache_beam.io.hadoopfilesystem.HadoopFileSystem',
> > u'--beam_plugins=apache_beam.io.localfilesystem.LocalFileSystem',
> > u'--beam_plugins=apache_beam.io.gcp.gcsfilesystem.GCSFileSystem',
> > u'--beam_plugins=apache_beam.io.filesystem_test.TestingFileSystem',
> > u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PipelineGraphRenderer',
> > u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.MuteRenderer',
> > u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.TextRenderer',
> > u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PydotRenderer',
> > u'--pipelineUrl=gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1008214058-522436.1539034858.522554/pipeline.pb']
> >
> > 2018-10-08 14:44:45.382 PDT
> > Python sdk harness failed: Traceback (most recent call last):
> >   File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py", line 133, in main
> >     sdk_pipeline_options.get_all_options(drop_default=True))
> >   File "/usr/local/lib/python2.7/dist-packages/apache_beam/options/pipeline_options.py", line 227, in get_all_options
> >     action='append' if num_times > 1 else 'store')
> >   File "/usr/lib/python2.7/argparse.py", line 1308, in add_argument
> >     return self._add_action(action)
> >   File "/usr/lib/python2.7/argparse.py", line 1682, in _add_action
> >     self._optionals._add_action(action)
> >   File "/usr/lib/python2.7/argparse.py", line 1509, in _add_action
> >     action = super(_ArgumentGroup, self)._add_action(action)
> >   File "/usr/lib/python2.7/argparse.py", line 1322, in _add_action
> >     self._check_conflict(action)
> >   File "/usr/lib/python2.7/argparse.py", line 1460, in _check_conflict
> >     conflict_handler(action, confl_optionals)
> >   File "/usr/lib/python2.7/argparse.py", line 1467, in _handle_conflict_error
> >     raise ArgumentError(action, message % conflict_string)
> > ArgumentError: argument --beam_plugins: conflicting option string(s): --beam_plugins
> >
> > On Wed, Oct 10, 2018 at 1:05 AM Maximilian Michels wrote:
> >
> > It would be great to provide access to Dataflow build logs.
> >
> > In the meantime, could someone with access send me the logs for the job
> > below?
> >
> > https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_14_41_03-9578125971484804239?project=apache-beam-testing
> >
> > Thanks,
> > Max
> >
> > On 09.10.18 13:45, Maximilian Michels wrote:
> >  > Hi,
> >  >
> >  > I'm debugging a test failure in Dataflow PostCommit. There are logs
> >  > available which I can't access. Is it possible for me to be added to the
> >  > apache-beam-testing project?
> >  >
> >  > Thanks,
> >  > Max
> >  >
> >  >
> >  > Example:
> >  >
> >  > ==
> >  > FAIL: test_streaming_with_attributes
> >  > (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
> >  > --
> >  > Traceback (most recent call last):
> >  >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 175, in test_streaming_with_attributes
> >  >     self._test_streaming(with_attributes=True)
> >  >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 167, in _test_streaming
> >  >     timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
> >  
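
The `ArgumentError` in the harness log above is what Python's `argparse` raises when the same option string is registered twice on one parser. A minimal sketch, assuming the goal is simply to register each unknown flag once, counts occurrences first and then calls `add_argument` exactly once per flag, mirroring the `action='append' if num_times > 1 else 'store'` line visible in the traceback. The helper names `parser_for_unknown_args` and `duplicate_registration_fails` are hypothetical; this is not Beam's actual `get_all_options` code, and it does not claim to identify the root cause of this particular failure.

```python
import argparse
from collections import Counter
from typing import List


def parser_for_unknown_args(unknown_args: List[str]) -> argparse.ArgumentParser:
    """Hypothetical helper: register each unknown --flag exactly once."""
    parser = argparse.ArgumentParser()
    # '--beam_plugins=a' and '--beam_plugins=b' both count toward '--beam_plugins'.
    counts = Counter(
        arg.split('=', 1)[0] for arg in unknown_args if arg.startswith('--'))
    for name, num_times in counts.items():
        # Repeated flags collect a list; nargs='?' also tolerates bare flags.
        parser.add_argument(
            name, nargs='?', action='append' if num_times > 1 else 'store')
    return parser


def duplicate_registration_fails() -> bool:
    """Registering the same option twice triggers argparse's conflict check."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--beam_plugins', action='append')
    try:
        parser.add_argument('--beam_plugins', action='append')
    except argparse.ArgumentError:
        return True
    return False
```

Counting first means `add_argument` is called once per flag, so the conflict check never fires. An alternative is `argparse.ArgumentParser(conflict_handler='resolve')`, which lets a later definition replace an earlier one, at the cost of silently discarding the first definition.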

Re: Java > 8 support

2018-10-10 Thread Pablo Estrada
Hello all,
If I understand you correctly, Ismael, a good number of
'beam-sdks-java-core' tests are already passing with Java 11, so the amount
of work necessary on the core module should be relatively small. Is this
correct? Are there improvements that may be missing in terms of
modularization?

There is also the work necessary to build/run tests with Gradle.

I am also curious... how much work do you estimate is necessary to support
Java 11 with some of the existing sources? I understand that we have many,
many sources, but perhaps some of the more popular ones (e.g. TextIO)?

Thanks!
-P.

On Wed, Oct 10, 2018 at 12:59 AM Arif Kasim  wrote:

> Thanks for the clarification Ismaël.
>
> Arif Kasim
> Strategic Cloud Engineer
> Google, Inc.
> arifka...@google.com
>
> On Wed, Oct 10, 2018 at 9:41 AM Ismaël Mejía  wrote:
>
>> Just wanted to clarify, there is already a JIRA for ongoing work on
>> Java 11 support.
>> https://issues.apache.org/jira/browse/BEAM-2530
>>
>> I led the initial work on supporting what was at the time Java 9/10.
>> So far the biggest blockers were the ApiSurface tests (not at all
>> compatible with these versions), but at that point we were 5 tests
>> away from getting sdks/core passing. Notice also that the scope of this
>> JIRA evolved to support only the LTS version (Java 11), and
>> specifically to support only sdks/core + direct runner. Supporting all
>> IOs or runners is really more a question of the dependencies working
>> nicely with Java 11, so this will probably take a long time. Also, the
>> idea so far does NOT include supporting the Java module system at all.
>>
>> I stopped working on this during the move to Gradle because it was too
>> hard to tackle both the evolving Java versions and all the ongoing changes
>> in the build system. If somebody in the community wants to contribute in
>> this area, it will be greatly appreciated. Notice that all the work we did
>> on the build system for this now needs to be implemented in Gradle
>> too.
>> On Sat, Oct 6, 2018 at 5:55 PM Romain Manni-Bucau 
>> wrote:
>> >
>> > @Reuven: ByteBuddy by itself is not incompatible, but the way Beam tries
>> to inject the proxy class is. There are other strategies you can use in
>> ByteBuddy which do work.
>> >
>> > Romain Manni-Bucau
>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>> >
>> >
>> > On Sat, Oct 6, 2018 at 17:51, Reuven Lax wrote:
>> >>
>> >> Romain, do you have any more details on the ByteBuddy incompatibility?
>> Is ByteBuddy incompatible with the Java 11 JRE, or just with new language
>> features?
>> >>
>> >> On Fri, Oct 5, 2018 at 10:20 AM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>> >>>
>> >>> Hi Arif,
>> >>>
>> >>> AFAIK the ByteBuddy code is not Java 11 friendly; otherwise it runs
>> if your engine supports it (but that means your pipeline is very, very
>> simple, since it does not have a DoFn ;)). Also note that since the modules
>> are not named, you may have to use some weird or even unstable import names
>> if you want to use modules (but there is no real reason to do that yet in Java).
>> >>>
>> >>> Romain Manni-Bucau
>> >>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>> >>>
>> >>>
>> >>> On Fri, Oct 5, 2018 at 19:10, Arif Kasim
>> wrote:
>> 
>>  Hello,
>>  What's the status of Java version > 8 support for Beam? Thanks.
>> 
>>  -Arif.
>>
>


Re: Beam Samza Runner status update

2018-10-10 Thread Kenneth Knowles
Welcome, Hai!


Re: Beam Samza Runner status update

2018-10-10 Thread Hai Lu
Hi, all

This is Hai from LinkedIn. As Xinyu mentioned, I have been working on
portable API for Samza runner and made some solid progress. It's been a
very smooth process (although not effortless for sure) and I'm really
grateful for the great platform that you all have built. I'm very
impressed. Bravo!

Excited to work with everyone on Beam. Do expect more questions from me
down the road.

Thanks,
Hai


Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread David Morávek
Anton:
All of the points are correct, with one minor exception. We are
currently moving our production workloads from Euphoria
to Beam (using the DSL), but we are
hitting scalability issues with the current Spark runner, so it is not
technically used in production yet. Everything behaves correctly in the
staging environment, where the runner can handle the workload.

Kenn:
here is the IP Clearance document
https://gist.github.com/dmvk/80acb0579f196e18c02a4e280978d445

Thanks,
David

On Wed, Oct 10, 2018 at 11:30 PM Kenneth Knowles  wrote:

> I just glanced through it to make sure things are in the right place and
> the build is set up right, and it all LGTM.
>
> We need to file the IP Clearance to finish the process that Davor started.
> Please fill the XML template at
> http://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/ip-clearance-template.xml
> then I will review and file it in SVN.
>
> Kenn
>
> On Wed, Oct 10, 2018 at 2:15 PM Anton Kedin  wrote:
>
>> I think the code looks good and we should probably just merge it (unless
>> there are other blockers, e.g. formal approvals), considering:
>>  - it has been reviewed;
>>  - it is tested and used in production;
>>  - it was discussed on the list and there were no objections to having it
>> as part of Beam;
>>  - it is a standalone extension and doesn't interfere with Beam Java SDK,
>> if I didn't miss anything;
>>  - it has people working on it and supporting it;
>>
>> All other issues can probably be sorted out in normal Beam process.
>>
>> Regards,
>> Anton
>>
>> On Wed, Oct 10, 2018 at 5:57 AM David Morávek 
>> wrote:
>>
>>> Hello Max,
>>>
>>> It would be great if you could do more of a "general" review; the code
>>> base is fairly large, well tested, and has already been reviewed internally
>>> by several people.
>>>
>>> We would like to have the overall approach and design decisions
>>> validated by the community and get some input on what could be improved
>>> and whether we are headed in the right direction.
>>>
>>> Thanks,
>>> David
>>>
>>> On Wed, Oct 10, 2018 at 2:21 PM Maximilian Michels 
>>> wrote:
>>>
>>>> That is a huge PR! :) Euphoria looks great, especially for people
>>>> coming from Flink/Spark. I'll check out the documentation.
>>>>
>>>> Do you have any specific code parts which you want to have reviewed?
>>>>
>>>> Thanks,
>>>> Max
>>>>
>>>> On 10.10.18 10:30, Jean-Baptiste Onofré wrote:
>>>> > Hi,
>>>> >
>>>> > Thanks for all the work you are doing on this DSL!
>>>> >
>>>> > I tried to follow the feature branch for a while. I'm still committed
>>>> > to moving forward on that front, but more reviewers would be great.
>>>> >
>>>> > Regards
>>>> > JB
>>>> >
>>>> > On 10/10/2018 10:26, Plajt, Vaclav wrote:
>>>> >> Hello Beam devs,
>>>> >> we have finished our main goals in the development of the Euphoria DSL.
>>>> >> It is an easy-to-use Java 8 API built on top of Beam's Java SDK. The API
>>>> >> provides a high-level abstraction of data transformations, with a focus
>>>> >> on Java 8 language features (e.g. lambdas and streams). It is fully
>>>> >> interoperable with the existing Beam SDK and convertible back and forth.
>>>> >> It allows fast prototyping through the use of (optional) Kryo-based
>>>> >> coders and can be seamlessly integrated into existing Beam pipelines.
>>>> >>
>>>> >> Now we believe it is time to start a discussion about it with the
>>>> >> community, which will hopefully lead to a vote about adopting it into
>>>> >> the Apache Beam project. Most of the main ideas and development goals
>>>> >> were presented at Beam Summit London [1].
>>>> >>
>>>> >> We are looking for reviewers within the community. Please start with
>>>> >> the documentation [2] or the design document [3]. Our contribution is
>>>> >> divided into two modules: `org.apache.beam:beam-sdks-java-extensions-euphoria`
>>>> >> and `org.apache.beam:beam-sdks-java-extensions-kryo`. The rest of the
>>>> >> code base remains untouched.
>>>> >> All the checks in the PR [5] are passing with the exception of "Website
>>>> >> PreCommit", which seems to be broken; a little help here would be
>>>> >> appreciated.
>>>> >>
>>>> >> Thank you.
>>>> >> We are looking forward to your feedback.
>>>> >> {david.moravek,vaclav.plajt,marek.simunek}@firma.seznam.cz
>>>> >>
>>>> >> Resources:
>>>> >> [1] Beam Summit London presentation:
>>>> >> https://docs.google.com/presentation/d/1SagpmzJ-tUQki5VsQOEEEUyi_LXRJdG_3OBLdjBKoh4/edit?usp=sharing
>>>> >> [2] Documentation:
>>>> >> https://github.com/seznam/beam/blob/dsl-euphoria/website/src/documentation/sdks/euphoria.md
>>>> >> [3] Design Document: https://s.apache.org/beam-euphoria
>>>> >> [4] ASF Jira Issue: https://issues.apache.org/jira/browse/BEAM-3900
>>>> >> [5] Pull Request: https://github.com/apache/beam/pull/6601
>>>> >> [6] Original proposal:
>>>> >>
 

Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread Kenneth Knowles
I just glanced through it to make sure things are in the right place and
the build is set up right, and it all LGTM.

We need to file the IP Clearance to finish the process that Davor started.
Please fill the XML template at
http://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/ip-clearance-template.xml
then I will review and file it in SVN.

Kenn

On Wed, Oct 10, 2018 at 2:15 PM Anton Kedin  wrote:

> I think the code looks good and we should probably just merge it (unless
> there are other blockers, e.g. formal approvals), considering:
>  - it has been reviewed;
>  - it is tested and used in production;
>  - it was discussed on the list and there were no objections to having it
> as part of Beam;
>  - it is a standalone extension and doesn't interfere with Beam Java SDK,
> if I didn't miss anything;
>  - it has people working on it and supporting it;
>
> All other issues can probably be sorted out in normal Beam process.
>
> Regards,
> Anton
>
> On Wed, Oct 10, 2018 at 5:57 AM David Morávek 
> wrote:
>
>> Hello Max,
>>
>> It would be great if you could do more of a "general" review; the code
>> base is fairly large, well tested, and has already been reviewed
>> internally by several people.
>>
>> We would like to have the overall approach and design decisions validated
>> by the community, and to get some input on what could be improved and
>> whether we are headed in the right direction.
>>
>> Thanks,
>> David
>>
>> On Wed, Oct 10, 2018 at 2:21 PM Maximilian Michels 
>> wrote:
>>
>>> That is a huge PR! :) Euphoria looks great. Especially for people coming
>>> from Flink/Spark. I'll check out the documentation.
>>>
>>> Do you have any specific code parts which you want to have reviewed?
>>>
>>> Thanks,
>>> Max
>>>
>>> On 10.10.18 10:30, Jean-Baptiste Onofré wrote:
>>> > Hi,
>>> >
>>> > Thanks for all the work you are doing on this DSL !
>>> >
>>> > I tried to follow the feature branch for a while. I'm still committed
>>> > to moving forward on that front, but more reviewers would be great.
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 10/10/2018 10:26, Plajt, Vaclav wrote:
>>> >> Hello Beam devs,
>>> >> we have finished our main development goals for the Euphoria DSL. It is
>>> >> an easy-to-use Java 8 API built on top of Beam's Java SDK. The API
>>> >> provides a high-level abstraction of data transformations, with a focus
>>> >> on Java 8 language features (e.g. lambdas and streams). It is fully
>>> >> interoperable with the existing Beam SDK and convertible back and forth.
>>> >> It allows fast prototyping through the use of (optional) Kryo-based
>>> >> coders and can be seamlessly integrated into existing Beam pipelines.
>>> >>
>>> >> Now we believe it is time to start a discussion about it with the
>>> >> community, which will hopefully lead to a vote about adopting it into
>>> >> the Apache Beam project. Most of the main ideas and development goals
>>> >> were presented at the Beam Summit in London [1].
>>> >>
>>> >> We are looking for reviewers within the community. Please start with
>>> >> the documentation [2] or the design document [3]. Our contribution is
>>> >> divided into two modules:
>>> >> `org.apache.beam:beam-sdks-java-extensions-euphoria` and
>>> >> `org.apache.beam:beam-sdks-java-extensions-kryo`. The rest of the code
>>> >> base remains untouched.
>>> >> All the checks in the PR [5] are passing with the exception of "Website
>>> >> PreCommit", which seems to be broken; a little help here would be
>>> >> appreciated.
>>> >>
>>> >> Thank you,
>>> >> we are looking forward to your feedback.
>>> >> {david.moravek,vaclav.plajt,marek.simunek}@firma.seznam.cz
>>> >>
>>> >> Resources:
>>> >> [1] Beam Summit London presentation:
>>> >> https://docs.google.com/presentation/d/1SagpmzJ-tUQki5VsQOEEEUyi_LXRJdG_3OBLdjBKoh4/edit?usp=sharing
>>> >> [2] Documentation:
>>> >> https://github.com/seznam/beam/blob/dsl-euphoria/website/src/documentation/sdks/euphoria.md
>>> >> [3] Design Document: https://s.apache.org/beam-euphoria
>>> >> [4] ASF Jira Issue: https://issues.apache.org/jira/browse/BEAM-3900
>>> >> [5] Pull Request: https://github.com/apache/beam/pull/6601
>>> >> [6] Original proposal:
>>> >> http://mail-archives.apache.org/mod_mbox/beam-dev/201712.mbox/%3ccajjqkhnrp1z8atteogmpfkqxrcjeanb3ykowvvtnwyrvv_-...@mail.gmail.com%3e
>>> >>
>>> >>
>>> >>
>>> >> You should know that this e-mail and its attachments are confidential.
>>> >> If we are negotiating on the conclusion of a transaction, we reserve
>>> the
>>> >> right to terminate the negotiations at any time. For fans 

Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Tim Robertson
Thank you JB for starting this discussion.

Others have commented on many of these points far better than I can, but my
experience is similar to JB's.

1. IntelliJ IDEA integration (and my laptop slowing down like crazy) is the
biggest contributor to my feeling of being unproductive.
2. Not knowing the correct way to modify the build scripts, which I put down
to my own limitations.

> It seems we also need to help build Gradle expertise in our community, so
> that those that are motivated are empowered to contribute.


Nicely phrased. +1



On Wed, Oct 10, 2018 at 7:15 PM Scott Wegner  wrote:

> > Perhaps we should go through and prioritize (and add missing items to)
> BEAM-4045
>
> +1. It's hard to know where to start when there's such a laundry list of
> tasks. If you're having build issues, will you make sure it is represented
> in BEAM-4045, and "Vote" for the issues that you believe are the highest
> priority?
>
> I agree that the Gradle build is far from perfect (my top gripes are IDE
> integration and parallel/incremental build support). I believe that we're
> capable of making our build great, and continuing our investment in Gradle
> would be a shorter path than changing course again. Remember that our Maven
> build also had its share of issues, which is why we as a community voted
> to replace it [1][2].
>
> It seems we also need to help build Gradle expertise in our community, so
> that those that are motivated are empowered to contribute. Does anybody
> have a good "Getting Started with Gradle" guide they recommend? Perhaps we
> could also link to it from the website/wiki.
>
> [1]
> https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
> [2]
> https://lists.apache.org/thread.html/bd399ecb17cd211be7c6089b562c09ba9116649c9eabe3b609606a3b@%3Cdev.beam.apache.org%3E
>
> On Wed, Oct 10, 2018 at 2:40 AM Robert Bradshaw 
> wrote:
>
>> Some rough stats (because I was curious): The gradle files have been
>> edited by ~79 unique contributors over 696 distinct commits, whereas the
>> maven ones were edited (over a longer time period) by ~130 unique
>> contributors over 1389 commits [1]. This doesn't capture how much effort
>> was put into these edits, but neither is restricted to a small set of
>> experts.
>>
>> Regarding "friendly for other languages" I don't think either is
>> necessarily easy to learn, but my impression is that the Maven learning
>> curve is shallower for those already firmly embedded in the Java ecosystem
>> (perhaps due to leveraging existing familiarity, and perhaps some due to
>> the implicit java-centric conventions that maven assumed about your
>> project), whereas with gradle at least I could keep pulling on the string
>> to unwind things to the bottom. The "I just want to build/test X without
>> editing/viewing the build files" seemed more natural with Gradle (e.g. I
>> can easily list all tasks).
>>
>> That being said, I don't think everyone needs to understand the full
>> build system. It's important that there be a critical mass that do (we have
>> that for both, and if we can simplify to improve this that'd be great),
>> that it's easy enough to make basic changes (e.g. add a dependency; again, I
>> don't think the barrier is sufficiently different for either), and that it
>> works well out of the box for someone who just wants to look up a command
>> on the website and edit code (the CLI is an improvement with Gradle, but
>> it's clear that (Java) IDE support is a significant regression).
>>
>> Personally, I don't know much about IDE configuration (admittedly the
>> larger issue), but one action item I can take on is trying to eliminate the
>> need to do a "git clean" after building certain targets (assuming I can
>> reproduce this).
>>
>> Perhaps we should go through and prioritize (and add missing items to)
>> BEAM-4045
>> https://issues.apache.org/jira/issues/?jql=parent%20%3D%20BEAM-4045%20ORDER%20BY%20priority%20DESC
>> ? There's always a long tail with this kind of thing, and looking at the
>> whole list can be daunting, but putting it in the correct order and
>> knocking off the top N items could possibly go a long way.
>>
>> - Robert
>>
>> [1] The commands I ran were (with and without the uniq):
>>
>> $ find . -name 'build.gradle' | xargs git log | grep Author: | grep -o '[^< ]*@' | sort | uniq | wc
>> $ find . -name 'pom.xml' | xargs git log | grep Author: | grep -o '[^< ]*@' | sort | uniq | wc
>>
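The shell pipelines in [1] can also be approximated in Python, which may be easier to extend (for example, to key on full email addresses instead of the local part). This is a minimal sketch, assuming the raw `git log` output has already been captured as a string; `count_authors` is a hypothetical helper, not part of Beam's tooling. Like `grep -o '[^< ]*@'`, it keys each author on the part of the email address up to and including `@`.

```python
import re
from typing import Tuple

# Same pattern as `grep -o '[^< ]*@'`: a run of characters other than '<' and
# space that ends in '@' (the local part of the author's email address).
EMAIL_PREFIX = re.compile(r"[^< ]*@")


def count_authors(git_log_text: str) -> Tuple[int, int]:
    """Return (author_lines, unique_authors) for raw `git log` output.

    author_lines corresponds to the pipeline without `uniq` (roughly one per
    commit); unique_authors corresponds to the pipeline with `sort | uniq`.
    """
    prefixes = []
    for line in git_log_text.splitlines():
        if "Author:" in line:  # `grep Author:`
            prefixes.extend(EMAIL_PREFIX.findall(line))  # `grep -o`
    return len(prefixes), len(set(prefixes))  # the line counts `wc` reports
```

One way to obtain the input (Python 3.7+) is `subprocess.run(["git", "log", "--", *paths], capture_output=True, text=True).stdout`, where `paths` is the list of `build.gradle` or `pom.xml` files.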
>> On Wed, Oct 10, 2018 at 10:31 AM Etienne Chauchot 
>> wrote:
>>
>>> Hi all,
>>> I must admit that I agree with the status described, especially regarding
>>> 2 points:
>>> 1. Obstacles for new contributors: the Gradle learning curve might be too
>>> long for spare-time contributors, and a complex scripted build takes longer
>>> to understand compared to a self-descriptive one.
>>> 2. IDE integration kind of slows down development.
>>>
>>> Now, regarding how we improve the situation, I think we need to discuss
>>> and identify tasks and tackle them all together even if 

Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread Anton Kedin
I think the code looks good and we should probably just merge it (unless
there are other blockers, e.g. formal approvals), considering:
 - it has been reviewed;
 - it is tested and used in production;
 - it was discussed on the list and there were no objections to having it
as part of Beam;
 - it is a standalone extension and doesn't interfere with Beam Java SDK,
if I didn't miss anything;
 - it has people working on it and supporting it;

All other issues can probably be sorted out in normal Beam process.

Regards,
Anton



Re: Fwd: Slack invitation

2018-10-10 Thread Filip Popić
I got it, thank you!

On Wed, 10 Oct 2018 at 16:17, Jean-Baptiste Onofré  wrote:

> You didn't receive it?
>
> Let me try another time.
>
> Regards
> JB
> On 10 Oct 2018, at 17:15, "Filip Popić" wrote:
>>
>> Any news regarding the invitation?
>>
>> On Mon, 8 Oct 2018 at 17:24, Jean-Baptiste Onofré < j...@nanthrax.net>
>> wrote:
>>
>>> Ok I will send it to you as well.
>>>
>>> Regards
>>> JB
>>> On 8 Oct 2018, at 18:23, Emmanuel Bastien < o...@ebastien.name> wrote:

 Hello,
 I would like to join the Beam Slack channel. Could someone send me an
 invitation?
 Thanks in advance!
 Emmanuel




Re: [PROPOSAL] Prepare Beam 2.8.0 release

2018-10-10 Thread Ahmet Altay
Given the number of open issues, I will re-cut the release branch once the
blocking issues are resolved. Don't worry about cherry-picking changes
directly to the release branch for now.

I will continue to update this thread.

On Wed, Oct 10, 2018 at 12:12 PM, Niel Markwick  wrote:

> The 3 spannerio issues (5445, 3516, 4796) are waiting for one last LGTM
> before the PRs can be merged, but are otherwise ready for 2.8...


Please work with the reviewers to get them in. I moved those issues to
2.9.0 already.


>
> On Wed, 10 Oct 2018, 19:51 Ahmet Altay,  wrote:
>
>> Thank you JB.
>>
>> It turns out there are 2 more blocker issues. I will look at them
>> first. (So, I am not rushing towards cutting RC1 yet.)
>>
>> On Wed, Oct 10, 2018 at 11:42 AM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hey
>>>
>>> Etienne should do a new pass soon. I am doing my best to get RabbitMQIO
>>> ready to cherry-pick.
>>>
>>> Thanks
>>> Regards
>>> JB
>>> On 10 Oct 2018, at 21:25, Ahmet Altay wrote:

 Update:

 I started cutting the branch. There are 2 open issues:
 - RabbitMQIO - JB, if you plan to complete this soon, I can cherry-pick it
 to the branch.
 - One new issue related to release process changes with respect to
 beam-site deprecation.

 On Tue, Oct 9, 2018 at 11:38 AM, Jean-Baptiste Onofré 
 wrote:

> Ok. Gonna move forward on RabbitMQIO asap.
>
> Thanks
> Regards
> JB
> On 9 Oct 2018, at 21:00, Ahmet Altay wrote:
>>
>> Hi all,
>>
>> Reminder, I will cut the release branch tomorrow. If you have not
>> done so please take a look at the 2.8.0 issues assigned to you [1].
>>
>> Thank you!
>> Ahmet
>>
>> [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.8.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>
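The long link in [1] is simply a percent-encoded JQL query. A minimal sketch of how such a link can be rebuilt, so the same filter can be reused for a later release; `jira_search_url` is a hypothetical helper that relies only on the standard library's `urllib.parse.quote`.

```python
from urllib.parse import quote

JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="


def jira_search_url(jql: str) -> str:
    """Turn a JQL query into an issues.apache.org search link."""
    # safe="" also percent-encodes characters quote() keeps by default ('/');
    # '=' and ',' are encoded either way, while letters, digits and '.-_~' stay.
    return JIRA_SEARCH + quote(jql, safe="")


# The query behind [1]: unresolved issues targeted at 2.8.0, most important first.
OPEN_FOR_2_8_0 = (
    "project = BEAM AND resolution = Unresolved AND fixVersion = 2.8.0 "
    "ORDER BY priority DESC, updated DESC"
)
```

`jira_search_url(OPEN_FOR_2_8_0)` reproduces the query string in [1] character for character; the same helper applied to `parent = BEAM-4045 ORDER BY priority DESC` yields the BEAM-4045 filter link from the Gradle thread.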
>> On Thu, Oct 4, 2018 at 9:27 AM, Ahmet Altay  wrote:
>>
>>> Thank you all for the feedback. I will continue with 2.8.0 as a
>>> regular release and separate the LTS discussion to a new thread.
>>>
>>> On Thu, Oct 4, 2018 at 7:58 AM, Thomas Weise  wrote:
>>>
 Given the feedback so far, we should probably decouple the LTS and
 2.8.0 discussions. If both converge before 10/10, fine, but it is not
 necessary. I also agree that we should not jump the gun on LTS, and a
 minimum 72-hour feedback window for the topic looks appropriate.

 The issues raised by Tim look like blockers; unless we are confident
 that they can be addressed in a patch release, they may warrant deferring
 LTS? Can we start to tag such JIRAs with an LTS label?

 On the other hand, I think we could allow for a bit of
 experimentation error for the first LTS attempt and feed
 guidelines/policies from learnings/feedback.

 Dependency updates for LTS: I don't think we should block LTS
 because there is a newer version of a dependency out there, or that we
 should rush updates. If we prioritize stability, then the latest usually
 isn't the best. In the case of Flink, 1.5.x is probably what most users
 have at this time, and it has seen 4 patch releases. If the Flink
 community continues to support the last two minor (X.Y) versions, then
 1.5.x support may drop when 1.7 comes out, but that does not mean we
 cannot use it if we were to cut a Beam LTS release today. I generally
 think that LTS needs to focus more on the stability of Beam itself.

 Thanks,
 Thomas
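The support-window reasoning above can be made concrete with a little arithmetic. A minimal sketch, assuming (as the message does, hypothetically) that the Flink community supports the last two minor (X.Y) versions and that 1.6 is the latest minor at the time of writing; `supported_minors` and `still_supported` are illustrative helpers, not a Flink or Beam API.

```python
from typing import List, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (1, 5) for Flink 1.5.x


def supported_minors(latest: Version, window: int = 2) -> List[Version]:
    """Minor versions covered by a 'last `window` minors are supported' policy.

    Assumes every version in the window shares one major line (no major bump).
    """
    major, minor = latest
    oldest = max(0, minor - window + 1)
    return [(major, m) for m in range(oldest, minor + 1)]


def still_supported(version: Version, latest: Version, window: int = 2) -> bool:
    """True if `version` is inside the support window relative to `latest`."""
    return version in supported_minors(latest, window)
```

With `(1, 6)` as the latest release the window covers 1.5 and 1.6; once `(1, 7)` is out, 1.5 falls outside it, even though an LTS release cut before that point could still reasonably build on 1.5.x.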



 On Thu, Oct 4, 2018 at 6:59 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Regarding the LTS release - I agree that we need to have a clear view of
> what kind of support will be provided for such releases.
>
> In addition to the concerns mentioned before, I have another one about
> APIs labeled as “@Experimental”. I think most of the IOs, SQL,
> PCollection with Schema, etc., are labeled with this annotation.
> By definition, such APIs should be considered unstable, in the sense
> that they can be changed or removed in future releases.
>
> So, the question is: how do “@Experimental” APIs affect LTS releases
> (if they do)? What kind of support should be provided in this case,
> especially if an API continues evolving after an LTS has been issued? Do
> we need to provide a guarantee (another annotation, for example) that an
> API won’t be changed between two LTS releases?
>
> And one more related question, which probably deserves another
> discussion (or was already discussed) - what are the criteria to remove
> 

Re: [DISCUSS] Beam public roadmap

2018-10-10 Thread Romain Manni-Bucau
What about a link in the menu? It should contain a list of features and
estimated dates with a probable error (like "in 5 months +- 1 month"),
otherwise it does not bring much IMHO.

On Wed, 10 Oct 2018 at 23:32, Kenneth Knowles wrote:

> Hi all,
>
> We made an attempt at putting together a sort of roadmap [1] in the past
> and also some wide-ranging threads about what could be on it [2], and I
> think we should pick it up again. The description I really liked was
> "strategic and user impacting initiatives (ongoing and future) in an easy
> to consume format" [3]. It seems that we had feedback asking for a Roadmap
> at the London summit [4].
>
> I would like to first focus on meta-questions rather than what would be on
> it:
>
>  - What style / format should it have to be most useful for users?
>  - Where should it be presented?
>
> I asked a couple of people to try to find the roadmap on the web site, as a
> test, and they didn't really know which tab to click on first, so that's a
> starting problem. They didn't even find Works In Progress [5] after
> clicking Contribute. The level of detail of that list varies widely.
>
> I'd also love to see hypothetical formats for it, to see how to balance
> pithiness with crucial details.
>
> Kenn
>
> [1]
> https://lists.apache.org/thread.html/4e1fffa2fde8e750c6d769bf4335853ad05b360b8bd248ad119cc185@%3Cdev.beam.apache.org%3E
> [2]
> https://lists.apache.org/thread.html/f750f288af8dab3f468b869bf5a3f473094f4764db419567f33805d0@%3Cdev.beam.apache.org%3E
> [3]
> https://lists.apache.org/thread.html/60d0333fd9e2c7be2f55e33b0d145f2908e3fe645c008636c86e1133@%3Cdev.beam.apache.org%3E
> [4]
> https://lists.apache.org/thread.html/aa1306da25029dff12a49ba3ce63f2caf6a5f8ba73eda879c8403f3f@%3Cdev.beam.apache.org%3E
>
> [5] https://beam.apache.org/contribute/#works-in-progress
>


[DISCUSS] Beam public roadmap

2018-10-10 Thread Kenneth Knowles
Hi all,

We made an attempt at putting together a sort of roadmap [1] in the past
and also some wide-ranging threads about what could be on it [2], and I
think we should pick it up again. The description I really liked was
"strategic and user impacting initiatives (ongoing and future) in an easy
to consume format" [3]. It seems that we had feedback asking for a Roadmap
at the London summit [4].

I would like to first focus on meta-questions rather than what would be on
it:

 - What style / format should it have to be most useful for users?
 - Where should it be presented?

I asked a couple of people to try to find the roadmap on the web site, as a
test, and they didn't really know which tab to click on first, so that's a
starting problem. They didn't even find Works In Progress [5] after
clicking Contribute. The level of detail of that list varies widely.

I'd also love to see hypothetical formats for it, to see how to balance
pithiness with crucial details.

Kenn

[1]
https://lists.apache.org/thread.html/4e1fffa2fde8e750c6d769bf4335853ad05b360b8bd248ad119cc185@%3Cdev.beam.apache.org%3E
[2]
https://lists.apache.org/thread.html/f750f288af8dab3f468b869bf5a3f473094f4764db419567f33805d0@%3Cdev.beam.apache.org%3E
[3]
https://lists.apache.org/thread.html/60d0333fd9e2c7be2f55e33b0d145f2908e3fe645c008636c86e1133@%3Cdev.beam.apache.org%3E
[4]
https://lists.apache.org/thread.html/aa1306da25029dff12a49ba3ce63f2caf6a5f8ba73eda879c8403f3f@%3Cdev.beam.apache.org%3E

[5] https://beam.apache.org/contribute/#works-in-progress


Re: Portable Flink runner: Generator source for testing

2018-10-10 Thread Micah Wylde
I've opened a JIRA for adding the generator source (BEAM-5707) and sent out
a very rough PR (https://github.com/apache/beam/pull/6637). Would
appreciate any feedback.

On Mon, Oct 8, 2018 at 9:43 AM, Thomas Weise  wrote:

> The portable runner does not support metrics yet:
> https://s.apache.org/apache-beam-portability-support-table
>
> There is also no JIRA referenced in the table; it would be good to
> locate/create it.
>
> On Mon, Oct 8, 2018 at 9:11 AM Łukasz Gajowy 
> wrote:
>
>> Does anyone know the status of metrics support for the Flink Portable
>> Runner? I think we need metrics in such tests, at least to collect a time
>> metric that excludes cluster warm-up time, resource staging time and other
>> things that can distort the actual run-time metric. We should probably
>> also use the metrics API in some other places (as described in the
>> above-mentioned proposal).
>>
>>
>>
>> On Mon, 8 Oct 2018 at 12:12, Maximilian Michels wrote:
>>
>>> This is correct. However, the example code is only part of Lyft's code
>>> base. Until timer support is done, we would have to do something similar
>>> in our code base.
>>>
>>> On 08.10.18 02:34, Łukasz Gajowy wrote:
>>> > Hi,
>>> >
>>> > just to clarify, judging from the above snippets: it seems that we are
>>> > now able to run tests that use a native source for data generation, and
>>> > can use them in this form until Timers are supported. When Timers are
>>> > there, we should consider switching to the Impulse + PTransform based
>>> > solution (described above) because it's more portable; the current one
>>> > is dedicated to Flink only (which is still really cool). Is this
>>> > correct, or am I missing something?
>>> > Łukasz
>>> >
>>> > On Fri, 5 Oct 2018 at 14:04, Maximilian Michels wrote:
>>> >
>>> > Thanks for sharing your setup. You're right that we need timers to
>>> > continuously ingest data into the testing pipeline.
>>> >
>>> > Here is the Flink source which generates the data:
>>> > https://github.com/mwylde/beam/commit/09c62991773c749bc037cc2b6044896e2d34988a#diff-b2fc8d680d9c1da86ba23345f3bc83d4R42
>>> >
>>> > On 04.10.18 19:31, Thomas Weise wrote:
>>> >  > FYI here is an example with a native generator for the portable Flink
>>> >  > runner:
>>> >  >
>>> >  > https://github.com/mwylde/beam/tree/micah_memory_leak
>>> >  > https://github.com/mwylde/beam/blob/22f7099b071e65a76110ecc5beda0636ca07e101/sdks/python/apache_beam/examples/streaming_leak.py
>>> >  >
>>> >  > You can use it to run the portable Flink runner in streaming mode
>>> >  > continuously for testing purposes.
>>> >  >
>>> >  >
>>> >  > On Mon, Oct 1, 2018 at 9:50 AM Thomas Weise wrote:
>>> >  >
>>> >  > On Mon, Oct 1, 2018 at 8:29 AM Maximilian Michels <m...@apache.org>
>>> >  > wrote:
>>> >  >
>>> >  >  > and then have Flink manage the parallelism for stages
>>> >  >  > downstream from that?
>>> >  >
>>> >  > @Pablo Can you clarify what you mean by that?
>>> >  >
>>> >  > Let me paraphrase this just to get a clear understanding. There
>>> >  > are two approaches to testing portable streaming pipelines:
>>> >  >
>>> >  > a) Use an Impulse followed by a test PTransform which generates
>>> >  > testing data. This is similar to how streaming sources that don't
>>> >  > use the Read transform work. For basic testing this should work,
>>> >  > even without support for Timers.
>>> >  >
>>> >  > AFAIK this works for bounded sources and the batch mode of the Flink
>>> >  > runner (staged execution).
>>> >  >
>>> >  > For streaming we need small bundles; we cannot have a Python ParDo
>>> >  > that blocks in order to emit records periodically.
>>> >  >
>>> >  > (With timers, the ParDo wouldn't block but instead schedule itself
>>> >  > as needed.)
>>> >  >
>>> >  > b) Introduce a new URN which gets translated to a native
>>> >  > Flink/Spark/xy testing transform.
>>> >  >
>>> >  > We should go for a) as this will make testing easier across
>>> >  > portable runners. We previously discussed that native transforms
>>> >  > will be an option in Beam, but it would be preferable to leave them
>>> >  > out of testing for now.
>>> >  >
>>> >  > Thanks,
>>> >  > Max
>>> >  >
>>> 
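The difference between a blocking generator and a timer-driven one, as discussed above, can be modelled in plain Python. This is a toy simulation, not the Beam or Flink API: `BlockingGenerator`, `TimerGenerator`, `on_timer` and `run` are hypothetical names, and the "runner" simply treats each timer firing as one finished bundle. It illustrates why self-scheduling yields a stream of small bundles, while a `process` method that loops forever never lets its bundle complete.

```python
import heapq
from typing import Iterator, List, Optional, Tuple


class BlockingGenerator:
    """A 'process' that never returns, like a DoFn that loops and sleeps.

    A runner cannot finish the bundle until process() returns, so downstream
    stages never see a completed bundle.
    """

    def process(self) -> Iterator[int]:
        i = 0
        while True:  # real code would sleep between records here
            yield i
            i += 1


class TimerGenerator:
    """Emits a small batch per firing, then asks to be fired again later."""

    def __init__(self, batch_size: int, interval: float) -> None:
        self.batch_size = batch_size
        self.interval = interval
        self.next_value = 0

    def on_timer(self, now: float) -> Tuple[List[int], Optional[float]]:
        batch = list(range(self.next_value, self.next_value + self.batch_size))
        self.next_value += self.batch_size
        return batch, now + self.interval  # schedule the next firing


def run(step: TimerGenerator, until: float) -> List[List[int]]:
    """Toy runner: every timer firing becomes one small, finished bundle."""
    bundles: List[List[int]] = []
    timers: List[float] = [0.0]  # min-heap of pending firing times
    while timers:
        now = heapq.heappop(timers)
        if now > until:
            break
        batch, next_fire = step.on_timer(now)
        bundles.append(batch)  # the bundle completes here and can be committed
        if next_fire is not None:
            heapq.heappush(timers, next_fire)
    return bundles
```

For example, `run(TimerGenerator(batch_size=3, interval=1.0), until=2.0)` produces three bundles of three records each, whereas the blocking variant can only be consumed lazily and never signals a bundle boundary.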

Re: Beam Samza Runner status update

2018-10-10 Thread Kenneth Knowles
Clarification: Thomas Groh wrote the fuser, not me!

Thanks for sharing all this. Really cool.

Kenn

On Wed, Oct 10, 2018 at 11:17 AM Rui Wang  wrote:

> Thanks for sharing! It's so exciting to hear that Beam is being used on
> Samza in production @LinkedIn! Your feedback will be helpful to the Beam
> community!
>
> Besides, Beam supports SQL right now, and hopefully the Beam community
> could also receive feedback on BeamSQL in the future.
>
> -Rui
>
> On Wed, Oct 10, 2018 at 11:10 AM Jean-Baptiste Onofré 
> wrote:
>
>> Thanks for sharing and congrats for this great work !
>>
>> Regards
>> JB
>> On 10 Oct 2018, at 20:23, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>>>
>>> Hi, All,
>>>
>>> It's been over four months since we added the Samza Runner to Beam, and
>>> we've made a lot of progress since then. Here I would like to update
>>> you all and share some really good news from here at LinkedIn:
>>>
>>> 1) First Beam job in production @LinkedIn!
>>> After a few rounds of testing and benchmarking, we finally rolled out
>>> our first Beam job here! The job uses quite a few features, such as event
>>> time, fixed/session windowing, early triggering, and stateful processing.
>>> Our first customer is very happy and they highly praise the easy-to-use
>>> Beam API as well as the powerful processing model. Due to the limited
>>> resources here, we put our full trust in the work you guys are doing, and
>>> we didn't run into any surprises. We see extreme attention to detail, and
>>> no compromise on user experience, everywhere in the code base. We would
>>> like to thank everyone in the Beam community for contributing to such an
>>> amazing framework!
>>>
>>> 2) A portable Samza Runner prototype
>>> We have also started work on making the Samza Runner portable. So far we
>>> have just got the Python word count example working using the portable
>>> Samza Runner. Please look out for the PR very soon :). Again, this work
>>> would not be possible without the great Beam portability framework and
>>> the developers behind it, like Luke and Ahmet, just to name a few. The
>>> ReferenceRunner has been extremely useful to us to figure out what's
>>> needed and how it works. Kudos to Thomas Groh, Ben Sidhom and all the
>>> others who made this available to us. And to Kenn, your fuse work rocks.
>>>
>>> 3) More contributors in Samza Runner
>>> The runner was Chris's and my personal project for a while, but that is
>>> no longer the case. We now have Hai Lu and Boris Shkolnik from the Samza
>>> team contributing. Hai has been focusing on the portability work as
>>> mentioned in #2, and Boris will work mostly on supporting our use cases.
>>> We will send more emails discussing our use cases, like the "Update state
>>> after firing" email I sent out earlier.
>>>
>>> Finally, a shout-out to our very own Chris Pettitt. Without you, none of
>>> the above would have happened!
>>>
>>> Thanks,
>>> Xinyu
>>>
>>


Re: Splitting the repo

2018-10-10 Thread Kenneth Knowles
I think Robert's initial question needs to be focused on a particular split.

I agree that a "single project spanning multiple repos" does not make
sense. But separate projects in separate repos is pretty widely used :-). The
point of separate repos IMO would be to empower (and force) them to act as
separate projects.

Every monorepo I have worked in has struggled with modularity problems. But
conversely, a project with poor modularity can thrive in a monorepo because
it is feasible to make changes across all the bits that are tightly
coupled. Because it is a subtext whenever a Google employee talks about
monorepos, I want to call out that Google's uniquely massive and
interesting monorepo requires a tremendous amount of bespoke infrastructure
to manage coupling, testing, ownership, etc*. It is not analogous to a
large repo on GitHub.

So... which pieces are "not separate enough" and why and how do we want to
make them separate?

I can think of some candidates that could benefit from some kind of
"separateness":

 - IOs or collections of IOs: separate release cadence, only build on
stable SDK releases (potential for diamond dep problems)
 - Portability protos: forces them to be highly stable and forces runners
to adapt to major iterations
 - Language SDKs: easier to build a community of devs with a clearly
familiar project structure and toolchain
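To make the "potential for diamond dep problems" concrete: if separately released IOs each build on a different stable SDK release, a pipeline that uses two of them ends up requiring the SDK at two versions. The sketch below detects that situation in a hand-written dependency graph; the IO artifact names (`io-a`, `io-b`) and `find_diamond_conflicts` are hypothetical. It does not model how Gradle or Maven actually resolve such conflicts (each picks one version by its own rules); it only shows where the conflict comes from.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each node ("name:version", or the root pipeline) maps to the
# (artifact, version) pairs it depends on.
Deps = Dict[str, List[Tuple[str, str]]]

# Hypothetical example: two IOs released on their own cadence, each built
# against a different stable SDK release.
EXAMPLE_DEPS: Deps = {
    "my-pipeline": [("io-a", "1.0"), ("io-b", "1.0")],
    "io-a:1.0": [("beam-sdks-java-core", "2.7.0")],
    "io-b:1.0": [("beam-sdks-java-core", "2.8.0")],
}


def find_diamond_conflicts(root: str, deps: Deps) -> Dict[str, Set[str]]:
    """Return artifacts reachable from `root` that are required at >1 version."""
    wanted: Dict[str, Set[str]] = defaultdict(set)
    seen = {root}
    stack = [root]
    while stack:  # depth-first walk over the dependency graph
        node = stack.pop()
        for name, version in deps.get(node, []):
            wanted[name].add(version)
            key = f"{name}:{version}"
            if key not in seen:
                seen.add(key)
                stack.append(key)
    return {name: versions for name, versions in wanted.items() if len(versions) > 1}
```

Here `find_diamond_conflicts("my-pipeline", EXAMPLE_DEPS)` reports that `beam-sdks-java-core` is required at both 2.7.0 and 2.8.0.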

Maybe the kinds of separation that folks want do not have to be a
separate repo, as mentioned. But it is still important that most
infrastructure and UI is geared towards a certain scale of project (not
just repo): issue tracking, pull request management, mailing lists,
ownership, selective test execution, triaging test failures, etc.

At this point, I see strong arguments in both directions and think that a
specific proposal of a specific split at the right time deserves an
individualized discussion.

Kenn

*Other issues include governance and effectiveness for shipping
user-friendly libraries




On Wed, Oct 10, 2018 at 11:12 AM Ankur Goenka  wrote:

> Hi,
>
> I think the subtext here is that development is hard in general. I agree
> with that. And a major cause of it is the diversity of languages, the
> complexity of the project, and legacy code.
> To alleviate language related issues, we are trying to have modular code
> which we already have to a certain extent.
> On the other hand, tooling is still evolving and needs improvement. I also
> feel that tooling is a moving target and it's good to keep re-evaluating
> it.
> Tooling is a problem for everyone (the whole community) and we are
> actively trying to solve it. Gradle is a big step towards it.
> I personally contribute to multiple languages. Many of the PRs have changes
> spanning multiple languages and have to be merged as a whole. I personally
> feel that having a unified build system makes it easier to do the checks
> and make sure things work.
> Even after Gradle, I am still able to set up IntelliJ for Java, PyCharm for
> Python and GoLand for Go as I would have done earlier (before gradle). I am
> also able to run "python setup.py sdist" as I was able to do before gradle.
> Gradle is also acting as the top level task manager and most of the python
> tasks are just plain shell commands stitched together.
> The only real problem that I face in my setup is the vendored java jars
> which only impact java development.
> Probably documenting a separate, environment-specific setup for each
> language is sufficient to address the issue.
>
> I also agree with Max that splitting the repo will cause more pain than
> gain.
>
> ~Ankur
>
>
>
> On Wed, Oct 10, 2018 at 7:56 AM Romain Manni-Bucau 
> wrote:
>
>>
>>
>>
>> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels wrote:
>>
>>> Hi,
>>>
>>> I agree that splitting up Beam into separate repositories would cause
>>> more pain than gain.
>>>
>>> To a large degree we already have independent modules, e.g. runners/* or
>>> sdks/*. Although this is not the case for the core. It would be
>>> desirable to break it up further.
>>>
>>
>> I think this part is OK for everyone.
>>
>>
>>>
>>>  > possibly even with their own build system (unified only through a
>>>  > top-level "build everything" script that descends into each subdir and
>>>  > runs the appropriate command).
>>>
>>> This is almost what we have. Yes, there are some dependencies on the
>>> Beam Gradle Plugin, but even if we had completely independent build
>>> directories, you'd still want to have a shared config/tasks across the
>>> projects (which might bring you back to a setup similar to what we have).
>>>
>>> One of the pain points seems to be the portability which "polluted" some
>>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>>> that could have been solved with an abstraction. But the lack of
>>> abstraction also forced us to adopt the portable pipeline code quicker.
>>>
>>
>> Not at all. Assume we have a full build which does portability, then 3
>> concurrent builds (Go, Python, Java)

Re: Does anyone have a strong intelliJ setup?

2018-10-10 Thread Rui Wang
I left my tips for running *Java* unit tests in IntelliJ (they work for me
every time). I assumed that people mostly use IntelliJ for Java development.

If there are cases where people use IntelliJ to develop other languages
(maybe because of the power of plugins?), we might need to create separate
sections for those cases.

-Rui

On Wed, Oct 10, 2018 at 11:46 AM Scott Wegner  wrote:

> Last week I migrated all previous content from the website into wiki pages
> for IntelliJ [1] and Eclipse [2] (thanks Thomas Weise for the pointers).
>
> The next step is to incorporate all the tips that people have mentioned
> here and fill in any other gaps we have. Here's how I'd like to get started:
>
> 1) Focus on IntelliJ first. I don't use Eclipse and I don't have the
> expertise to make this experience great. I'd be glad if somebody else
> picked this up.
> 2) Re-organize the wiki page into a set of high-level developer tasks that
> we support; things like "Setting up IntelliJ IDE from scratch", "Performing
> a full build", "Building and testing a single module", "Running a single unit
> test", "Running an IT for a particular runner", "Recovering from project
> corruption", "Common errors"
> 3) Work on one section at a time, filling in step-by-step instructions
> that are prescriptive and easy to validate.
>
> And I'd love some help! Here's what you could do to help:
>
> * Respond to this email with any high-level "developer scenarios" that
> I've forgotten above. Things that you should be able to do in an IDE and we
> should document for all contributors.
> * Add your tips and work-arounds; I'll be collecting as much as I can in
> this working doc before organizing it into the wiki:
> https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit#
> * Write wiki documentation for one of the scenarios listed above. Let us
> know which you'll be working on so we don't duplicate work.
>
> [1] https://cwiki.apache.org/confluence/display/BEAM/IntelliJ+Tips
> [2] https://cwiki.apache.org/confluence/display/BEAM/Eclipse+Tips
>
> On Thu, Oct 4, 2018 at 7:43 AM Maximilian Michels  wrote:
>
>> Yes, you need to manually add the vendor JAR to the modules where it is
>> missing. AFAIK there is no automatic solution.
>>
>> On 04.10.18 16:34, Thomas Weise wrote:
>> > Was anyone successful in making IntelliJ understand the dependency
>> > vendoring so that it does not display unresolvable symbols?
>> >
>> >
>> > On Thu, Oct 4, 2018 at 6:13 AM Maximilian Michels wrote:
>> >
>> > That's fine. I think we have accepted the fact that IntelliJ only works
>> > with delegating the build to Gradle instead of using its built-in Gradle
>> > support. That comes with a bunch of drawbacks, e.g. slow build/test
>> > execution.
>> >
>> >  > 4. the current gradle setup still requires some knowledge about
>> > the setup (like for validates runners which are not "just tests")
>> > and there is no trivial way to make the IDE aware of it until you
>> > generate the IDE files (.idea
>> > The ValidatesRunner tests are not part of the IntelliJ setup. These are
>> > additional integration tests which are part of Gradle but can't be
>> > programmatically called from within IntelliJ.
>> >
>> > On 04.10.18 14:59, Romain Manni-Bucau wrote:
>> >  >
>> >  >
>> >  >
>> >  > On Thu, Oct 4, 2018 at 14:53, Maximilian Michels wrote:
>> >  >
>> >  >  > We have some hints in the gradle files that used to allow a
>> >  >  > smooth import with no extra steps*. Have the hints gotten out of
>> >  >  > date or are there new hints we can put in that might help?
>> >  >
>> >  > If you're referring to the `gradle idea` task which generates IntelliJ
>> >  > IPR files, that doesn't work anymore. The build is way too involved
>> >  > for that to work. We've since removed this from the contribution
>> >  > guide.
>> >  >
>> >  > There is still the IntelliJ tips page which describes a different
>> >  > (non-working) procedure. In the end, you have to fiddle with the
>> >  > project setup, i.e. adding the vendor JAR to the classpath where
>> >  > necessary. But it breaks as soon as you refresh the Gradle project.
>> >  >
>> >  > Romain, can you really get it to work out of the box with your
>> >  > method? If so, I'd like to contact you for information to update the
>> >  > IntelliJ page.
>> >  >
>> >  >
>> >  > Yep, it worked at least the last time I tried. I didn't play much with
>> >  > it, but I assume it is reproducible. Feel free to ping me on Slack.
>> >  >
>> >  >
>> >  > Note, this is not the first conversation, so we should really fix the
>> >

Re: [PROPOSAL] Prepare Beam 2.8.0 release

2018-10-10 Thread Ahmet Altay
Thank you JB.

It turns out there are 2 more blocker issues. I will look at those
first. (So, I am not rushing towards cutting RC1 yet.)

On Wed, Oct 10, 2018 at 11:42 AM, Jean-Baptiste Onofré 
wrote:

> Hey
>
> Etienne should do a new pass soon. I'm doing my best to get RabbitMQIO ready to cherry-pick.
>
> Thanks
> Regards
> JB
> On Oct 10, 2018, at 21:25, Ahmet Altay wrote:
>>
>> Update:
>>
>> I started cutting the branch. There are 2 open issues:
>> - RabbitMQIO - JB, if you plan to complete this soon, I can cherry-pick it
>> to the branch.
>> - One new issue related to release process changes with respect to
>> beam-site deprecation.
>>
>> On Tue, Oct 9, 2018 at 11:38 AM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Ok. Gonna move forward on RabbitMQIO asap.
>>>
>>> Thanks
>>> Regards
>>> JB
>>> On Oct 9, 2018, at 21:00, Ahmet Altay wrote:

 Hi all,

 Reminder, I will cut the release branch tomorrow. If you have not done
 so, please take a look at the 2.8.0 issues assigned to you [1].

 Thank you!
 Ahmet

 [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.8.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

 On Thu, Oct 4, 2018 at 9:27 AM, Ahmet Altay  wrote:

> Thank you all for the feedback. I will continue with 2.8.0 as a
> regular release and separate the LTS discussion to a new thread.
>
> On Thu, Oct 4, 2018 at 7:58 AM, Thomas Weise  wrote:
>
>> Given the feedback so far, we should probably decouple the LTS and 2.8.0
>> discussions. If both converge before 10/10, fine, but it is not
>> necessary. I also agree that we should not jump the gun on LTS, and a
>> minimum 72-hour feedback window for the topic looks appropriate.
>>
>> The issues raised by Tim look like blockers and, unless we are
>> confident that they can be addressed in a patch release, may warrant
>> deferring LTS? Can we start to tag such JIRAs with an LTS label?
>>
>> On the other hand, I think we could allow for a bit of
>> experimentation error in the first LTS attempt and feed
>> learnings/feedback back into guidelines/policies.
>>
>> Dependency updates for LTS: I don't think we should block LTS because
>> there is a newer version of a dependency out there, nor should we rush
>> updates. If we prioritize stability, then the latest usually isn't the
>> best. In the case of Flink, 1.5.x is probably what most users have at
>> this time, and it has seen 4 patch releases. If the Flink community
>> continues to support the last two minor (X.Y) versions, then 1.5.x
>> support may drop when 1.7 comes out, but that does not mean we cannot
>> use it if we were to cut a Beam LTS release today. I generally think
>> that LTS needs to focus more on the stability of Beam itself.
>>
>> Thanks,
>> Thomas
>>
>>
>>
>> On Thu, Oct 4, 2018 at 6:59 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Regarding the LTS release - I agree that we need to have a clear view of
>>> what kind of support will be provided for such releases.
>>>
>>> In addition to the concerns mentioned before, I have another one about
>>> APIs labeled as “@Experimental”. I think most of the IOs, SQL,
>>> PCollection with Schema, etc. are labeled with this annotation.
>>> According to its definition, such an API should be considered unstable
>>> in the sense that it can be changed/removed in future releases.
>>>
>>> So, the question is: how does the “@Experimental” API affect LTS
>>> releases (if it does)? What kind of support should be provided in this
>>> case, especially if the API continues evolving after the LTS has been
>>> issued? Do we need to provide a guarantee (another annotation, for
>>> example) that the API won’t be changed between two LTS releases?
>>>
>>> And one more related question, which probably deserves another
>>> discussion (or was already discussed): what are the criteria for
>>> removing the “@Experimental” status from an API? How do we decide that
>>> an API is stable and not changing anymore?
>>>
>>>
>>> On 4 Oct 2018, at 12:35, Robert Bradshaw 
>>> wrote:
>>>
>>> +1 to cutting the release.
>>>
>>> I agree that the LTS label requires more discussion. I think it
>>> boils down to the question of whether we are comfortable with
>>> encouraging people to not upgrade to the latest Beam. It probably boils
>>> down to creating a list of (potential) blockers and then going from
>>> there. Also, on this note, I think we should be very conservative in
>>> updating dependencies for an LTS release.
>>>
>>> We could also consider for this release doing an "LTS light" where
>>> we prove the process, gain some experience, but don't 

Re: Does anyone have a strong intelliJ setup?

2018-10-10 Thread Scott Wegner
Last week I migrated all previous content from the website into wiki pages
for IntelliJ [1] and Eclipse [2] (thanks Thomas Weise for the pointers).

The next step is to incorporate all the tips that people have mentioned
here and fill in any other gaps we have. Here's how I'd like to get started:

1) Focus on IntelliJ first. I don't use Eclipse and I don't have the
expertise to make this experience great. I'd be glad if somebody else
picked this up.
2) Re-organize the wiki page into a set of high-level developer tasks that
we support; things like "Setting up IntelliJ IDE from scratch", "Performing
a full build", "Building and testing a single module", "Running a single unit
test", "Running an IT for a particular runner", "Recovering from project
corruption", "Common errors"
3) Work on one section at a time, filling in step-by-step instructions that
are prescriptive and easy to validate.

And I'd love some help! Here's what you could do to help:

* Respond to this email with any high-level "developer scenarios" that I've
forgotten above. Things that you should be able to do in an IDE and we
should document for all contributors.
* Add your tips and work-arounds; I'll be collecting as much as I can in
this working doc before organizing it into the wiki:
https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit#
* Write wiki documentation for one of the scenarios listed above. Let us
know which you'll be working on so we don't duplicate work.

[1] https://cwiki.apache.org/confluence/display/BEAM/IntelliJ+Tips
[2] https://cwiki.apache.org/confluence/display/BEAM/Eclipse+Tips

On Thu, Oct 4, 2018 at 7:43 AM Maximilian Michels  wrote:

> Yes, you need to manually add the vendor JAR to the modules where it is
> missing. AFAIK there is no automatic solution.
>
> On 04.10.18 16:34, Thomas Weise wrote:
> > Was anyone successful in making IntelliJ understand the dependency
> > vendoring so that it does not display unresolvable symbols?
> >
> >
> > On Thu, Oct 4, 2018 at 6:13 AM Maximilian Michels wrote:
> >
> > That's fine. I think we have accepted the fact that IntelliJ only works
> > with delegating the build to Gradle instead of using its built-in Gradle
> > support. That comes with a bunch of drawbacks, e.g. slow build/test
> > execution.
> >
> >  > 4. the current gradle setup still requires some knowledge about
> > the setup (like for validates runners which are not "just tests")
> > and there is no trivial way to make the IDE aware of it until you
> > generate the IDE files (.idea
> > The ValidatesRunner tests are not part of the IntelliJ setup. These are
> > additional integration tests which are part of Gradle but can't be
> > programmatically called from within IntelliJ.
> >
> > On 04.10.18 14:59, Romain Manni-Bucau wrote:
> >  >
> >  >
> >  >
> >  > On Thu, Oct 4, 2018 at 14:53, Maximilian Michels wrote:
> >  >
> >  >  > We have some hints in the gradle files that used to allow a
> >  >  > smooth import with no extra steps*. Have the hints gotten out of
> >  >  > date or are there new hints we can put in that might help?
> >  >
> >  > If you're referring to the `gradle idea` task which generates IntelliJ
> >  > IPR files, that doesn't work anymore. The build is way too involved
> >  > for that to work. We've since removed this from the contribution
> >  > guide.
> >  >
> >  > There is still the IntelliJ tips page which describes a different
> >  > (non-working) procedure. In the end, you have to fiddle with the
> >  > project setup, i.e. adding the vendor JAR to the classpath where
> >  > necessary. But it breaks as soon as you refresh the Gradle project.
> >  >
> >  > Romain, can you really get it to work out of the box with your
> >  > method? If so, I'd like to contact you for information to update the
> >  > IntelliJ page.
> >  >
> >  >
> >  > Yep, it worked at least the last time I tried. I didn't play much with
> >  > it, but I assume it is reproducible. Feel free to ping me on Slack.
> >  >
> >  >
> >  > Note, this is not the first conversation, so we should really fix the
> >  > instructions/describe the workarounds. See also
> >  > https://lists.apache.org/thread.html/c8323622e5de92089ebdfecee09a0e37cae0c631e1bebf06ed9f2bc6@%3Cdev.beam.apache.org%3E
> >  >
> >  >
> >  > The small warning here is that, by design, you will not fix them all
> >  > since:
> >  >
> >  > 1. the IDE must run the script to import the project (which is a big
> >  > drawback compared to Maven, where it can be imported without running
> >  > any project

Re: [PROPOSAL] Prepare Beam 2.8.0 release

2018-10-10 Thread Jean-Baptiste Onofré
Hey

Etienne should do a new pass soon. I'm doing my best to get RabbitMQIO ready to cherry-pick.

Thanks
Regards
JB

On Oct 10, 2018, at 21:25, Ahmet Altay wrote:
>Update:
>
>I started cutting the branch. There are 2 open issues:
>- RabbitMQIO - JB, if you plan to complete this soon, I can cherry-pick it
>to the branch.
>- One new issue related to release process changes with respect to
>beam-site deprecation.
>
>On Tue, Oct 9, 2018 at 11:38 AM, Jean-Baptiste Onofré 
>wrote:
>
>> Ok. Gonna move forward on RabbitMQIO asap.
>>
>> Thanks
>> Regards
>> JB
>> On Oct 9, 2018, at 21:00, Ahmet Altay wrote:
>>>
>>> Hi all,
>>>
>>> Reminder, I will cut the release branch tomorrow. If you have not done
>>> so, please take a look at the 2.8.0 issues assigned to you [1].
>>>
>>> Thank you!
>>> Ahmet
>>>
>>> [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.8.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>
>>> On Thu, Oct 4, 2018 at 9:27 AM, Ahmet Altay wrote:
>>>
 Thank you all for the feedback. I will continue with 2.8.0 as a regular
 release and separate the LTS discussion to a new thread.

 On Thu, Oct 4, 2018 at 7:58 AM, Thomas Weise wrote:

> Given the feedback so far, we should probably decouple the LTS and 2.8.0
> discussions. If both converge before 10/10, fine, but it is not
> necessary. I also agree that we should not jump the gun on LTS, and a
> minimum 72-hour feedback window for the topic looks appropriate.
>
> The issues raised by Tim look like blockers and, unless we are
> confident that they can be addressed in a patch release, may warrant
> deferring LTS? Can we start to tag such JIRAs with an LTS label?
>
> On the other hand, I think we could allow for a bit of experimentation
> error in the first LTS attempt and feed learnings/feedback back into
> guidelines/policies.
>
> Dependency updates for LTS: I don't think we should block LTS because
> there is a newer version of a dependency out there, nor should we rush
> updates. If we prioritize stability, then the latest usually isn't the
> best. In the case of Flink, 1.5.x is probably what most users have at
> this time, and it has seen 4 patch releases. If the Flink community
> continues to support the last two minor (X.Y) versions, then 1.5.x
> support may drop when 1.7 comes out, but that does not mean we cannot
> use it if we were to cut a Beam LTS release today. I generally think
> that LTS needs to focus more on the stability of Beam itself.
>
> Thanks,
> Thomas
>
>
>
> On Thu, Oct 4, 2018 at 6:59 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Regarding the LTS release - I agree that we need to have a clear view of
>> what kind of support will be provided for such releases.
>>
>> In addition to the concerns mentioned before, I have another one about
>> APIs labeled as “@Experimental”. I think most of the IOs, SQL,
>> PCollection with Schema, etc. are labeled with this annotation.
>> According to its definition, such an API should be considered unstable
>> in the sense that it can be changed/removed in future releases.
>>
>> So, the question is: how does the “@Experimental” API affect LTS
>> releases (if it does)? What kind of support should be provided in this
>> case, especially if the API continues evolving after the LTS has been
>> issued? Do we need to provide a guarantee (another annotation, for
>> example) that the API won’t be changed between two LTS releases?
>>
>> And one more related question, which probably deserves another
>> discussion (or was already discussed): what are the criteria for
>> removing the “@Experimental” status from an API? How do we decide that
>> an API is stable and not changing anymore?
>>
>>
>> On 4 Oct 2018, at 12:35, Robert Bradshaw wrote:
>>
>> +1 to cutting the release.
>>
>> I agree that the LTS label requires more discussion. I think it boils
>> down to the question of whether we are comfortable with encouraging
>> people to not upgrade to the latest Beam. It probably boils down to
>> creating a list of (potential) blockers and then going from there. Also,
>> on this note, I think we should be very conservative in updating
>> dependencies for an LTS release.
>>
>> We could also consider for this release doing an "LTS light" where we
>> prove the process, gain some experience, but don't promise a full 12
>> months of support (say, cutting it to 6 months).
>>
>> - Robert
>>
>>
>>
>>
>> On Thu, Oct 4, 2018 at 11:25 AM Tim Robertson <
>> timrobertson...@gmail.com> wrote:
>>
>>> I was in the middle of writing something similar when Ismaël posted.
>>>
>>> Please do bear in mind that this is an international 

Re: Splitting the repo

2018-10-10 Thread Romain Manni-Bucau
On Wed, Oct 10, 2018 21:31, Robert Bradshaw wrote:

>
>
> On Wed, Oct 10, 2018, 4:56 PM Romain Manni-Bucau 
> wrote:
>
>>
>>
>>
>> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels wrote:
>>
>>> Hi,
>>>
>>> I agree that splitting up Beam into separate repositories would cause
>>> more pain than gain.
>>>
>>> To a large degree we already have independent modules, e.g. runners/* or
>>> sdks/*. Although this is not the case for the core. It would be
>>> desirable to break it up further.
>>>
>>
>> I think this part is OK for everyone.
>>
>>
>>>
>>>  > possibly even with their own build system (unified only through a
>>>  > top-level "build everything" script that descends into each subdir and
>>>  > runs the appropriate command).
>>>
>>> This is almost what we have. Yes, there are some dependencies on the
>>> Beam Gradle Plugin, but even if we had completely independent build
>>> directories, you'd still want to have a shared config/tasks across the
>>> projects (which might bring you back to a setup similar to what we have).
>>>
>>> One of the pain points seems to be the portability which "polluted" some
>>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>>> that could have been solved with an abstraction. But the lack of
>>> abstraction also forced us to adopt the portable pipeline code quicker.
>>>
>>
>> Not at all. Assume we have a full build which does portability, then 3
>> concurrent builds (Go, Python, Java). Then we have the "current step" in
>> the CI, but devs are never affected by it and the build does not mess up
>> their machines either.
>>
>
> I agree that no matter what, builds should not be messing up people's
> machines. (I hope they're not; if they are we should jump on fixing that
> right away.)
>

The Go modules still create symlinks to themselves, which is broken in some
environments. I never checked why; forcing a clean is a workaround.



>
>
>> Today the main blocker is that the default "profile" (script) does not
>> match the dev persona, and therefore there is no real hope of external
>> contributions from outside the Google-related folks, as shown by the
>> figures mentioned earlier, which is sad for a project promising
>> unification and work between communities IMHO.
>>
>
> Trying to span different communities, especially those as diverse as those
> from the Java, Python, and Go (and hopefully others) ecosystems, is
> nontrivial; one must span different expectations, workflows, "dev
> personas," etc. This may require some compromise from all, but I am hopeful
> it will be minimal (e.g. there are some files in my repo and artifacts I had
> to build once when I built the world, but it just worked and I don't look at
> them...). But it's clear from the other thread that we need to fix the Java
> IDE experience, and possibly other things too, because it's not working out
> for everyone as well as it could.
>

So in the short term, do we go with "profiles" that skip modules?


Then we just need somebody who can tackle the IDEA integration issues (1.
import, 2. testing without the Gradle runner) soon. These are the most urgent
blockers, and if they are fixed, the language-related issues may become minor.



>
>
>>
>>>
>>> -Max
>>>
>>> On 10.10.18 10:51, Romain Manni-Bucau wrote:
>>> > Yep for the split.
>>> >
>>> > For the clean point, it is closely linked to the build tools and the
>>> > fake environment for modules that are not native to the build tool
>>> > (e.g. Go for Gradle, which is Java-first). This is why having a real
>>> > build that is natural for each language would be beneficial IMO.
>>> >
>>> > On Wed, Oct 10, 2018 11:38, Jean-Baptiste Onofré wrote:
>>> >
>>> > Correct, it's more "module splitting" than repositories indeed.
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 10/10/2018 10:35, Robert Bradshaw wrote:
>>> >  > Gotcha. So this is more about dividing the code (particularly
>>> > core) into
>>> >  > finer modules, rather than splitting the modules into separate
>>> >  > repositories, right?
>>> >  >
>>> >  > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré wrote:
>>> >  >
>>> >  > The purpose is that we have a monolithic core today, mostly
>>> >  > providing abstract classes.
>>> >  >
>>> >  > The idea is to have something more API-oriented, with
>>> >  > interfaces/SPI.
>>> >  >
>>> >  > Our users would then be able to pick the part of the core they
>>> >  > want, resulting in lighter artifacts, and for us, it gives a more
>>> >  > flexible approach.
>>> >  >
>>> >  > Regards
>>> >  > JB
>>> >  >
>>> >  > On 10/10/2018 10:26, Robert Bradshaw wrote:
>>> >  > > My question was not whether we should split the repo, but why?
>>> >  > > (Dividing things into more (or fewer) modules within a single

Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
On Wed, Oct 10, 2018, 4:56 PM Romain Manni-Bucau 
wrote:

>
>
>
> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels wrote:
>
>> Hi,
>>
>> I agree that splitting up Beam into separate repositories would cause
>> more pain than gain.
>>
>> To a large degree we already have independent modules, e.g. runners/* or
>> sdks/*. Although this is not the case for the core. It would be
>> desirable to break it up further.
>>
>
> I think this part is OK for everyone.
>
>
>>
>>  > possibly even with their own build system (unified only through a
>>  > top-level "build everything" script that descends into each subdir and
>>  > runs the appropriate command).
>>
>> This is almost what we have. Yes, there are some dependencies on the
>> Beam Gradle Plugin, but even if we had completely independent build
>> directories, you'd still want to have a shared config/tasks across the
>> projects (which might bring you back to a setup similar to what we have).
>>
>> One of the pain points seems to be the portability which "polluted" some
>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>> that could have been solved with an abstraction. But the lack of
>> abstraction also forced us to adopt the portable pipeline code quicker.
>>
>
> Not at all. Assume we have a full build which does portability, then 3
> concurrent builds (Go, Python, Java). Then we have the "current step" in
> the CI, but devs are never affected by it and the build does not mess up
> their machines either.
>

I agree that no matter what, builds should not be messing up people's
machines. (I hope they're not; if they are we should jump on fixing that
right away.)


> Today the main blocker is that the default "profile" (script) does not
> match the dev persona, and therefore there is no real hope of external
> contributions from outside the Google-related folks, as shown by the
> figures mentioned earlier, which is sad for a project promising
> unification and work between communities IMHO.
>

Trying to span different communities, especially those as diverse as those
from the Java, Python, and Go (and hopefully others) ecosystems, is
nontrivial; one must span different expectations, workflows, "dev
personas," etc. This may require some compromise from all, but I am hopeful
it will be minimal (e.g. there are some files in my repo and artifacts I had
to build once when I built the world, but it just worked and I don't look at
them...). But it's clear from the other thread that we need to fix the Java
IDE experience, and possibly other things too, because it's not working out
for everyone as well as it could.



>
>>
>> -Max
>>
>> On 10.10.18 10:51, Romain Manni-Bucau wrote:
>> > Yep for the split.
>> >
>> > For the clean point, it is closely linked to the build tools and the
>> > fake environment for modules that are not native to the build tool
>> > (e.g. Go for Gradle, which is Java-first). This is why having a real
>> > build that is natural for each language would be beneficial IMO.
>> >
>> > On Wed, Oct 10, 2018 11:38, Jean-Baptiste Onofré wrote:
>> >
>> > Correct, it's more "module splitting" than repositories indeed.
>> >
>> > Regards
>> > JB
>> >
>> > On 10/10/2018 10:35, Robert Bradshaw wrote:
>> >  > Gotcha. So this is more about dividing the code (particularly
>> > core) into
>> >  > finer modules, rather than splitting the modules into separate
>> >  > repositories, right?
>> >  >
>> >  > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré wrote:
>> >  >
>> >  > The purpose is that we have a monolithic core today, mostly
>> >  > providing abstract classes.
>> >  >
>> >  > The idea is to have something more API-oriented, with
>> >  > interfaces/SPI.
>> >  >
>> >  > Our users would then be able to pick the part of the core they
>> >  > want, resulting in lighter artifacts, and for us, it gives a more
>> >  > flexible approach.
>> >  >
>> >  > Regards
>> >  > JB
>> >  >
>> >  > On 10/10/2018 10:26, Robert Bradshaw wrote:
>> >  > > My question was not whether we should split the repo, but why?
>> >  > > (Dividing things into more (or fewer) modules within a single repo
>> >  > > is a separate question.) Maybe I'm just not following what you mean
>> >  > > by "more API oriented." It would force stabler APIs.
>> >  > >
>> >  > > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré wrote:
>> >  > > Hi,
>> >  > >
>> >  

Re: [PROPOSAL] Prepare Beam 2.8.0 release

2018-10-10 Thread Ahmet Altay
Update:

I started cutting the branch. There are 2 open issues:
- RabbitMQIO - JB, if you plan to complete this soon, I can cherry-pick it
to the branch.
- One new issue related to release process changes with respect to
beam-site deprecation.

On Tue, Oct 9, 2018 at 11:38 AM, Jean-Baptiste Onofré 
wrote:

> Ok. Gonna move forward on RabbitMQIO asap.
>
> Thanks
> Regards
> JB
> On Oct 9, 2018, at 21:00, Ahmet Altay wrote:
>>
>> Hi all,
>>
>> Reminder, I will cut the release branch tomorrow. If you have not done so,
>> please take a look at the 2.8.0 issues assigned to you [1].
>>
>> Thank you!
>> Ahmet
>>
>> [1] https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.8.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
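The JIRA link in [1] is just a URL-encoded JQL query. As a minimal sketch (the query string is decoded from that link; the helper name `jira_search_url` is hypothetical), Python's standard `urllib.parse.quote` reproduces the encoded `jql` parameter, which makes it easy to tweak the query (for example, a different `fixVersion`) and rebuild the link:

```python
from urllib.parse import quote, unquote

# JQL decoded from the issues link in [1].
JQL = ("project = BEAM AND resolution = Unresolved AND fixVersion = 2.8.0 "
       "ORDER BY priority DESC, updated DESC")

def jira_search_url(jql, base="https://issues.apache.org/jira/issues/"):
    # quote() percent-encodes spaces (%20), '=' (%3D) and ',' (%2C);
    # letters, digits and '.' are left unchanged.
    return base + "?jql=" + quote(jql)

url = jira_search_url(JQL)
print(url)
# Decoding the parameter gives back the original JQL.
print(unquote(url.split("?jql=", 1)[1]))
```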
>>
>> On Thu, Oct 4, 2018 at 9:27 AM, Ahmet Altay  wrote:
>>
>>> Thank you all for the feedback. I will continue with 2.8.0 as a regular
>>> release and separate the LTS discussion to a new thread.
>>>
>>> On Thu, Oct 4, 2018 at 7:58 AM, Thomas Weise  wrote:
>>>
 Given the feedback so far, we should probably decouple the LTS and 2.8.0
 discussions. If both converge before 10/10, fine, but that is not
 necessary. I also agree that we should not jump the gun on LTS, and a minimum
 72-hour feedback window for the topic looks appropriate.

 The issues raised by Tim look like blockers, and unless we are confident
 that they can be addressed in a patch release, they may warrant deferring
 LTS. Can we start tagging such JIRAs with an LTS label?

 On the other hand, I think we could allow for a bit of experimentation
 error for the first LTS attempt and feed guidelines/policies from
 learnings/feedback.

 Dependency updates for LTS: I don't think we should block LTS because
 there is a newer version of a dependency out there, or that we should rush
 updates. If we prioritize stability, then the latest usually isn't the
 best. In the case of Flink, 1.5.x is probably what most users have at this
 time, and it has seen 4 patch releases. If the Flink community continues to
 support the last two minor (X.Y) versions, then 1.5.x support may drop when 1.7
 comes out, but that does not mean we cannot use it if we were to cut a Beam
 LTS release today. I generally think that LTS needs to focus more on the
 stability of Beam itself.
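The point about Flink's support window can be made concrete with a tiny worked example. This sketch assumes, as the message does hypothetically, that the Flink community keeps supporting only the last two minor (X.Y) lines, and that 1.6 is the newest line at the time; the function name and the window parameter are illustrative, not a Flink API:

```python
def supported_minor_lines(latest, window=2):
    """Return the (major, minor) lines covered by a 'last N minor versions' policy.

    Assumes all lines share the same major version, as Flink 1.x does here.
    """
    major, minor = latest
    first = max(0, minor - window + 1)
    return [(major, m) for m in range(first, minor + 1)]

# With 1.6 as the newest line, 1.5.x is still inside the window...
print(supported_minor_lines((1, 6)))  # [(1, 5), (1, 6)]
# ...but once 1.7 comes out, 1.5.x falls out of it.
print(supported_minor_lines((1, 7)))  # [(1, 6), (1, 7)]
```

Falling out of Flink's window does not by itself stop a Beam LTS from using 1.5.x; it only means upstream fixes stop arriving, which is the trade-off weighed above.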

 Thanks,
 Thomas



 On Thu, Oct 4, 2018 at 6:59 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Regarding the LTS release, I agree that we need to have a clear view of what
> kind of support will be provided for such releases.
>
> Besides the concerns mentioned before, I have another one about APIs
> labeled as “@Experimental". I think most of the IOs, SQL, PCollection
> with Schema, etc. are labeled with this annotation.
> By definition, such APIs should be considered unstable, in the sense
> that they can be changed or removed in future releases.
>
> So, the question is: how does the “@Experimental” API affect LTS releases (if
> it does)? What kind of support should be provided in this case, especially
> if the API continues evolving after the LTS has been issued? Do we need to
> provide a guarantee (another annotation, for example) that the API won’t be
> changed between two LTS releases?
>
> And one more related question, which probably deserves another
> discussion (or was already discussed): what are the criteria for removing
> the “@Experimental” status from an API? How do we decide that an API is
> stable and not changing anymore?
>
>
> On 4 Oct 2018, at 12:35, Robert Bradshaw  wrote:
>
> +1 to cutting the release.
>
> I agree that the LTS label requires more discussion. I think it boils
> down to the question of whether we are comfortable with encouraging people
> not to upgrade to the latest Beam. In practice, it probably comes down to
> creating a list of (potential) blockers and then going from there. Also, on
> this note, I think we should be very conservative in updating dependencies
> for an LTS release.
>
> We could also consider for this release doing an "LTS light" where we
> prove the process, gain some experience, but don't promise a full 12
> months of support (say, cutting it to 6 months).
>
> - Robert
>
>
>
>
> On Thu, Oct 4, 2018 at 11:25 AM Tim Robertson <
> timrobertson...@gmail.com> wrote:
>
>> I was in the middle of writing something similar when Ismaël posted.
>>
>> Please do bear in mind that this is an international project and 7hrs
>> is not long enough to decide upon something that affects us all.
>>
>> +1 on cutting 2.8.0 on 10/10 and thank you for pushing it forward
>>
>> -1 on designating it as LTS:
>> While LTS is a statement of expectation in maintenance it also

Re: Beam Samza Runner status update

2018-10-10 Thread Rui Wang
Thanks for sharing! It's so exciting to hear that Beam is being used on
Samza in production @LinkedIn! Your feedback will be helpful to the Beam
community!

Also, Beam supports SQL now, and hopefully the Beam community will
also receive feedback on BeamSQL in the future.

-Rui

On Wed, Oct 10, 2018 at 11:10 AM Jean-Baptiste Onofré 
wrote:

> Thanks for sharing, and congrats on this great work!
>
> Regards
> JB
> On 10 Oct 2018, at 20:23, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>>
>> Hi, All,
>>
>> It's been over four months since we added the Samza Runner to Beam, and
>> we've made a lot of progress since then. Here I would like to update
>> you guys and share some really good news from here at LinkedIn:
>>
>> 1) First Beam job in production @LinkedIn!
>> After a few rounds of testing and benchmarking, we finally rolled out our
>> first Beam job here! The job uses quite a few features, such as event time,
>> fixed/session windowing, early triggering, and stateful processing. Our
>> first customer is very happy and highly praises the easy-to-use Beam
>> API as well as the powerful processing model. Due to the limited resources
>> here, we put our full trust in the work you guys are doing, and we didn't
>> run into any surprises. We see extreme attention to detail and no
>> compromise on user experience everywhere in the code base. We would
>> like to thank everyone in the Beam community for contributing to such an
>> amazing framework!
>>
>> 2) A portable Samza Runner prototype
>> We have also started work on making the Samza Runner portable. So far we
>> have just got the Python word count example working with the portable Samza
>> Runner. Please look out for the PR for this very soon :). Again, this work
>> would not be possible without the great Beam portability framework and the
>> developers behind it, like Luke and Ahmet, just to name a few. The
>> ReferenceRunner has been extremely useful for us to figure out what's needed
>> and how it works. Kudos to Thomas Groh, Ben Sidhom and all the others who
>> made this available to us. And to Kenn, your fuse work rocks.
>>
>> 3) More contributors in Samza Runner
>> The runner had been Chris's and my personal project for a while, but that
>> is no longer the case. We got Hai Lu and Boris Shkolnik from the Samza team
>> to contribute. Hai has been focusing on the portability work mentioned in
>> #2, and Boris will work mostly on supporting our use cases. We will send
>> more emails discussing our use cases, like the "Update state after firing"
>> email I sent out earlier.
>>
>> Finally, a shout-out to our very own Chris Pettitt. Without you, none of
>> the above would have happened!
>>
>> Thanks,
>> Xinyu
>>
>


Re: Splitting the repo

2018-10-10 Thread Ankur Goenka
Hi,

I think the subtext here is that development is hard in general. I agree with
that. A major cause is the diversity of languages, the complexity of the
project, and legacy code.
To alleviate language-related issues, we are trying to have modular code,
which we already have to a certain extent.
On the other hand, tooling is still evolving and needs improvement. I also
feel that tooling is a moving target and it's good to keep reevaluating
it.
Tooling is a problem for everyone (the whole community) and we are actively
trying to solve it. Gradle is a big step towards that.
I personally contribute to multiple languages. Many PRs have changes
spanning multiple languages and have to be merged as a whole. I personally
feel that having a unified build system makes it easier to do the checks
and make sure things work.
Even after Gradle, I am still able to set up IntelliJ for Java, PyCharm for
Python and GoLand for Go as I would have done earlier (before Gradle). I am
also able to run "python setup.py sdist" as I was able to do before Gradle.
Gradle also acts as the top-level task manager, and most of the Python
tasks are just plain shell commands stitched together.
The only real problem that I face in my setup is the vendored Java jars,
which only impact Java development.
Documenting a separate, environment-specific setup for each language is
probably sufficient to address the issue.

I also agree with Max that splitting the repo will cause more pain than
gain.

~Ankur



On Wed, Oct 10, 2018 at 7:56 AM Romain Manni-Bucau 
wrote:

>
>
>
> On Wed, 10 Oct 2018 at 14:59, Maximilian Michels wrote:
>
>> Hi,
>>
>> I agree that splitting up Beam into separate repositories would cause
>> more pain than gain.
>>
>> To a large degree we already have independent modules, e.g. runners/* or
>> sdks/*. This is not the case for the core, though, and it would be
>> desirable to break it up further.
>>
>
> I think this part is OK for everyone.
>
>
>>
>>  > possibly even with their own build system (unified only through a
>>  > top-level "build everything" script that descends into each subdir and
>>  > runs the appropriate command).
>>
>> This is almost what we have. Yes, there are some dependencies on the
>> Beam Gradle Plugin, but even if we had completely independent build
>> directories, you'd still want to have a shared config/tasks across the
>> projects (which might bring you back to a setup similar to what we have).
>>
>> One of the pain points seems to be portability, which "polluted" some
>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>> that could have been solved with an abstraction. But the lack of
>> abstraction also forced us to adopt the portable pipeline code quicker.
>>
>
> Not at all. Assume we have a full build that covers portability, followed by 3
> concurrent builds (Go, Python, Java).
> Then we have a "current step" in the CI, but developers are never affected by
> it, and the build does not mess up their machines either.
>
> Today the main blocker is that the default "profile" (script) does not match
> the developer persona, and therefore there is no real hope of getting external
> contributions from outside the Google-related folks, as the earlier figures
> show. That is sad for a project promising unification and collaboration
> between communities, IMHO.
>
>
>>
>> -Max
>>
>> On 10.10.18 10:51, Romain Manni-Bucau wrote:
>> > Yep for the split.
>> >
>> > As for the clean point, it is closely tied to the build tools and to the
>> > fake environments for modules that are not native to the build tool (for
>> > instance, Go under Gradle, which is Java-first). This is why having a real
>> > build that is natural for each language would be beneficial, IMO.
>> >
>> > On Wed, 10 Oct 2018 11:38, Jean-Baptiste Onofré wrote:
>> >
>> > Correct, it's more "module splitting" than repositories indeed.
>> >
>> > Regards
>> > JB
>> >
>> > On 10/10/2018 10:35, Robert Bradshaw wrote:
>> >  > Gotcha. So this is more about dividing the code (particularly
>> >  > core) into finer modules, rather than splitting the modules into
>> >  > separate repositories, right?
>> >  >
>> >  > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré
>> >  > <j...@nanthrax.net> wrote:
>> >  >
>> >  > The purpose is that today we have a monolithic core, mostly
>> >  > providing abstract classes.
>> >  >
>> >  > The idea is to have something more API-oriented, with
>> >  > interfaces/SPI.
>> >  >
>> >  > Our users would then be able to pick the parts of the core
>> >  > they want, resulting in lighter artifacts, and for us, it gives
>> >  > a more flexible approach.
>> >  >
>> >  > Regards
>> >  > JB
>> >  >
>> >  > On 10/10/2018 10:26, Robert Bradshaw wrote:
>> >  > > My question was not whether we should split 

Re: Beam Samza Runner status update

2018-10-10 Thread Jean-Baptiste Onofré
Thanks for sharing, and congrats on this great work!

Regards
JB

On 10 Oct 2018, at 20:23, Xinyu Liu wrote:
>Hi, All,
>
>It's been over four months since we added the Samza Runner to Beam, and
>we've made a lot of progress since then. Here I would like to update
>you guys and share some really good news from here at LinkedIn:
>
>1) First Beam job in production @LinkedIn!
>After a few rounds of testing and benchmarking, we finally rolled out our
>first Beam job here! The job uses quite a few features, such as event time,
>fixed/session windowing, early triggering, and stateful processing. Our
>first customer is very happy and highly praises the easy-to-use Beam
>API as well as the powerful processing model. Due to the limited resources
>here, we put our full trust in the work you guys are doing, and we didn't
>run into any surprises. We see extreme attention to detail and no
>compromise on user experience everywhere in the code base. We would
>like to thank everyone in the Beam community for contributing to such an
>amazing framework!
>
>2) A portable Samza Runner prototype
>We have also started work on making the Samza Runner portable. So far we
>have just got the Python word count example working with the portable Samza
>Runner. Please look out for the PR for this very soon :). Again, this work
>would not be possible without the great Beam portability framework and the
>developers behind it, like Luke and Ahmet, just to name a few. The
>ReferenceRunner has been extremely useful for us to figure out what's needed
>and how it works. Kudos to Thomas Groh, Ben Sidhom and all the others who
>made this available to us. And to Kenn, your fuse work rocks.
>
>3) More contributors in Samza Runner
>The runner had been Chris's and my personal project for a while, but that
>is no longer the case. We got Hai Lu and Boris Shkolnik from the Samza team
>to contribute. Hai has been focusing on the portability work mentioned in
>#2, and Boris will work mostly on supporting our use cases. We will send
>more emails discussing our use cases, like the "Update state after firing"
>email I sent out earlier.
>
>Finally, a shout-out to our very own Chris Pettitt. Without you, none of
>the above would have happened!
>
>Thanks,
>Xinyu


Beam Samza Runner status update

2018-10-10 Thread Xinyu Liu
Hi, All,

It's been over four months since we added the Samza Runner to Beam, and
we've made a lot of progress since then. Here I would like to update
you guys and share some really good news from here at LinkedIn:

1) First Beam job in production @LinkedIn!
After a few rounds of testing and benchmarking, we finally rolled out our
first Beam job here! The job uses quite a few features, such as event time,
fixed/session windowing, early triggering, and stateful processing. Our
first customer is very happy and highly praises the easy-to-use Beam
API as well as the powerful processing model. Due to the limited resources
here, we put our full trust in the work you guys are doing, and we didn't
run into any surprises. We see extreme attention to detail and no
compromise on user experience everywhere in the code base. We would
like to thank everyone in the Beam community for contributing to such an
amazing framework!

2) A portable Samza Runner prototype
We have also started work on making the Samza Runner portable. So far we
have just got the Python word count example working with the portable Samza
Runner. Please look out for the PR for this very soon :). Again, this work
would not be possible without the great Beam portability framework and the
developers behind it, like Luke and Ahmet, just to name a few. The
ReferenceRunner has been extremely useful for us to figure out what's needed
and how it works. Kudos to Thomas Groh, Ben Sidhom and all the others who
made this available to us. And to Kenn, your fuse work rocks.

3) More contributors in Samza Runner
The runner had been Chris's and my personal project for a while, but that
is no longer the case. We got Hai Lu and Boris Shkolnik from the Samza team
to contribute. Hai has been focusing on the portability work mentioned in
#2, and Boris will work mostly on supporting our use cases. We will send
more emails discussing our use cases, like the "Update state after firing"
email I sent out earlier.

Finally, a shout-out to our very own Chris Pettitt. Without you, none of
the above would have happened!

Thanks,
Xinyu


Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Scott Wegner
> Perhaps we should go through and prioritize (and add missing items to)
> BEAM-4045

+1. It's hard to know where to start when there's such a laundry list of
tasks. If you're having build issues, please make sure they are represented
in BEAM-4045, and "Vote" for the issues that you believe are the highest
priority.

I agree that the Gradle build is far from perfect (my top gripes are IDE
integration and parallel/incremental build support). I believe that we're
capable of making our build great, and continuing our investment in Gradle
would be a shorter path than changing course again. Remember that our Maven
build also had its share of issues, which is why we as a community voted
to replace it [1][2].

It seems we also need to help build Gradle expertise in our community, so
that those that are motivated are empowered to contribute. Does anybody
have a good "Getting Started with Gradle" guide they recommend? Perhaps we
could also link to it from the website/wiki.

[1]
https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
[2]
https://lists.apache.org/thread.html/bd399ecb17cd211be7c6089b562c09ba9116649c9eabe3b609606a3b@%3Cdev.beam.apache.org%3E

On Wed, Oct 10, 2018 at 2:40 AM Robert Bradshaw  wrote:

> Some rough stats (because I was curious): The gradle files have been
> edited by ~79 unique contributors over 696 distinct commits, whereas the
> maven ones were edited (over a longer time period) by ~130 unique
> contributors over 1389 commits [1]. This doesn't capture how much effort
> was put into these edits, but neither is restricted to a small set of
> experts.
>
> Regarding "friendly for other languages" I don't think either is
> necessarily easy to learn, but my impression is that the Maven learning
> curve is shallower for those already firmly embedded in the Java ecosystem
> (perhaps due to leveraging existing familiarity, and perhaps some due to
> the implicit java-centric conventions that maven assumed about your
> project), whereas with gradle at least I could keep pulling on the string
> to unwind things to the bottom. The "I just want to build/test X without
> editing/viewing the build files" seemed more natural with Gradle (e.g. I
> can easily list all tasks).
>
> That being said, I don't think everyone needs to understand the full build
> system. It's important that there be a critical mass that does (we have that
> for both, and if we can simplify to improve this that'd be great), that it's
> easy enough to make basic changes (e.g. add a dependency; again I don't think
> the barrier is sufficiently different for either), and that it works well out
> of the box for someone who just wants to look up a command on the website and
> edit code (the CLI is an improvement with Gradle, but it's clear that
> (Java) IDE support is a significant regression).
>
> Personally, I don't know much about IDE configuration (admittedly the
> larger issue), but one action item I can take on is trying to eliminate the
> need to do a "git clean" after building certain targets (assuming I can
> reproduce this).
>
> Perhaps we should go through and prioritize (and add missing items to)
> BEAM-4045
> https://issues.apache.org/jira/issues/?jql=parent%20%3D%20BEAM-4045%20ORDER%20BY%20priority%20DESC
> ? There's always a long tail with this kind of thing, and looking at the
> whole list can be daunting, but putting it in the correct order and
> knocking off the top N items could possibly go a long way.
>
> - Robert
>
> [1] The commands I ran were (with and without the uniq)
>
> $ find . -name 'build.gradle' | xargs git log | grep Author: | grep -o '[^< ]*@' | sort | uniq | wc
> $ find . -name 'pom.xml' | xargs git log | grep Author: | grep -o '[^< ]*@' | sort | uniq | wc
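The pipeline in [1] keeps, for every `Author:` line of `git log`, the part of the e-mail address before the `@`, then counts either every match (roughly one per commit) or only the distinct ones (after `sort | uniq`). A rough Python equivalent of that counting step is sketched below; it runs on a made-up log excerpt rather than a real repository, and the names `AUTHOR_ID` and `count_authors` are hypothetical:

```python
import re

# Same pattern as `grep -o '[^< ]*@'`: the run of characters before '@'
# that contains no '<' or space, i.e. the local part of the e-mail address.
AUTHOR_ID = re.compile(r"[^< ]*@")

def count_authors(git_log_text):
    """Return (matches, distinct) for the Author: lines of `git log` output."""
    ids = [m
           for line in git_log_text.splitlines() if "Author:" in line
           for m in AUTHOR_ID.findall(line)]
    return len(ids), len(set(ids))

# Hypothetical log excerpt, for illustration only.
sample = """commit 1111
Author: Alice Example <alice@example.org>
commit 2222
Author: Bob Example <bob@example.org>
commit 3333
Author: Alice Example <alice@example.org>
"""
print(count_authors(sample))  # (3, 2)
```

As Robert notes, counts like these say nothing about how much effort each edit took; they only show how widely the build files are touched.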
>
> On Wed, Oct 10, 2018 at 10:31 AM Etienne Chauchot 
> wrote:
>
>> Hi all,
>> I must admit that I agree with the assessment, especially regarding two points:
>> 1. Obstacles for new contributors: the Gradle learning curve might be too
>> steep for spare-time contributors; also, a complex scripted build takes
>> longer to understand than a self-descriptive one.
>> 2. IDE integration kind of slows down development.
>>
>> Now, regarding how we improve the situation, I think we need to discuss
>> and identify tasks and tackle them all together, even if they are not sexy
>> tasks, as Ismaël mentioned.
>>
>> Etienne
>>
>> On Tuesday, 9 October 2018 at 10:04 +0200, Jean-Baptiste Onofré wrote:
>>
>> Hi guys,
>>
>> I know that's a hot topic, but I have to bring this discussion to the table.
>>
>> Some months ago, we discussed migrating our build from Maven to
>> Gradle. One of the key expected improvements was build time.
>> We proposed to do a PoC to evaluate the impacts and improvements, but
>> this PoC was in fact done directly as a migration on master.
>>
>> Now, I would like to bring facts here:
>>
>> 1. Build time
>> On my machine, the build time is roughly 1h15. It's pretty long, and
>>

Re: Log output from Dataflow tests

2018-10-10 Thread Maximilian Michels

Thank you, Scott! Ismael also sent me the logs, and I was able to fix the error.

It seems we have granted read-only access to project members in the
past. I just checked with Ankur; he might be able to grant access
for my GCP account.


-Max

On 10.10.18 17:26, Scott Wegner wrote:
I'm not sure how apache-beam-testing permissions are managed; Kenn, 
could we grant read-access for contributors who need it for testing?


Here are two logs from the job that seem relevant:

2018-10-08 14:44:45.381 PDT
Parsing unknown args: 
[u'--dataflowJobId=2018-10-08_14_41_03-9578125971484804239', 
u'--autoscalingAlgorithm=NONE', u'--direct_runner_use_stacked_bundle', 
u'--maxNumWorkers=0', u'--style=scrambled', u'--sleep_secs=20', 
u'--pipeline_type_check', 
u'--gcpTempLocation=gs://temp-storage-for-end-to-end-tests/temp-it/beamapp-jenkins-1008214058-522436.1539034858.522554', 
u'--numWorkers=1', 
u'--beam_plugins=apache_beam.io.filesystem.FileSystem', 
u'--beam_plugins=apache_beam.io.hadoopfilesystem.HadoopFileSystem', 
u'--beam_plugins=apache_beam.io.localfilesystem.LocalFileSystem', 
u'--beam_plugins=apache_beam.io.gcp.gcsfilesystem.GCSFileSystem', 
u'--beam_plugins=apache_beam.io.filesystem_test.TestingFileSystem', 
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PipelineGraphRenderer', 
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.MuteRenderer', 
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.TextRenderer', 
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PydotRenderer', 
u'--pipelineUrl=gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1008214058-522436.1539034858.522554/pipeline.pb']


2018-10-08 14:44:45.382 PDT
Python sdk harness failed: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py", line 133, in main
    sdk_pipeline_options.get_all_options(drop_default=True))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/options/pipeline_options.py", line 227, in get_all_options
    action='append' if num_times > 1 else 'store')
  File "/usr/lib/python2.7/argparse.py", line 1308, in add_argument
    return self._add_action(action)
  File "/usr/lib/python2.7/argparse.py", line 1682, in _add_action
    self._optionals._add_action(action)
  File "/usr/lib/python2.7/argparse.py", line 1509, in _add_action
    action = super(_ArgumentGroup, self)._add_action(action)
  File "/usr/lib/python2.7/argparse.py", line 1322, in _add_action
    self._check_conflict(action)
  File "/usr/lib/python2.7/argparse.py", line 1460, in _check_conflict
    conflict_handler(action, confl_optionals)
  File "/usr/lib/python2.7/argparse.py", line 1467, in _handle_conflict_error
    raise ArgumentError(action, message % conflict_string)
ArgumentError: argument --beam_plugins: conflicting option string(s): --beam_plugins


On Wed, Oct 10, 2018 at 1:05 AM Maximilian Michels wrote:


Would be great to provide access to Dataflow build logs.

In the meantime, could someone with access send me the logs for the job
below?


https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_14_41_03-9578125971484804239?project=apache-beam-testing

Thanks,
Max

On 09.10.18 13:45, Maximilian Michels wrote:
 > Hi,
 >
 > I'm debugging a test failure in Dataflow PostCommit. There are logs
 > available which I can't access. Is it possible to be added to the
 > apache-beam-testing project?
 >
 > Thanks,
 > Max
 >
 >
 > Example:
 >
 > ======================================================================
 > FAIL: test_streaming_with_attributes
 > (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
 > ----------------------------------------------------------------------
 > Traceback (most recent call last):
 >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 175, in test_streaming_with_attributes
 >     self._test_streaming(with_attributes=True)
 >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 167, in _test_streaming
 >     timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
 >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py", line 91, in run_pipeline
 >     result = p.run()
 >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 416, in run
 >     return self.runner.run_pipeline(self)
 >   File
 >


Re: Log output from Dataflow tests

2018-10-10 Thread Scott Wegner
I'm not sure how apache-beam-testing permissions are managed; Kenn, could
we grant read-access for contributors who need it for testing?

Here are two logs from the job that seem relevant:

2018-10-08 14:44:45.381 PDT
Parsing unknown args:
[u'--dataflowJobId=2018-10-08_14_41_03-9578125971484804239',
u'--autoscalingAlgorithm=NONE', u'--direct_runner_use_stacked_bundle',
u'--maxNumWorkers=0', u'--style=scrambled', u'--sleep_secs=20',
u'--pipeline_type_check',
u'--gcpTempLocation=gs://temp-storage-for-end-to-end-tests/temp-it/beamapp-jenkins-1008214058-522436.1539034858.522554',
u'--numWorkers=1', u'--beam_plugins=apache_beam.io.filesystem.FileSystem',
u'--beam_plugins=apache_beam.io.hadoopfilesystem.HadoopFileSystem',
u'--beam_plugins=apache_beam.io.localfilesystem.LocalFileSystem',
u'--beam_plugins=apache_beam.io.gcp.gcsfilesystem.GCSFileSystem',
u'--beam_plugins=apache_beam.io.filesystem_test.TestingFileSystem',
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PipelineGraphRenderer',
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.MuteRenderer',
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.TextRenderer',
u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PydotRenderer',
u'--pipelineUrl=gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1008214058-522436.1539034858.522554/pipeline.pb']

2018-10-08 14:44:45.382 PDT
Python sdk harness failed: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py", line 133, in main
    sdk_pipeline_options.get_all_options(drop_default=True))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/options/pipeline_options.py", line 227, in get_all_options
    action='append' if num_times > 1 else 'store')
  File "/usr/lib/python2.7/argparse.py", line 1308, in add_argument
    return self._add_action(action)
  File "/usr/lib/python2.7/argparse.py", line 1682, in _add_action
    self._optionals._add_action(action)
  File "/usr/lib/python2.7/argparse.py", line 1509, in _add_action
    action = super(_ArgumentGroup, self)._add_action(action)
  File "/usr/lib/python2.7/argparse.py", line 1322, in _add_action
    self._check_conflict(action)
  File "/usr/lib/python2.7/argparse.py", line 1460, in _check_conflict
    conflict_handler(action, confl_optionals)
  File "/usr/lib/python2.7/argparse.py", line 1467, in _handle_conflict_error
    raise ArgumentError(action, message % conflict_string)
ArgumentError: argument --beam_plugins: conflicting option string(s): --beam_plugins
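The last frames of this trace show the mechanism: `argparse` refuses to register the same option string twice under its default `conflict_handler='error'`, while the harness code chooses `action='append'` when a flag occurs more than once. The sketch below reproduces the error with plain `argparse` and shows one way to avoid it: register each flag name once and let repeated values accumulate in a list. The helper names are hypothetical, the sketch assumes every flag uses the `--name=value` form, and it is not the fix that was actually applied to Beam:

```python
import argparse
from collections import Counter

def register_each_occurrence(args):
    """Naive approach: one add_argument() call per occurrence of a flag."""
    parser = argparse.ArgumentParser()
    for arg in args:
        parser.add_argument(arg.split("=", 1)[0])
    return parser

def register_once(args):
    """Register each flag name once; repeated flags collect their values in a list."""
    counts = Counter(arg.split("=", 1)[0] for arg in args if arg.startswith("--"))
    parser = argparse.ArgumentParser()
    for name, num_times in counts.items():
        parser.add_argument(name, action="append" if num_times > 1 else "store")
    return parser

args = ["--numWorkers=1",
        "--beam_plugins=apache_beam.io.filesystem.FileSystem",
        "--beam_plugins=apache_beam.io.localfilesystem.LocalFileSystem"]

try:
    register_each_occurrence(args)
except argparse.ArgumentError as err:
    # Exact wording varies by Python version; the 2.7 trace says "string(s)".
    conflict_message = str(err)
    print(conflict_message)

options = register_once(args).parse_args(args)
print(options.beam_plugins)         # both plugin classes, in order
print(repr(options.numWorkers))     # '1' (no type= was given, so it stays a string)
```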

On Wed, Oct 10, 2018 at 1:05 AM Maximilian Michels  wrote:

> Would be great to provide access to Dataflow build logs.
>
> In the meantime, could someone with access send me the logs for the job
> below?
>
>
> https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_14_41_03-9578125971484804239?project=apache-beam-testing
>
> Thanks,
> Max
>
> On 09.10.18 13:45, Maximilian Michels wrote:
> > Hi,
> >
> > I'm debugging a test failure in Dataflow PostCommit. There are logs
> > available which I can't access. Is it possible to be added to the
> > apache-beam-testing project?
> >
> > Thanks,
> > Max
> >
> >
> > Example:
> > ======================================================================
> > FAIL: test_streaming_with_attributes
> > (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
> > ----------------------------------------------------------------------
> > Traceback (most recent call last):
> >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 175, in test_streaming_with_attributes
> >     self._test_streaming(with_attributes=True)
> >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 167, in _test_streaming
> >     timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
> >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py", line 91, in run_pipeline
> >     result = p.run()
> >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 416, in run
> >     return self.runner.run_pipeline(self)
> >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 65, in run_pipeline
> >     hc_assert_that(self.result, pickler.loads(on_success_matcher))
> > AssertionError:
> > Expected: (Test pipeline expected terminated in state: RUNNING and
> > Expected 2 messages.)
> >      but: Expected 2 messages. Got 0 messages. Diffs (item, count):
> >   Expected but not in actual:

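The failure message in the log above reports expected and actual Pub/Sub messages as "(item, count)" differences, i.e. it compares them as multisets. A minimal sketch of that kind of comparison with `collections.Counter` follows; the function names, the sample message payloads, and the output format are assumptions for illustration, not the actual Beam test matcher:

```python
from collections import Counter

def message_diffs(expected, actual):
    """Compare two message lists as multisets; return (missing, unexpected)."""
    exp, act = Counter(expected), Counter(actual)
    # Counter subtraction drops zero and negative counts, so each result only
    # keeps items that occur more often on its own side.
    return exp - act, act - exp

def describe_mismatch(expected, actual):
    missing, unexpected = message_diffs(expected, actual)
    lines = ["Expected %d messages. Got %d messages. Diffs (item, count):"
             % (len(expected), len(actual))]
    if missing:
        lines.append("  Expected but not in actual: %s" % sorted(missing.items()))
    if unexpected:
        lines.append("  Unexpected but present in actual: %s" % sorted(unexpected.items()))
    return "\n".join(lines)

# Hypothetical data mirroring the failing test: 2 messages expected, 0 received.
print(describe_mismatch(["data001", "data002"], []))
```

Counting with a multiset rather than a set matters for streaming tests, where a duplicated delivery should show up as a diff instead of being silently absorbed.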
Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-10 Thread Romain Manni-Bucau
Some time ago JB spoke about a Beam roadmap. I tend to think this discussion
does not make any sense without a clear roadmap. The rationale here is that a
roadmap gives you the future changes and the potential future versions (we
have spoken a few times about Beam 3). This does not have to be very precise;
a 3-month horizon is OK at this stage. However, if you don't have that, you
can say 2.8 will be LTS and we support 2 versions, but if 2.9 and 2.10
introduce breaking changes, then 2.8 ends up as an LTS that is no longer
supported. This is just an example, but the "cost(project) / gain(user)" ratio
depends 100% on the plans for the project. Technically, nothing prevents us
from supporting all releases for life, but would any PMC member be willing to
release Beam 0.x now? The point of an LTS for a user is to plan investment; if
we are in the previous case, it does not help, IMHO.
So maybe we should take the Beam enhancement plans back up and assign them
some fix versions before defining what Beam's support model can be.

Just my 2 cents as an outsider.

Romain Manni-Bucau



On Wed, Oct 10, 2018 at 17:10, Chamikara Jayalath wrote:

>
>
> On Wed, Oct 10, 2018 at 2:56 AM Robert Bradshaw 
> wrote:
>
>> On Wed, Oct 10, 2018 at 9:37 AM Ismaël Mejía  wrote:
>>
>>> The simplest thing we can do is just to pin all the deps of the LTS
>>> and not move them in any maintenance release unless there is a strong
>>> reason to do so.
>>>
>>> The next subject is to make maintainers aware of which release will be
>>> the LTS in advance so they decide what to do with the dependencies
>>> versions. In my previous mail I mentioned all the possible cases that
>>> can happen with dependencies and it is clear that one unified policy
>>> won’t satisfy everyone. So it is better to let the maintainers (who can
>>> also ask for user feedback on the ML) decide about versions before the
>>> release.
>>>
>>> Alexey’s question is still a really important issue, and has been so
>>> far ignored. What happens with the ‘Experimental’ APIs in the LTS.
>>> Options are:
>>>
>>> (1) We stay consistent with Experimental, which means that there are
>>> still no guarantees (note that this does not mean that they will be
>>> broken arbitrarily).
>>> (2) We are consistent with the LTS approach, which makes them ‘non
>>> experimental’ for the LTS, so we will guarantee that the functionality/API
>>> is stable.
>>>
>>> I personally have conflicting opinions: I would like to favor (1), but
>>> this is not consistent with the whole idea of LTS, so (2) is probably
>>> wiser.
>>>
>>
>> Yeah, I think (2) is the only viable option.
>>
>
> I think the important thing here is that future releases on an LTS branch will
> be patch (bugfix) releases, so I don't think we can/should make
> API/functionality changes (even if the change is experimental and/or
> backwards compatible).
>
> I think the same goes for dependency changes. If the change is to fix a known
> bug we can do that in a patch release but if it's to add more functionality
> probably that should come in a new minor release instead of a patch
> release.
>
> This is why I think we should be a bit careful about "rushed" changes to
> major functionalities of Beam going into LTS releases. Just my 2 cents.
>
> Thanks,
> Cham
>
>
>>
>>
>>> Finally I also worry about Tim’s remarks on performance and quality,
>>> even if some of these things effectively can be fixed in a subsequent
>>> LTS release. Users starting with Beam will probably prefer an LTS, and
>>> if the performance/quality of the LTS is poor, this can hurt the
>>> perception of the project.
>>>
>>
>> Yes, for this reason I think it's important to consider what goes into an
>> LTS as well as what happens after. Almost by definition, using an LTS is
>> choosing stability over cutting edge features. I'd rather major feature X
>> goes in after LTS, and lives in a couple of releases gaining fixes and
>> improvements before being released as part of the next LTS, than quickly
>> making it into an LTS while brand new (both due to the time period before
>> we refine it, and the extra work of porting refinements back).
>>
>> Or maybe LTS-users are unlikely to pick up a x.y.0 release anyway,
>> waiting for at least x.y.1?
>>
>> Come to think of it, do we even have to designate releases as LTS at the
>> time of release? Perhaps we could instead just keep with our normal release
>> cadence, and periodically choose one of them as LTS once we've confirmed
>> its stability out in the wild and start backporting to it. (I can think of
>> several cons with this approach as well, e.g. generally it's easier to
>> backport bugfixes at the time a bugfix is done in master rather than
>> swapping in context at a later date, but just thought I'd throw it out
>> 

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-10 Thread Chamikara Jayalath
On Wed, Oct 10, 2018 at 2:56 AM Robert Bradshaw  wrote:

> On Wed, Oct 10, 2018 at 9:37 AM Ismaël Mejía  wrote:
>
>> The simplest thing we can do is just to pin all the deps of the LTS
>> and not move them in any maintenance release unless there is a strong
>> reason to do so.
>>
>> The next subject is to make maintainers aware of which release will be
>> the LTS in advance so they decide what to do with the dependencies
>> versions. In my previous mail I mentioned all the possible cases that
>> can happen with dependencies and it is clear that one unified policy
>> won’t satisfy everyone. So it is better to let the maintainers (who can
>> also ask for user feedback on the ML) decide about versions before the
>> release.
>>
>> Alexey’s question is still a really important issue, and has been so
>> far ignored. What happens with the ‘Experimental’ APIs in the LTS.
>> Options are:
>>
>> (1) We stay consistent with Experimental, which means that there are
>> still no guarantees (note that this does not mean that they will be
>> broken arbitrarily).
>> (2) We are consistent with the LTS approach, which makes them ‘non
>> experimental’ for the LTS, so we will guarantee that the functionality/API
>> is stable.
>>
>> I personally have conflicting opinions: I would like to favor (1), but
>> this is not consistent with the whole idea of LTS, so (2) is probably
>> wiser.
>>
>
> Yeah, I think (2) is the only viable option.
>

I think the important thing here is that future releases on an LTS branch will
be patch (bugfix) releases, so I don't think we can/should make
API/functionality changes (even if the change is experimental and/or
backwards compatible).

I think the same goes for dependency changes. If the change is to fix a known
bug we can do that in a patch release but if it's to add more functionality
probably that should come in a new minor release instead of a patch
release.
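A minimal sketch of how that rule could be checked mechanically, under loudly stated assumptions: every dependency uses a purely numeric MAJOR.MINOR.PATCH version, and a patch-level bump is treated as a rough proxy for "fixes a known bug". The function names and the `dep-a`/`dep-b` dependencies are hypothetical and not part of Beam's build.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a numeric 'MAJOR.MINOR.PATCH' string; missing parts default to 0."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def bump_kind(old: str, new: str) -> str:
    """Classify a version change as 'none', 'patch', 'minor', 'major' or 'downgrade'."""
    o, n = parse_version(old), parse_version(new)
    if n == o:
        return "none"
    if n < o:
        return "downgrade"
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


def lts_violations(pinned: Dict[str, str], proposed: Dict[str, str]) -> List[str]:
    """List dependency changes that go beyond a patch-level bump on an LTS branch."""
    problems = []
    for name in sorted(set(pinned) | set(proposed)):
        if name not in proposed:
            problems.append(f"{name}: removed")
        elif name not in pinned:
            problems.append(f"{name}: added")
        else:
            kind = bump_kind(pinned[name], proposed[name])
            if kind not in ("none", "patch"):
                problems.append(f"{name}: {kind} change {pinned[name]} -> {proposed[name]}")
    return problems


# Hypothetical pins for an LTS patch release.
print(lts_violations({"dep-a": "1.0.0", "dep-b": "2.1.0"},
                     {"dep-a": "1.0.1", "dep-b": "2.2.0"}))
```

Here `dep-a` passes as a patch bump while `dep-b` is flagged as a minor change. A flag is only a prompt for discussion: whether a given bump really is a bugfix remains the maintainers' call.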

This is why I think we should be a bit careful about "rushed" changes to
major functionalities of Beam going into LTS releases. Just my 2 cents.

Thanks,
Cham


>
>
>> Finally I also worry about Tim’s remarks on performance and quality,
>> even if some of these things effectively can be fixed in a subsequent
>> LTS release. Users starting with Beam will probably prefer an LTS, and
>> if the performance/quality of the LTS is poor, this can hurt the
>> perception of the project.
>>
>
> Yes, for this reason I think it's important to consider what goes into an
> LTS as well as what happens after. Almost by definition, using an LTS is
> choosing stability over cutting edge features. I'd rather major feature X
> goes in after LTS, and lives in a couple of releases gaining fixes and
> improvements before being released as part of the next LTS, than quickly
> making it into an LTS while brand new (both due to the time period before
> we refine it, and the extra work of porting refinements back).
>
> Or maybe LTS-users are unlikely to pick up a x.y.0 release anyway, waiting
> for at least x.y.1?
>
> Come to think of it, do we even have to designate releases as LTS at the
> time of release? Perhaps we could instead just keep with our normal release
> cadence, and periodically choose one of them as LTS once we've confirmed
> its stability out in the wild and start backporting to it. (I can think of
> several cons with this approach as well, e.g. generally it's easier to
> backport bugfixes at the time a bugfix is done in master rather than
> swapping in context at a later date, but just thought I'd throw it out
> there.)
>
> On Wed, Oct 10, 2018 at 4:53 AM Kenneth Knowles  wrote:
>> >
>> > I've seen two mentions that "rushing" is contrary to the goals of LTS.
>> But I wouldn't worry about this. The fact is there is almost nothing you
>> can do to stabilize *prior* to cutting the LTS branch. Stability comes from
>> the branch being long-lived and having multiple releases.
>> >
>> > (I think this is pretty much my version of what JB is saying)
>> >
>> > What a conservative user will do if 2.8.x is declared LTS is to start
>> using the 2.8.x branch after it has had a couple bugfix releases. I don't
>> think it is useful or possible to try for an "extra stable" 2.x.0.
>> >
>> > The arguments about supporting the most widely used versions of runner
>> backends apply regardless of LTS. We should support them if we have the
>> resources to do so.
>> >
>> > Kenn
>> >
>> > On Tue, Oct 9, 2018 at 4:57 PM Ahmet Altay  wrote:
>> >>
>> >>
>> >>
>> >> On Fri, Oct 5, 2018 at 4:38 AM, Jean-Baptiste Onofré 
>> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I think we have to remember what an LTS is. An LTS is clearly a branch
>> >>> that we guarantee to have fixes on it for a long period of time.
>> >>>
>> >>>
>> >>> It doesn't mean that LTS == unique release. We can do a bunch of
>> >>> releases on an LTS branch; the only constraint is to avoid introducing
>> >>> breaking changes.
>> >>
>> >>
>> >> I agree with this perspective. Thank you for sharing this. However the
>> other commenters also had a good point. Requiring 

Re: Splitting the repo

2018-10-10 Thread Romain Manni-Bucau
On Wed, Oct 10, 2018 at 14:59, Maximilian Michels wrote:

> Hi,
>
> I agree that splitting up Beam into separate repositories would cause
> more pain than gain.
>
> To a large degree we already have independent modules, e.g. runners/* or
> sdks/*. This is not the case for the core, though; it would be
> desirable to break it up further.
>

I think this part is OK for everyone.


>
>  > possibly even with their own build system (unified only through a
>  > top-level "build everything" script that descends into each subdir and
>  > runs the appropriate command).
>
> This is almost what we have. Yes, there are some dependencies on the
> Beam Gradle Plugin, but even if we had completely independent build
> directories, you'd still want to have a shared config/tasks across the
> projects (which might bring you back to a setup similar to what we have).
>
> One of the pain points seems to be the portability which "polluted" some
> parts of the project (e.g. legacy Runners). As mentioned in this thread
> that could have been solved with an abstraction. But the lack of
> abstraction also forced us to adopt the portable pipeline code quicker.
>

Not at all. Assume we have a full build that covers portability, then 3
concurrent builds (Go, Python, Java). That gives us the "current step" in
the CI, but developers are never affected by it, and the build does not
mess up their machines either.

Today the main blocker is that the default "profile" (script) does not match
the developer persona, and therefore there is no real hope of getting external
contributions from outside Google-related people, as the previous figures
showed, which is sad for a project promising unification and collaboration
between communities IMHO.


>
> -Max
>
> On 10.10.18 10:51, Romain Manni-Bucau wrote:
> > Yep for the split
> >
> > For the clean point, it is closely linked to the build tools and the fake
> > environments for modules that are not native to the build tool (e.g. Go
> > under Gradle, which is Java-first). This is why having a real build that
> > is natural for each language would be beneficial IMO.
> >
> > On Wed, Oct 10, 2018, 11:38 Jean-Baptiste Onofré wrote:
> >
> > Correct, it's more "module splitting" than repositories indeed.
> >
> > Regards
> > JB
> >
> > On 10/10/2018 10:35, Robert Bradshaw wrote:
> >  > Gotcha. So this is more about dividing the code (particularly core) into
> >  > finer modules, rather than splitting the modules into separate
> >  > repositories, right?
> >  >
> >  > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré
> >  > <j...@nanthrax.net> wrote:
> >  >
> >  > The purpose is that we have a monolithic core today mostly providing
> >  > abstract classes.
> >  >
> >  > The idea is to have something more API oriented with interface/SPI.
> >  >
> >  > Our users would then be able to pick the part of the core they want,
> >  > resulting in lighter artifacts, and for us, it gives a more flexible
> >  > approach.
> >  >
> >  > Regards
> >  > JB
> >  >
> >  > On 10/10/2018 10:26, Robert Bradshaw wrote:
> >  > > My question was not whether we should split the repo, but why?
> >  > > (Dividing things into more (or fewer) modules within a single repo is
> >  > > a separate question.) Maybe I'm just not following what you mean by
> >  > > "more API oriented." It would force stabler APIs.
> >  > >
> >  > > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
> >  > > <j...@nanthrax.net> wrote:
> >  > >
> >  > > Hi,
> >  > >
> >  > > +1, I even think we could split the core deeper.
> >  > >
> >  > > I discussed with Luke and Reuven introducing core-sql, core-schema,
> >  > > core-sdf, ...
> >  > >
> >  > > It's not a huge effort, and it would allow us to move forward on a
> >  > > "more API oriented" approach for Beam.
> >  > >
> >  > > Regards
> >  > > JB
> >  > >
> >  > > On 10/10/2018 10:12, Robert Bradshaw wrote:
> >  > > > Hi everyone,
> >  > > >
> >  > > > While IMHO it's too early to even be able to split the repo, it's
> >  > > > not too early to talk about it, and I wanted to spin this off to
> >  > > > keep the other thread focused.
> >  > > >

Re: Fwd: Slack invitation

2018-10-10 Thread Jean-Baptiste Onofré
You didn't receive it?

Let me try another time.

Regards
JB

On Oct 10, 2018, at 17:15, "Filip Popić" wrote:
>Any news regarding the invitation?
>
>On Mon, 8 Oct 2018 at 17:24, Jean-Baptiste Onofré 
>wrote:
>
>> Ok I will send it to you as well.
>>
>> Regards
>> JB
>> On Oct 8, 2018, at 18:23, Emmanuel Bastien wrote:
>>>
>>> Hello,
>>> I would like to join the Beam Slack channel. Could someone send me an
>>> invitation?
>>> Thanks in advance!
>>> Emmanuel
>>>
>>>


Re: Fwd: Slack invitation

2018-10-10 Thread Filip Popić
Any news regarding the invitation?

On Mon, 8 Oct 2018 at 17:24, Jean-Baptiste Onofré  wrote:

> Ok I will send it to you as well.
>
> Regards
> JB
> On Oct 8, 2018, at 18:23, Emmanuel Bastien wrote:
>>
>> Hello,
>> I would like to join the Beam Slack channel. Could someone send me an
>> invitation?
>> Thanks in advance!
>> Emmanuel
>>
>>


Re: Splitting the repo

2018-10-10 Thread Maximilian Michels

Hi,

I agree that splitting up Beam into separate repositories would cause 
more pain than gain.


To a large degree we already have independent modules, e.g. runners/* or 
sdks/*. This is not the case for the core, though; it would be 
desirable to break it up further.


> possibly even with their own build system (unified only through a
> top-level "build everything" script that descends into each subdir and
> runs the appropriate command).

This is almost what we have. Yes, there are some dependencies on the 
Beam Gradle Plugin, but even if we had completely independent build 
directories, you'd still want to have a shared config/tasks across the 
projects (which might bring you back to a setup similar to what we have).
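A minimal sketch of the kind of top-level "build everything" driver described above, assuming each top-level directory can be recognized by a marker file of its native build tool. The marker-to-command table is hypothetical and is not meant to reflect Beam's actual Gradle setup; by default the script runs in dry-run mode and only prints what it would execute.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical mapping from a marker file to that directory's native build
# command; the commands are illustrative, not Beam's real build invocations.
NATIVE_BUILDS: Dict[str, List[str]] = {
    "build.gradle": ["./gradlew", "build"],
    "pom.xml": ["mvn", "verify"],
    "setup.py": ["python", "-m", "pytest"],
    "go.mod": ["go", "test", "./..."],
}


def plan_builds(root: Path) -> List[Tuple[Path, List[str]]]:
    """Pick one native build command for each top-level subdirectory that has a known marker."""
    plan = []
    for subdir in sorted(p for p in root.iterdir() if p.is_dir()):
        for marker, command in NATIVE_BUILDS.items():
            if (subdir / marker).exists():
                plan.append((subdir, command))
                break  # first matching marker wins; one build per subdirectory
    return plan


def build_everything(root: Path, dry_run: bool = True) -> List[Tuple[Path, List[str]]]:
    """Run (or, in dry-run mode, only print) the native build of every subdirectory."""
    plan = plan_builds(root)
    for subdir, command in plan:
        print(f"[{subdir.name}] {' '.join(command)}")
        if not dry_run:
            # Each build runs inside its own directory and aborts the script on failure.
            subprocess.run(command, cwd=subdir, check=True)
    return plan
```

Directories without a recognized marker (e.g. a `website/` with no build file) are simply skipped. The trade-off described above still applies: anything the subdirectories need to share, such as versions or common tasks, has to live somewhere, and a driver like this does not provide it.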


One of the pain points seems to be the portability which "polluted" some 
parts of the project (e.g. legacy Runners). As mentioned in this thread 
that could have been solved with an abstraction. But the lack of 
abstraction also forced us to adopt the portable pipeline code quicker.


-Max

On 10.10.18 10:51, Romain Manni-Bucau wrote:

Yep for the split

For the clean point, it is closely linked to the build tools and the fake 
environments for modules that are not native to the build tool (e.g. Go 
under Gradle, which is Java-first). This is why having a real build that 
is natural for each language would be beneficial IMO.


On Wed, Oct 10, 2018, 11:38 Jean-Baptiste Onofré wrote:


Correct, it's more "module splitting" than repositories indeed.

Regards
JB

On 10/10/2018 10:35, Robert Bradshaw wrote:
 > Gotcha. So this is more about dividing the code (particularly core) into
 > finer modules, rather than splitting the modules into separate
 > repositories, right?
 >
 > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré
 > <j...@nanthrax.net> wrote:
 >
 >     The purpose is that we have a monolithic core today mostly providing
 >     abstract classes.
 >
 >     The idea is to have something more API oriented with interface/SPI.
 >
 >     Our users would then be able to pick the part of the core they want,
 >     resulting in lighter artifacts, and for us, it gives a more flexible
 >     approach.
 >
 >     Regards
 >     JB
 >
 >     On 10/10/2018 10:26, Robert Bradshaw wrote:
 >     > My question was not whether we should split the repo, but why?
 >     > (Dividing things into more (or fewer) modules within a single repo is
 >     > a separate question.) Maybe I'm just not following what you mean by
 >     > "more API oriented." It would force stabler APIs.
 >     >
 >     > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
 >     > <j...@nanthrax.net> wrote:
 >     >
 >     >     Hi,
 >     >
 >     >     +1, I even think we could split the core deeper.
 >     >
 >     >     I discussed with Luke and Reuven introducing core-sql,
 >     >     core-schema, core-sdf, ...
 >     >
 >     >     It's not a huge effort, and it would allow us to move forward on
 >     >     a "more API oriented" approach for Beam.
 >     >
 >     >     Regards
 >     >     JB
 >     >
 >     >     On 10/10/2018 10:12, Robert Bradshaw wrote:
 >     >     > Hi everyone,
 >     >     >
 >     >     > While IMHO it's too early to even be able to split the repo,
 >     >     > it's not too early to talk about it, and I wanted to spin this
 >     >     > off to keep the other thread focused.
 >     >     >
 >     >     > In particular, I am trying to figure out exactly what is hoped
 >     >     > to be gained by splitting things up. In my experience, a single
 >     >     > project that spans multiple repos has always come with excessive
 >     >     > overhead and pain. Of note, we recently merged the website and
 >     >     > dataflow-worker into the main repo *exactly* to avoid this pain
 >     >     > (though the latter was particularly bad due to one of the repos
 >     >     > being private).
 >     >     >
 >     >     > If need be, I don't see any reason we can't have a single repo
 >     >     > with directories
 >     >     >
 >     >     > model/
 >     >     > website/
 >     >     > java/
 >     >     > go/
 >     >     > ...
 >     >     >
 >     >     > possibly even with their own build system (unified only through
 >     >     > a top-level "build everything" 

Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread David Morávek
Hello Max,

It would be great if you could do more of a "general" review; the code base
is fairly large, well tested, and it was already reviewed internally by
several people.

We would like to have the overall approach and design decisions validated
by the community and get some input on what could be improved and whether we
are headed in the right direction.

Thanks,
David

On Wed, Oct 10, 2018 at 2:21 PM Maximilian Michels  wrote:

> That is a huge PR! :) Euphoria looks great. Especially for people coming
> from Flink/Spark. I'll check out the documentation.
>
> Do you have any specific code parts which you want to have reviewed?
>
> Thanks,
> Max
>
> On 10.10.18 10:30, Jean-Baptiste Onofré wrote:
> > Hi,
> >
> > Thanks for all the work you are doing on this DSL !
> >
> > I tried to follow the feature branch for a while. I'm still committed
> > to moving forward on that front, but more reviewers would be great.
> >
> > Regards
> > JB
> >
> > On 10/10/2018 10:26, Plajt, Vaclav wrote:
> >> Hello Beam devs,
> >> we have reached our main goals in the development of the Euphoria DSL. It
> >> is an easy-to-use Java 8 API built on top of Beam's Java SDK. The API
> >> provides a high-level abstraction of data transformations, with a focus on
> >> Java 8 language features (e.g. lambdas and streams). It is fully
> >> interoperable with the existing Beam SDK and convertible back and forth.
> >> It allows fast prototyping through the use of (optional) Kryo-based
> >> coders and can be seamlessly integrated into existing Beam pipelines.
> >>
> >> Now we believe that it is time to start a discussion about it with the
> >> community, which will hopefully lead to a vote about adopting it into the
> >> Apache Beam project. Most of the main ideas and development goals were
> >> presented at the Beam Summit in London [1].
> >>
> >> We are looking for reviewers within the community. Please start with the
> >> documentation [2] or the design document [3]. Our contribution is divided
> >> into two modules: `org.apache.beam:beam-sdks-java-extensions-euphoria` and
> >> `org.apache.beam:beam-sdks-java-extensions-kryo`. The rest of the code base
> >> remains untouched.
> >> All the checks in the PR [5] are passing with the exception of "Website
> >> PreCommit", which seems to be broken; a little help here would be
> >> appreciated.
> >>
> >> Thank you
> >> We are looking forward to your feedback.
> >> {david.moravek,vaclav.plajt,marek.simunek}@firma.seznam.cz
> >>
> >> Resources:
> >> [1] Beam Summit London presentation:
> >>
> https://docs.google.com/presentation/d/1SagpmzJ-tUQki5VsQOEEEUyi_LXRJdG_3OBLdjBKoh4/edit?usp=sharing
> >> [2] Documentation:
> >>
> https://github.com/seznam/beam/blob/dsl-euphoria/website/src/documentation/sdks/euphoria.md
> >> [3] Design Document: https://s.apache.org/beam-euphoria
> >> [4] ASF Jira Issue: https://issues.apache.org/jira/browse/BEAM-3900
> >> [5] Pull Request: https://github.com/apache/beam/pull/6601
> >> [6] Original proposal:
> >>
> http://mail-archives.apache.org/mod_mbox/beam-dev/201712.mbox/%3ccajjqkhnrp1z8atteogmpfkqxrcjeanb3ykowvvtnwyrvv_-...@mail.gmail.com%3e
> >>
> >>
> >>
> >> You should know that this e-mail and its attachments are confidential.
> >> If we are negotiating on the conclusion of a transaction, we reserve the
> >> right to terminate the negotiations at any time. For fans of legalese—we
> >> hereby exclude the provisions of the Civil Code on pre-contractual
> >> liability. The rules about who and how may act for the company and what
> >> are the signing procedures can be found here.
> >
>


Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread Maximilian Michels
That is a huge PR! :) Euphoria looks great. Especially for people coming 
from Flink/Spark. I'll check out the documentation.


Do you have any specific code parts which you want to have reviewed?

Thanks,
Max

On 10.10.18 10:30, Jean-Baptiste Onofré wrote:

Hi,

Thanks for all the work you are doing on this DSL !

I tried to follow the feature branch for a while. I'm still committed
to moving forward on that front, but more reviewers would be great.

Regards
JB

On 10/10/2018 10:26, Plajt, Vaclav wrote:

Hello Beam devs,
we have reached our main goals in the development of the Euphoria DSL. It
is an easy-to-use Java 8 API built on top of Beam's Java SDK. The API
provides a high-level abstraction of data transformations, with a focus on
Java 8 language features (e.g. lambdas and streams). It is fully
interoperable with the existing Beam SDK and convertible back and forth.
It allows fast prototyping through the use of (optional) Kryo-based
coders and can be seamlessly integrated into existing Beam pipelines.

Now we believe that it is time to start a discussion about it with the
community, which will hopefully lead to a vote about adopting it into the
Apache Beam project. Most of the main ideas and development goals were
presented at the Beam Summit in London [1].

We are looking for reviewers within the community. Please start with the
documentation [2] or the design document [3]. Our contribution is divided
into two modules: `org.apache.beam:beam-sdks-java-extensions-euphoria` and
`org.apache.beam:beam-sdks-java-extensions-kryo`. The rest of the code base
remains untouched.
All the checks in the PR [5] are passing with the exception of "Website
PreCommit", which seems to be broken; a little help here would be appreciated.

Thank you
We are looking forward to your feedback.
{david.moravek,vaclav.plajt,marek.simunek}@firma.seznam.cz

Resources:
[1] Beam Summit London presentation:
https://docs.google.com/presentation/d/1SagpmzJ-tUQki5VsQOEEEUyi_LXRJdG_3OBLdjBKoh4/edit?usp=sharing
[2] Documentation:
https://github.com/seznam/beam/blob/dsl-euphoria/website/src/documentation/sdks/euphoria.md
[3] Design Document: https://s.apache.org/beam-euphoria
[4] ASF Jira Issue: https://issues.apache.org/jira/browse/BEAM-3900
[5] Pull Request: https://github.com/apache/beam/pull/6601
[6] Original proposal:
http://mail-archives.apache.org/mod_mbox/beam-dev/201712.mbox/%3ccajjqkhnrp1z8atteogmpfkqxrcjeanb3ykowvvtnwyrvv_-...@mail.gmail.com%3e







Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-10 Thread Robert Bradshaw
On Wed, Oct 10, 2018 at 9:37 AM Ismaël Mejía  wrote:

> The simplest thing we can do is just to pin all the deps of the LTS
> and not move them in any maintenance release unless there is a strong
> reason to do so.
>
> The next subject is to make maintainers aware of which release will be
> the LTS in advance so they decide what to do with the dependencies
> versions. In my previous mail I mentioned all the possible cases that
> can happen with dependencies and it is clear that one unified policy
> won’t satisfy everyone. So it is better to let the maintainers (who can
> also ask for user feedback on the ML) decide about versions before the
> release.
>
> Alexey’s question is still a really important issue, and has been so
> far ignored. What happens with the ‘Experimental’ APIs in the LTS.
> Options are:
>
> (1) We stay consistent with Experimental, which means that there are
> still no guarantees (note that this does not mean that they will be
> broken arbitrarily).
> (2) We are consistent with the LTS approach, which makes them ‘non
> experimental’ for the LTS, so we will guarantee that the functionality/API
> is stable.
>
> I personally have conflicting opinions: I would like to favor (1), but
> this is not consistent with the whole idea of LTS, so (2) is probably
> wiser.
>

Yeah, I think (2) is the only viable option.


> Finally I also worry about Tim’s remarks on performance and quality,
> even if some of these things effectively can be fixed in a subsequent
> LTS release. Users starting with Beam will probably prefer an LTS, and
> if the performance/quality of the LTS is poor, this can hurt the
> perception of the project.
>

Yes, for this reason I think it's important to consider what goes into an
LTS as well as what happens after. Almost by definition, using an LTS is
choosing stability over cutting edge features. I'd rather major feature X
goes in after LTS, and lives in a couple of releases gaining fixes and
improvements before being released as part of the next LTS, than quickly
making it into an LTS while brand new (both due to the time period before
we refine it, and the extra work of porting refinements back).

Or maybe LTS-users are unlikely to pick up a x.y.0 release anyway, waiting
for at least x.y.1?
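A minimal sketch of that "wait for at least x.y.1" behaviour, assuming releases are plain numeric x.y.z strings; the function name and the release list in the example are illustrative only, not actual Beam release history.

```python
from typing import Iterable, Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn 'x.y.z' into a tuple of ints so versions compare numerically."""
    return tuple(int(p) for p in version.split("."))


def pick_conservative(releases: Iterable[str], lts_line: str,
                      min_patch: int = 1) -> Optional[str]:
    """Newest release on the LTS line (e.g. '2.8') whose patch number is >= min_patch."""
    line = parse(lts_line)
    candidates = [v for v in map(parse, releases)
                  if v[:2] == line and len(v) == 3 and v[2] >= min_patch]
    if not candidates:
        return None  # nothing on this line has had a bugfix release yet
    return ".".join(str(p) for p in max(candidates))


print(pick_conservative(["2.7.0", "2.8.0", "2.8.1", "2.8.2", "2.9.0"], "2.8"))
```

Comparing integer tuples rather than strings keeps, for example, 2.8.10 ahead of 2.8.2.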

Come to think of it, do we even have to designate releases as LTS at the
time of release? Perhaps we could instead just keep with our normal release
cadence, and periodically choose one of them as LTS once we've confirmed
its stability out in the wild and start backporting to it. (I can think of
several cons with this approach as well, e.g. generally it's easier to
backport bugfixes at the time a bugfix is done in master rather than
swapping in context at a later date, but just thought I'd throw it out
there.)

On Wed, Oct 10, 2018 at 4:53 AM Kenneth Knowles  wrote:
> >
> > I've seen two mentions that "rushing" is contrary to the goals of LTS.
> But I wouldn't worry about this. The fact is there is almost nothing you
> can do to stabilize *prior* to cutting the LTS branch. Stability comes from
> the branch being long-lived and having multiple releases.
> >
> > (I think this is pretty much my version of what JB is saying)
> >
> > What a conservative user will do if 2.8.x is declared LTS is to start
> using the 2.8.x branch after it has had a couple bugfix releases. I don't
> think it is useful or possible to try for an "extra stable" 2.x.0.
> >
> > The arguments about supporting the most widely used versions of runner
> backends apply regardless of LTS. We should support them if we have the
> resources to do so.
> >
> > Kenn
> >
> > On Tue, Oct 9, 2018 at 4:57 PM Ahmet Altay  wrote:
> >>
> >>
> >>
> >> On Fri, Oct 5, 2018 at 4:38 AM, Jean-Baptiste Onofré 
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I think we have to remember what an LTS is. An LTS is clearly a branch
> >>> that we guarantee to have fixes on it for a long period of time.
> >>>
> >>>
> >>> It doesn't mean that LTS == unique release. We can do a bunch of
> >>> releases on an LTS branch; the only constraint is to avoid introducing
> >>> breaking changes.
> >>
> >>
> >> I agree with this perspective. Thank you for sharing this. However the
> other commenters also had a good point. Requiring users to upgrade their
> runner version may be incompatible with the goals of an LTS branch. Ideally
> the fixes here should be very minimal and targeted.
> >>
> >>>
> >>>
> >>> So, IMHO, the key part is not release, it's branch.
> >>>
> >>> The first thing to decide is the branch.
> >>>
> >>> Instead of talking about 2.8.0 or 2.9.0, I would prepare a 2.8.x LTS
> >>> branch. It's a branch where we will cherry-pick some important fixes in
> >>> the future and where we will cut releases. It's the approach I use in
> >>> other Apache projects (especially Karaf) and it works fine.
> >>
> >>
> >> JB, does Karaf have a documented process that we can re-use? If not
> could you explain a bit more?
> >>
> >> Is the proposal here to prepare 2.8.x LTS branch and make a 2.8.0
> release out of that?
> 

Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Robert Bradshaw
Some rough stats (because I was curious): The gradle files have been edited
by ~79 unique contributors over 696 distinct commits, whereas the maven
ones were edited (over a longer time period) by ~130 unique
contributors over 1389 commits [1]. This doesn't capture how much effort
was put into these edits, but neither is restricted to a small set of
experts.

Regarding "friendly for other languages" I don't think either is
necessarily easy to learn, but my impression is that the maven learning
curve is shallower for those already firmly embedded in the Java ecosystem
(perhaps due to leveraging existing familiarity, and perhaps some due to
the implicit java-centric conventions that maven assumed about your
project), whereas with gradle at least I could keep pulling on the string
to unwind things to the bottom. The "I just want to build/test X without
editing/viewing the build files" seemed more natural with Gradle (e.g. I
can easily list all tasks).

That being said, I don't think everyone needs to understand the full build
system. What matters is that there be a critical mass that does (we have that
for both, and if we can simplify to improve this that'd be great), that it's
easy enough to make basic changes (e.g. add a dependency; again, I don't think
the barrier is sufficiently different for either), and that it works well out of
the box for someone who just wants to look up a command on the website and
edit code (the CLI is an improvement with Gradle, but it's clear that
(Java) IDE support is a significant regression).

Personally, I don't know much about IDE configuration (admittedly the
larger issue), but one action item I can take on is trying to eliminate the
need to do a "git clean" after building certain targets (assuming I can
reproduce this).

Perhaps we should go through and prioritize (and add missing items to)
BEAM-4045?
https://issues.apache.org/jira/issues/?jql=parent%20%3D%20BEAM-4045%20ORDER%20BY%20priority%20DESC
There's always a long tail with this kind of thing, and looking at the
whole list can be daunting, but putting it in the correct order and
knocking off the top N items could possibly go a long way.

- Robert

[1] The commands I ran were (with and without the uniq):

$ find . -name 'build.gradle' | xargs git log | grep Author: | grep -o '[^< ]*@' | sort | uniq | wc
$ find . -name 'pom.xml' | xargs git log | grep Author: | grep -o '[^< ]*@' | sort | uniq | wc
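For anyone who wants to reproduce or refine these numbers, here is a minimal sketch of a variant (not the commands actually run above). It assumes git's `--format` placeholders `%H` (commit hash) and `%ae` (author e-mail), and passes the file patterns as pathspecs to a single `git log` call, so `xargs` can never split the file list across several invocations and list the same commit twice. Two caveats: it counts full e-mail addresses, whereas `grep -o '[^< ]*@'` keeps only the part before the `@`; and the pathspecs also match files that have since been moved or deleted, which `find` cannot see. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count distinct commits / author e-mails that touched
# files matching the given git pathspecs. Run from the repository root.

# Read one value per line on stdin; print how many distinct non-empty values
# there are. grep -c exits with status 1 when it counts zero lines but still
# prints 0, hence the "|| true".
count_distinct() {
  sort -u | grep -c . || true
}

# Distinct commits that touched any of the pathspecs.
commits_touching() {
  git log --format='%H' -- "$@" | count_distinct
}

# Distinct author e-mail addresses on those commits.
authors_touching() {
  git log --format='%ae' -- "$@" | count_distinct
}
```

Usage: `commits_touching 'build.gradle' '*/build.gradle'` or `authors_touching 'pom.xml' '*/pom.xml'`. Both patterns are needed because `*/build.gradle` does not match a `build.gradle` at the repository root.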

On Wed, Oct 10, 2018 at 10:31 AM Etienne Chauchot 
wrote:

> Hi all,
> I must admit that I agree on the status, especially regarding 2 points:
> 1. Obstacles for new contributors: the Gradle learning curve might be too long for
> spare-time contributors; also, a complex scripted build takes longer to
> understand compared to a self-descriptive one.
> 2. IDE integration kind of slows down development.
>
> Now, regarding how we improve the situation, I think we need to discuss
> and identify tasks and tackle them all together, even if they are not sexy
> tasks, as Ismaël mentioned.
>
> Etienne
>
> On Tuesday, 9 October 2018 at 10:04 +0200, Jean-Baptiste Onofré wrote:
>
> Hi guys,
>
> I know that's a hot topic, but I have to bring this discussion to the table.
>
> Some months ago, we discussed migrating our build from Maven to
> Gradle. One of the key expected improvements was the build time.
> We proposed to do a PoC to evaluate the impacts and improvements, but
> this PoC was actually done directly as a migration on master.
>
> Now, I would like to bring some facts here:
>
> 1. Build time
> On my machine, the build takes roughly 1h15. That's pretty long, and
> given what the build is doing, I don't see a huge improvement provided
> by Gradle.
> 2. Build reliability
> Even worse, most of the time we need to use --no-parallel and
> --no-daemon to get a reliable build (it's basically recommended for
> releases). This has an impact on build time, and we lose part of Gradle's
> benefits.
> 3. Release and repositories
> Even though a couple of releases have been performed with Gradle, it's hard
> to see improvements around artifact handling. I got my
> repository polluted twice (that's part of the trick Gradle uses around the
> repository to speed up the build).
> 4. IDE integration
> We already had some comments on the mailing lists about the IDE
> integration. Clearly, the situation is not good on that front either. The
> integration with IDEs (especially IntelliJ) is not good enough right now.
>
> We are working hard to grow the community, and from a contributor
> perspective, our build system is not good today IMHO.
> As a contributor, I resumed my work on some PRs, and I'm spending much
> more time on the build than on the PRs' code itself.
>
> So, obviously, the situation is not perfect, at least from a contributor
> perspective.
>
> The purpose of this thread is not, again, to have a bunch of replies
> ending nowhere. I would like to be more "pushy", so let's try to be
>

Re: Splitting the repo

2018-10-10 Thread Romain Manni-Bucau
This looks functional, whereas the split is more about languages and making
the build smooth and efficient to work with, to get back up to speed.
For instance, runners can stay in Java land/subproject as long as they are
not in other languages, so the API between core and runner can stay as is
for that topic.

On Wed, 10 Oct 2018 11:58, Robert Bradshaw wrote:

> On Wed, Oct 10, 2018 at 10:25 AM Romain Manni-Bucau 
> wrote:
>
>> On the split point: a mono-repo works for me as well. The main point is
>> "N separate builds".
>>
>> On the portable thing: currently each runner integrates with the portable API. It
>> impacts all runners. The needed code is the same everywhere since it is
>> mainly a DoFn in the end (a bit of a caricature, but that is the big picture), so
>> in the end the portable impl can be unique and built on top of any runner.
>> The gains are:
>>
>> 1. Don't pollute Java users
>> 2. Single code maintenance
>> 3. Support for upgrading the runner without changing this layer (contract-based
>> integration vs. a coupled one, so smoother updates in all layers)
>> 4. Simpler code (at least in design)
>>
>> Hope it is clearer
>>
>
> Right now the basic structure is
>
>   SDK
>   \
> [PortabilityAPI]
>   /
>   Beam Runners Core Library
>   \
> [BeamRunnersCoreAPI]
>   /
>   Beam RunnerX Adapter Code
>   \
> [RunnerXAPI]
>   /
>   Java RunnerX
>
> Where the APIs in brackets are what are used for the various components to
> talk to each other, and the latter two are in Java. It sounds like what
> you're advocating for is the (Java) Beam Runners Core Library (along with
> its API). Am I understanding correctly? Of course some things are easier to
> abstract away than others (e.g. how SDK processes, if not in process, are
> launched (including staging their dependencies) and monitored is squarely
> in the domain of the particular runner, though we can abstract as much
> common, helper code as possible to higher levels).
>
>
>
>> On Wed, 10 Oct 2018 11:18, Jean-Baptiste Onofré wrote:
>>
>>> Hi,
>>>
>>> +1, I even think we could split the core deeper.
>>>
>>> I discussed with Luke and Reuven introducing core-sql, core-schema,
>>> core-sdf, ...
>>>
>>> It's not a huge effort, and would allow us to move forward on Beam's "more
>>> API oriented" approach.
>>>
>>> Regards
>>> JB
>>>
>>> On 10/10/2018 10:12, Robert Bradshaw wrote:
>>> > Hi everyone,
>>> >
>>> > While IMHO it's too early to even be able to split the repo, it's not
>>> too
>>> > early to talk about it, and I wanted to spin this off to keep the other
>>> > thread focused.
>>> >
>>> > In particular, I am trying to figure out exactly what is hoped to be
>>> > gained by splitting things up. In my experience, a single project that
>>> > spans multiple repos has always come with excessive overhead and pain.
>>> > Of note, we recently merged the website and dataflow-worker into the
>>> > main repo *exactly* to avoid this pain (though the latter was
>>> > particularly bad due to one of the repos being private).
>>> >
>>> > If need be, I don't see any reason we can't have a single repo with
>>> > directories
>>> >
>>> > model/
>>> > website/
>>> > java/
>>> > go/
>>> > ...
>>> >
>>> > possibly even with their own build system (unified only through a
>>> > top-level "build everything" script that descends into each subdir and
>>> > runs the appropriate command). I'm not saying we should do this (there
>>> > is value in having a single consistent build system, etc.) but it's
>>> > possible. We could probably even make separate releases out of this
>>> > single repo (if we wanted, though given that our releases are
>>> time-based
>>> > rather than feature-based, I don't see much advantage here).
>>> >
>>> > Also, there was the comment.
>>> >
>>> > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>>> > wrote:
>>> >>
>>> >> Side note: Beam portability would be saner if it were added on top of
>>> >> the others rather than the opposite, which is what is done today.
>>> >
>>> > I think you brought this up before, Romain. I'm still trying to wrap my
>>> > head around what you mean here. Could you elaborate what such a
>>> > structure would look like?
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
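The top-level "build everything" script suggested in the quoted text above, one that descends into each subdirectory and runs the appropriate command, could be as small as the following sketch. The directory names follow the example layout (model/, website/, java/, go/, ...); the per-language commands (`./gradlew build`, `go build ./...`) are placeholders chosen for illustration, not Beam's actual build entry points, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical top-level driver for a mono-repo laid out as model/, website/,
# java/, go/, ...: each subdirectory keeps its own native build tool, and this
# script only dispatches to it. The commands below are placeholders.

# Print the build command for a top-level directory, or nothing if unknown.
build_command_for() {
  case "$1" in
    java) echo "./gradlew build" ;;
    go)   echo "go build ./..." ;;
    *)    echo "" ;;
  esac
}

# Build every known subdirectory of $1 (default: current directory).
# Keeps going after a failure and returns non-zero if any build failed.
build_all() {
  local root="${1:-.}" dir name cmd status=0
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue            # the glob matched nothing
    name="$(basename "$dir")"
    cmd="$(build_command_for "$name")"
    if [ -z "$cmd" ]; then
      echo "skipping $name (no build command configured)"
      continue
    fi
    echo "building $name: $cmd"
    (cd "$dir" && sh -c "$cmd") || status=1
  done
  return "$status"
}
```

Keeping the directory-to-tool mapping in `build_command_for`, separate from the loop, means adding a language only touches one `case` arm; `build_all .` from the repository root builds everything it knows about.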


Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
On Wed, Oct 10, 2018 at 10:25 AM Romain Manni-Bucau 
wrote:

> On the split point: a mono-repo works for me as well. The main point is "N
> separate builds".
>
> On the portable thing: currently each runner integrates with the portable API. It
> impacts all runners. The needed code is the same everywhere since it is
> mainly a DoFn in the end (a bit of a caricature, but that is the big picture), so
> in the end the portable impl can be unique and built on top of any runner.
> The gains are:
>
> 1. Don't pollute Java users
> 2. Single code maintenance
> 3. Support for upgrading the runner without changing this layer (contract-based
> integration vs. a coupled one, so smoother updates in all layers)
> 4. Simpler code (at least in design)
>
> Hope it is clearer
>

Right now the basic structure is

  SDK
  \
[PortabilityAPI]
  /
  Beam Runners Core Library
  \
[BeamRunnersCoreAPI]
  /
  Beam RunnerX Adapter Code
  \
[RunnerXAPI]
  /
  Java RunnerX

Where the APIs in brackets are what are used for the various components to
talk to each other, and the latter two are in Java. It sounds like what
you're advocating for is the (Java) Beam Runners Core Library (along with
its API). Am I understanding correctly? Of course some things are easier to
abstract away than others (e.g. how SDK processes, if not in process, are
launched (including staging their dependencies) and monitored is squarely
in the domain of the particular runner, though we can abstract as much
common, helper code as possible to higher levels).



> On Wed, 10 Oct 2018 11:18, Jean-Baptiste Onofré wrote:
>
>> Hi,
>>
>> +1, I even think we could split the core deeper.
>>
>> I discussed with Luke and Reuven introducing core-sql, core-schema,
>> core-sdf, ...
>>
>> It's not a huge effort, and would allow us to move forward on Beam's "more
>> API oriented" approach.
>>
>> Regards
>> JB
>>
>> On 10/10/2018 10:12, Robert Bradshaw wrote:
>> > Hi everyone,
>> >
>> > While IMHO it's too early to even be able to split the repo, it's not too
>> > early to talk about it, and I wanted to spin this off to keep the other
>> > thread focused.
>> >
>> > In particular, I am trying to figure out exactly what is hoped to be
>> > gained by splitting things up. In my experience, a single project that
>> > spans multiple repos has always come with excessive overhead and pain.
>> > Of note, we recently merged the website and dataflow-worker into the
>> > main repo *exactly* to avoid this pain (though the latter was
>> > particularly bad due to one of the repos being private).
>> >
>> > If need be, I don't see any reason we can't have a single repo with
>> > directories
>> >
>> > model/
>> > website/
>> > java/
>> > go/
>> > ...
>> >
>> > possibly even with their own build system (unified only through a
>> > top-level "build everything" script that descends into each subdir and
>> > runs the appropriate command). I'm not saying we should do this (there
>> > is value in having a single consistent build system, etc.) but it's
>> > possible. We could probably even make separate releases out of this
>> > single repo (if we wanted, though given that our releases are time-based
>> > rather than feature-based, I don't see much advantage here).
>> >
>> > Also, there was the comment.
>> >
>> > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>> > wrote:
>> >>
>> >> Side note: Beam portability would be saner if it were added on top of
>> >> the others rather than the opposite, which is what is done today.
>> >
>> > I think you brought this up before, Romain. I'm still trying to wrap my
>> > head around what you mean here. Could you elaborate what such a
>> > structure would look like?
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


Re: Splitting the repo

2018-10-10 Thread Romain Manni-Bucau
Yep for the split

The clean issue is closely linked to the build tools and to the fake
environments they set up for modules that are not native to the build tool
(for instance Go under Gradle, which is Java-first). This is why having a real
build that is natural for each language would be beneficial IMO.

On Wed, 10 Oct 2018 11:38, Jean-Baptiste Onofré wrote:

> Correct, it's more "module splitting" than repositories indeed.
>
> Regards
> JB
>
> On 10/10/2018 10:35, Robert Bradshaw wrote:
> > Gotcha. So this is more about dividing the code (particularly core) into
> > finer modules, rather than splitting the modules into separate
> > repositories, right?
> >
> > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré wrote:
> >
> > The purpose is that we have a monolithic core today mostly providing
> > abstract classes.
> >
> > The idea is to have something more API oriented with interface/SPI.
> >
> > Our users would then be able to pick the part of the core they want,
> > resulting in lighter artifacts, and for us, it gives a more
> flexible
> > approach.
> >
> > Regards
> > JB
> >
> > On 10/10/2018 10:26, Robert Bradshaw wrote:
> > > My question was not whether we should split the repo, but why?
> > (Dividing
> > > things into more (or fewer) modules within a single repo is a
> > separate
> > > question.) Maybe I'm just not following what you mean by "more API
> > > oriented." It would force stabler APIs.
> > >
> > > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
> > > wrote:
> > >
> > > Hi,
> > >
> > > +1, I even think we could split the core deeper.
> > >
> > > I discussed with Luke and Reuven introducing core-sql,
> > > core-schema, core-sdf, ...
> > >
> > > It's not a huge effort, and would allow us to move forward on
> > > Beam's "more API oriented" approach.
> > >
> > > Regards
> > > JB
> > >
> > > On 10/10/2018 10:12, Robert Bradshaw wrote:
> > > > Hi everyone,
> > > >
> > > > While IMHO it's too early to even be able to split the repo,
> > it's
> > > not too
> > > > early to talk about it, and I wanted to spin this off to
> > keep the
> > > other
> > > > thread focused.
> > > >
> > > > In particular, I am trying to figure out exactly what is
> > hoped to be
> > > > gained by splitting things up. In my experience, a single
> > project that
> > > > spans multiple repos has always come with excessive overhead
> > and pain.
> > > > Of note, we recently merged the website and dataflow-worker
> > into the
> > > > main repo *exactly* to avoid this pain (though the latter was
> > > > particularly bad due to one of the repos being private).
> > > >
> > > > If need be, I don't see any reason we can't have a single
> > repo with
> > > > directories
> > > >
> > > > model/
> > > > website/
> > > > java/
> > > > go/
> > > > ...
> > > >
> > > > possibly even with their own build system (unified only
> > through a
> > > > top-level "build everything" script that descends into each
> > subdir and
> > > > runs the appropriate command). I'm not saying we should do
> > this (there
> > > > is value in having a single consistent build system, etc.)
> > but it's
> > > > possible. We could probably even make separate releases out
> > of this
> > > > single repo (if we wanted, though given that our releases are
> > > time-based
> > > > rather than feature-based, I don't see much advantage here).
> > > >
> > > > Also, there was the comment.
> > > >
> > > > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> > > > wrote:
> > > >>
> > > >> Side note: Beam portability would be saner if it were added on top
> > > >> of the others rather than the opposite, which is what is done today.
> > > >
> > > > I think you brought this up before, Romain. I'm still trying
> to
> > > wrap my
> > > > head around what you mean here. Could you elaborate what
> such a
> > > > structure would look like?
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org 
> > >
> > > http://blog.nanthrax.net
> > >  

Re: Splitting the repo

2018-10-10 Thread Jean-Baptiste Onofré
Correct, it's more "module splitting" than repositories indeed.

Regards
JB

On 10/10/2018 10:35, Robert Bradshaw wrote:
> Gotcha. So this is more about dividing the code (particularly core) into
> finer modules, rather than splitting the modules into separate
> repositories, right? 
> 
> On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré wrote:
> 
> The purpose is that we have a monolithic core today mostly providing
> abstract classes.
> 
> The idea is to have something more API oriented with interface/SPI.
> 
> Our users would then be able to pick the part of the core they want,
> resulting in lighter artifacts, and for us, it gives a more flexible
> approach.
> 
> Regards
> JB
> 
> On 10/10/2018 10:26, Robert Bradshaw wrote:
> > My question was not whether we should split the repo, but why?
> (Dividing
> > things into more (or fewer) modules within a single repo is a
> separate
> > question.) Maybe I'm just not following what you mean by "more API
> > oriented." It would force stabler APIs. 
> >
> > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
> > wrote:
> >
> >     Hi,
> >
> >     +1, I even think we could split the core deeper.
> >
> >     I discussed with Luke and Reuven introducing core-sql,
> >     core-schema, core-sdf, ...
> >
> >     It's not a huge effort, and would allow us to move forward on
> >     Beam's "more API oriented" approach.
> >
> >     Regards
> >     JB
> >
> >     On 10/10/2018 10:12, Robert Bradshaw wrote:
> >     > Hi everyone,
> >     >
> >     > While IMHO it's too early to even be able to split the repo,
> it's
> >     not too
> >     > early to talk about it, and I wanted to spin this off to
> keep the
> >     other
> >     > thread focused.
> >     >
> >     > In particular, I am trying to figure out exactly what is
> hoped to be
> >     > gained by splitting things up. In my experience, a single
> project that
> >     > spans multiple repos has always come with excessive overhead
> and pain.
> >     > Of note, we recently merged the website and dataflow-worker
> into the
> >     > main repo *exactly* to avoid this pain (though the latter was
> >     > particularly bad due to one of the repos being private).
> >     >
> >     > If need be, I don't see any reason we can't have a single
> repo with
> >     > directories
> >     >
> >     > model/
> >     > website/
> >     > java/
> >     > go/
> >     > ...
> >     >
> >     > possibly even with their own build system (unified only
> through a
> >     > top-level "build everything" script that descends into each
> subdir and
> >     > runs the appropriate command). I'm not saying we should do
> this (there
> >     > is value in having a single consistent build system, etc.)
> but it's
> >     > possible. We could probably even make separate releases out
> of this
> >     > single repo (if we wanted, though given that our releases are
> >     time-based
> >     > rather than feature-based, I don't see much advantage here).
> >     >
> >     > Also, there was the comment.
> >     >
> >     > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> >     > wrote:
> >     >>
> >     >> Side note: Beam portability would be saner if it were added on top
> >     >> of the others rather than the opposite, which is what is done today.
> >     >
> >     > I think you brought this up before, Romain. I'm still trying to
> >     wrap my
> >     > head around what you mean here. Could you elaborate what such a
> >     > structure would look like? 
> >
> >     --
> >     Jean-Baptiste Onofré
> >     jbono...@apache.org 
> >
> >     http://blog.nanthrax.net
> >     Talend - http://www.talend.com
> >
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
On Wed, Oct 10, 2018 at 10:35 AM Romain Manni-Bucau 
wrote:

> Also, we could use a better-adapted build tool per area and not break the repo
> with each build. The Go and Python builds always require a git clean for Java
> users, which is a big issue, so let's build each subproject (which is what Beam
> is today) as it should be built, with an adapted tool.
>

If this is the case, that should be fixed. I can't remember the last time I
did a git clean, so clearly things are not working as well for you as they
are for me.


> It requires very few validations, but it is trivial to add unit tests to
> ensure nothing breaks at these contact points.
>
> On Wed, 10 Oct 2018 11:29, Jean-Baptiste Onofré wrote:
>
>> The purpose is that we have a monolithic core today mostly providing
>> abstract classes.
>>
>> The idea is to have something more API oriented with interface/SPI.
>>
>> Our users would then be able to pick the part of the core they want,
>> resulting in lighter artifacts, and for us, it gives a more flexible
>> approach.
>>
>> Regards
>> JB
>>
>> On 10/10/2018 10:26, Robert Bradshaw wrote:
>> > My question was not whether we should split the repo, but why? (Dividing
>> > things into more (or fewer) modules within a single repo is a separate
>> > question.) Maybe I'm just not following what you mean by "more API
>> > oriented." It would force stabler APIs.
>> >
>> > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré wrote:
>> >
>> > Hi,
>> >
>> > +1, I even think we could split the core deeper.
>> >
>> > I discussed with Luke and Reuven introducing core-sql, core-schema,
>> > core-sdf, ...
>> >
>> > It's not a huge effort, and would allow us to move forward on Beam's
>> > "more API oriented" approach.
>> >
>> > Regards
>> > JB
>> >
>> > On 10/10/2018 10:12, Robert Bradshaw wrote:
>> > > Hi everyone,
>> > >
>> > > While IMHO it's too early to even be able to split the repo, it's
>> > not too
>> > > early to talk about it, and I wanted to spin this off to keep the
>> > other
>> > > thread focused.
>> > >
>> > > In particular, I am trying to figure out exactly what is hoped to
>> be
>> > > gained by splitting things up. In my experience, a single project
>> that
>> > > spans multiple repos has always come with excessive overhead and
>> pain.
>> > > Of note, we recently merged the website and dataflow-worker into
>> the
>> > > main repo *exactly* to avoid this pain (though the latter was
>> > > particularly bad due to one of the repos being private).
>> > >
>> > > If need be, I don't see any reason we can't have a single repo
>> with
>> > > directories
>> > >
>> > > model/
>> > > website/
>> > > java/
>> > > go/
>> > > ...
>> > >
>> > > possibly even with their own build system (unified only through a
>> > > top-level "build everything" script that descends into each
>> subdir and
>> > > runs the appropriate command). I'm not saying we should do this
>> (there
>> > > is value in having a single consistent build system, etc.) but
>> it's
>> > > possible. We could probably even make separate releases out of
>> this
>> > > single repo (if we wanted, though given that our releases are
>> > time-based
>> > > rather than feature-based, I don't see much advantage here).
>> > >
>> > > Also, there was the comment.
>> > >
>> > > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>> > > wrote:
>> > >>
>> > >> Side note: Beam portability would be saner if it were added on top of
>> > >> the others rather than the opposite, which is what is done today.
>> > >
>> > > I think you brought this up before, Romain. I'm still trying to
>> > wrap my
>> > > head around what you mean here. Could you elaborate what such a
>> > > structure would look like?
>> >
>> > --
>> > Jean-Baptiste Onofré
>> > jbono...@apache.org 
>> > http://blog.nanthrax.net
>> > Talend - http://www.talend.com
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
Gotcha. So this is more about dividing the code (particularly core) into
finer modules, rather than splitting the modules into separate
repositories, right?

On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré 
wrote:

> The purpose is that we have a monolithic core today mostly providing
> abstract classes.
>
> The idea is to have something more API oriented with interface/SPI.
>
> Our users would then be able to pick the part of the core they want,
> resulting in lighter artifacts, and for us, it gives a more flexible
> approach.
>
> Regards
> JB
>
> On 10/10/2018 10:26, Robert Bradshaw wrote:
> > My question was not whether we should split the repo, but why? (Dividing
> > things into more (or fewer) modules within a single repo is a separate
> > question.) Maybe I'm just not following what you mean by "more API
> > oriented." It would force stabler APIs.
> >
> > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré wrote:
> >
> > Hi,
> >
> > +1, I even think we could split the core deeper.
> >
> > I discussed with Luke and Reuven introducing core-sql, core-schema,
> > core-sdf, ...
> >
> > It's not a huge effort, and would allow us to move forward on Beam's
> > "more API oriented" approach.
> >
> > Regards
> > JB
> >
> > On 10/10/2018 10:12, Robert Bradshaw wrote:
> > > Hi everyone,
> > >
> > > While IMHO it's too early to even be able to split the repo, it's
> > not too
> > > early to talk about it, and I wanted to spin this off to keep the
> > other
> > > thread focused.
> > >
> > > In particular, I am trying to figure out exactly what is hoped to
> be
> > > gained by splitting things up. In my experience, a single project
> that
> > > spans multiple repos has always come with excessive overhead and
> pain.
> > > Of note, we recently merged the website and dataflow-worker into
> the
> > > main repo *exactly* to avoid this pain (though the latter was
> > > particularly bad due to one of the repos being private).
> > >
> > > If need be, I don't see any reason we can't have a single repo with
> > > directories
> > >
> > > model/
> > > website/
> > > java/
> > > go/
> > > ...
> > >
> > > possibly even with their own build system (unified only through a
> > > top-level "build everything" script that descends into each subdir
> and
> > > runs the appropriate command). I'm not saying we should do this
> (there
> > > is value in having a single consistent build system, etc.) but it's
> > > possible. We could probably even make separate releases out of this
> > > single repo (if we wanted, though given that our releases are
> > time-based
> > > rather than feature-based, I don't see much advantage here).
> > >
> > > Also, there was the comment.
> > >
> > > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> > > wrote:
> > >>
> > >> Side note: Beam portability would be saner if it were added on top of
> > >> the others rather than the opposite, which is what is done today.
> > >
> > > I think you brought this up before, Romain. I'm still trying to
> > wrap my
> > > head around what you mean here. Could you elaborate what such a
> > > structure would look like?
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Splitting the repo

2018-10-10 Thread Romain Manni-Bucau
Also, we could use a better-adapted build tool per area and not break the repo
with each build. The Go and Python builds always require a git clean for Java
users, which is a big issue, so let's build each subproject (which is what Beam
is today) as it should be built, with an adapted tool.

It requires very few validations, but it is trivial to add unit tests to
ensure nothing breaks at these contact points.

On Wed, 10 Oct 2018 11:29, Jean-Baptiste Onofré wrote:

> The purpose is that we have a monolithic core today mostly providing
> abstract classes.
>
> The idea is to have something more API oriented with interface/SPI.
>
> Our users would then be able to pick the part of the core they want,
> resulting in lighter artifacts, and for us, it gives a more flexible
> approach.
>
> Regards
> JB
>
> On 10/10/2018 10:26, Robert Bradshaw wrote:
> > My question was not whether we should split the repo, but why? (Dividing
> > things into more (or fewer) modules within a single repo is a separate
> > question.) Maybe I'm just not following what you mean by "more API
> > oriented." It would force stabler APIs.
> >
> > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré wrote:
> >
> > Hi,
> >
> > +1, I even think we could split the core deeper.
> >
> > I discussed with Luke and Reuven introducing core-sql, core-schema,
> > core-sdf, ...
> >
> > It's not a huge effort, and would allow us to move forward on Beam's
> > "more API oriented" approach.
> >
> > Regards
> > JB
> >
> > On 10/10/2018 10:12, Robert Bradshaw wrote:
> > > Hi everyone,
> > >
> > > While IMHO it's too early to even be able to split the repo, it's
> > not too
> > > early to talk about it, and I wanted to spin this off to keep the
> > other
> > > thread focused.
> > >
> > > In particular, I am trying to figure out exactly what is hoped to
> be
> > > gained by splitting things up. In my experience, a single project
> that
> > > spans multiple repos has always come with excessive overhead and
> pain.
> > > Of note, we recently merged the website and dataflow-worker into
> the
> > > main repo *exactly* to avoid this pain (though the latter was
> > > particularly bad due to one of the repos being private).
> > >
> > > If need be, I don't see any reason we can't have a single repo with
> > > directories
> > >
> > > model/
> > > website/
> > > java/
> > > go/
> > > ...
> > >
> > > possibly even with their own build system (unified only through a
> > > top-level "build everything" script that descends into each subdir
> and
> > > runs the appropriate command). I'm not saying we should do this
> (there
> > > is value in having a single consistent build system, etc.) but it's
> > > possible. We could probably even make separate releases out of this
> > > single repo (if we wanted, though given that our releases are
> > time-based
> > > rather than feature-based, I don't see much advantage here).
> > >
> > > Also, there was the comment.
> > >
> > > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> > > wrote:
> > >>
> > >> Side note: Beam portability would be saner if it were added on top of
> > >> the others rather than the opposite, which is what is done today.
> > >
> > > I think you brought this up before, Romain. I'm still trying to
> > wrap my
> > > head around what you mean here. Could you elaborate what such a
> > > structure would look like?
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Etienne Chauchot
Hi all,
I must admit that I agree on the status, especially regarding 2 points:
1. Obstacles for new contributors: the Gradle learning curve might be too long for
spare-time contributors; also, a complex scripted build takes longer to
understand compared to a self-descriptive one.
2. IDE integration kind of slows down development.

Now, regarding how we improve the situation, I think we need to discuss and
identify tasks and tackle them all together, even if they are not sexy
tasks, as Ismaël mentioned.

Etienne 

Le mardi 09 octobre 2018 à 10:04 +0200, Jean-Baptiste Onofré a écrit :
> Hi guys,
> 
> I know that's a hot topic, but I have to bring this discussion to the table.
> 
> Some months ago, we discussed migrating our build from Maven to
> Gradle. One of the key expected improvements was build time.
> We proposed to do a PoC to evaluate the impacts and improvements, but
> this PoC actually turned directly into a migration on master.
> 
> Now, I would like to bring facts here:
> 
> 1. Build time
> On my machine, the build takes roughly 1h15. It's pretty long, and
> considering what the build is doing, I don't see a huge improvement
> provided by Gradle.
> 2. Build reliability
> Even worse, most of the time we need to use --no-parallel and
> --no-daemon to get a reliable build (it's basically what is recommended
> for releases). This has an impact on build time, and we lose part of
> Gradle's benefits.
> 3. Release and repositories
> Even though a couple of releases have been performed with Gradle, the
> improvements around artifact handling are not obvious. My repository
> got polluted twice (a side effect of the tricks Gradle plays with the
> repository to speed up the build).
> 4. IDE integration
> We have already had some comments on the mailing lists about IDE
> integration. Clearly, the situation is not good on that front either:
> IDE integration (especially IntelliJ) is not good enough right now.
> 
> We are working hard to grow the community, and from a contributor
> perspective, our build system is not good today IMHO.
> As a contributor, I resumed my work on some PRs, and I'm spending far
> more time on the build than on the PRs' code itself.
> 
> So, obviously, the situation is not perfect, at least from a contributor
> perspective.
> 
> The purpose of this thread is not, again, to have a bunch of replies
> ending nowhere. I would like to be more "pushy", so let's try to be
> concrete. Basically, we only have two options:
> 
> 1. Improve the build, working hard on the Gradle front. I'm not sure it
> makes much sense from a contributor perspective, as Maven is really well
> known by most contributors (and easier to start with IMHO).
> 2. Go back to Maven. That's clearly my preferred approach. IDE integration
> is better, and Maven is well known by contributors, as already said.
> The effort is not that huge. We tried Gradle and don't have the expected
> results now; that's not a problem, it's part of a project's lifetime.
> 
> Thoughts ?
> 
> Regards
> JB
> 
> 

Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread Jean-Baptiste Onofré
Hi,

Thanks for all the work you are doing on this DSL !

I tried to follow the feature branch for a while. I'm still committed
to moving forward on that front, but more reviewers would be great.

Regards
JB

On 10/10/2018 10:26, Plajt, Vaclav wrote:
> Hello Beam devs,
> we have finished our main development goals for the Euphoria DSL. It is an
> easy-to-use Java 8 API built on top of Beam's Java SDK. The API provides a
> high-level abstraction of data transformations, with a focus on Java 8
> language features (e.g. lambdas and streams). It is fully interoperable
> with the existing Beam SDK and convertible back and forth. It allows fast
> prototyping through the use of (optional) Kryo-based coders and can be
> seamlessly integrated into existing Beam pipelines.
> 
> We now believe it is time to start a discussion about it with the
> community, which will hopefully lead to a vote on adopting it into the
> Apache Beam project. Most of the main ideas and development goals were
> presented at the Beam Summit in London [1].
> 
> We are looking for reviewers within the community. Please start with the
> documentation [2] or the design document [3]. Our contribution is divided
> into two modules: `org.apache.beam:beam-sdks-java-extensions-euphoria` and
> `org.apache.beam:beam-sdks-java-extensions-kryo`. The rest of the code base
> remains untouched.
> All checks in the PR [5] pass except "Website PreCommit", which seems
> to be broken; a little help here would be appreciated.
> 
> Thank you,
> we are looking forward to your feedback.
> {david.moravek,vaclav.plajt,marek.simunek}@firma.seznam.cz
> 
> Resources:
> [1] Beam Summit London presentation:
> https://docs.google.com/presentation/d/1SagpmzJ-tUQki5VsQOEEEUyi_LXRJdG_3OBLdjBKoh4/edit?usp=sharing
> [2] Documentation:
> https://github.com/seznam/beam/blob/dsl-euphoria/website/src/documentation/sdks/euphoria.md
> [3] Design Document: https://s.apache.org/beam-euphoria
> [4] ASF Jira Issue: https://issues.apache.org/jira/browse/BEAM-3900
> [5] Pull Request: https://github.com/apache/beam/pull/6601
> [6] Original proposal:
> http://mail-archives.apache.org/mod_mbox/beam-dev/201712.mbox/%3ccajjqkhnrp1z8atteogmpfkqxrcjeanb3ykowvvtnwyrvv_-...@mail.gmail.com%3e
> 
> 
> 
> You should know that this e-mail and its attachments are confidential.
> If we are negotiating on the conclusion of a transaction, we reserve the
> right to terminate the negotiations at any time. For fans of legalese: we
> hereby exclude the provisions of the Civil Code on pre-contractual
> liability. The rules on who may act for the company and how, and on the
> signing procedures, can be found here.

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Splitting the repo

2018-10-10 Thread Jean-Baptiste Onofré
The point is that today we have a monolithic core, mostly providing
abstract classes.

The idea is to have something more API-oriented, with interfaces/SPI.

Our users would then be able to pick the parts of the core they want,
resulting in lighter artifacts, and for us, it gives a more flexible
approach.
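To make the "interface/SPI" idea concrete, here is a minimal sketch in Python, standing in for Java's `ServiceLoader` (an assumption for illustration only). `CoreExtension`, `register`, and `load` are hypothetical names, not Beam APIs; the module names come from the core-sql/core-schema split mentioned in this thread. Each optional piece implements a small interface and registers itself, so users only get the pieces they actually pull in.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class CoreExtension(ABC):
    """Hypothetical SPI: the minimal contract every optional core module implements."""

    name: str

    @abstractmethod
    def describe(self) -> str:
        ...


# Registry standing in for Java's ServiceLoader discovery (illustrative only).
_REGISTRY: Dict[str, Type[CoreExtension]] = {}


def register(cls: Type[CoreExtension]) -> Type[CoreExtension]:
    """Class decorator: makes an implementation discoverable by its name."""
    _REGISTRY[cls.name] = cls
    return cls


def load(name: str) -> CoreExtension:
    """Instantiate a registered extension; fails if that module was not pulled in."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise LookupError(f"core extension {name!r} is not available") from None


# Two hypothetical "core-*" modules a user may or may not depend on.
@register
class CoreSql(CoreExtension):
    name = "core-sql"

    def describe(self) -> str:
        return "SQL support"


@register
class CoreSchema(CoreExtension):
    name = "core-schema"

    def describe(self) -> str:
        return "schema support"
```

With only `core-sql` and `core-schema` registered, `load("core-sql").describe()` returns `"SQL support"`, while `load("core-sdf")` raises `LookupError` because that module was never pulled in.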

Regards
JB

On 10/10/2018 10:26, Robert Bradshaw wrote:
> My question was not whether we should split the repo, but why? (Dividing
> things into more (or fewer) modules within a single repo is a separate
> question.) Maybe I'm just not following what you mean by "more API
> oriented." It would force stabler APIs.
> 
> On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré wrote:
> 
> Hi,
> 
> +1, I even think we could split the core further.
> 
> I discussed with Luke and Reuven introducing core-sql, core-schema,
> core-sdf, ...
> 
> It's not a huge effort, and it would allow us to move forward on Beam's
> "more API oriented" approach.
> 
> Regards
> JB
> 
> On 10/10/2018 10:12, Robert Bradshaw wrote:
> > Hi everyone,
> >
> > While IMHO it's too early to even be able to split the repo, it's not too
> > early to talk about it, and I wanted to spin this off to keep the other
> > thread focused.
> >
> > In particular, I am trying to figure out exactly what is hoped to be
> > gained by splitting things up. In my experience, a single project that
> > spans multiple repos has always come with excessive overhead and pain.
> > Of note, we recently merged the website and dataflow-worker into the
> > main repo *exactly* to avoid this pain (though the latter was
> > particularly bad due to one of the repos being private).
> >
> > If need be, I don't see any reason we can't have a single repo with
> > directories
> >
> > model/
> > website/
> > java/
> > go/
> > ...
> >
> > possibly even with their own build system (unified only through a
> > top-level "build everything" script that descends into each subdir and
> > runs the appropriate command). I'm not saying we should do this (there
> > is value in having a single consistent build system, etc.) but it's
> > possible. We could probably even make separate releases out of this
> > single repo (if we wanted, though given that our releases are time-based
> > rather than feature-based, I don't see much advantage here).
> >
> > Also, there was this comment:
> >
> > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> > <rmannibu...@gmail.com> wrote:
> >>
> >> Side note: Beam portability would be saner if it were added on top of
> >> the others rather than the opposite, which is what is done today.
> >
> > I think you brought this up before, Romain. I'm still trying to wrap my
> > head around what you mean here. Could you elaborate on what such a
> > structure would look like?
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
My question was not whether we should split the repo, but why? (Dividing
things into more (or fewer) modules within a single repo is a separate
question.) Maybe I'm just not following what you mean by "more API
oriented." It would force stabler APIs.

On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré 
wrote:

> Hi,
>
> +1, I even think we could split the core further.
>
> I discussed with Luke and Reuven introducing core-sql, core-schema,
> core-sdf, ...
>
> It's not a huge effort, and it would allow us to move forward on Beam's
> "more API oriented" approach.
> API oriented" approach.
>
> Regards
> JB
>
> On 10/10/2018 10:12, Robert Bradshaw wrote:
> > Hi everyone,
> >
> > While IMHO it's too early to even be able to split the repo, it's not too
> > early to talk about it, and I wanted to spin this off to keep the other
> > thread focused.
> >
> > In particular, I am trying to figure out exactly what is hoped to be
> > gained by splitting things up. In my experience, a single project that
> > spans multiple repos has always come with excessive overhead and pain.
> > Of note, we recently merged the website and dataflow-worker into the
> > main repo *exactly* to avoid this pain (though the latter was
> > particularly bad due to one of the repos being private).
> >
> > If need be, I don't see any reason we can't have a single repo with
> > directories
> >
> > model/
> > website/
> > java/
> > go/
> > ...
> >
> > possibly even with their own build system (unified only through a
> > top-level "build everything" script that descends into each subdir and
> > runs the appropriate command). I'm not saying we should do this (there
> > is value in having a single consistent build system, etc.) but it's
> > possible. We could probably even make separate releases out of this
> > single repo (if we wanted, though given that our releases are time-based
> > rather than feature-based, I don't see much advantage here).
> >
> > Also, there was this comment:
> >
> > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> > <rmannibu...@gmail.com> wrote:
> >>
> >> Side note: Beam portability would be saner if it were added on top of
> >> the others rather than the opposite, which is what is done today.
> >
> > I think you brought this up before, Romain. I'm still trying to wrap my
> > head around what you mean here. Could you elaborate on what such a
> > structure would look like?
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


[Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread Plajt, Vaclav
Hello Beam devs,
we have finished our main development goals for the Euphoria DSL. It is an
easy-to-use Java 8 API built on top of Beam's Java SDK. The API provides a
high-level abstraction of data transformations, with a focus on Java 8 language
features (e.g. lambdas and streams). It is fully interoperable with the existing
Beam SDK and convertible back and forth. It allows fast prototyping through the
use of (optional) Kryo-based coders and can be seamlessly integrated into
existing Beam pipelines.

We now believe it is time to start a discussion about it with the community,
which will hopefully lead to a vote on adopting it into the Apache Beam project.
Most of the main ideas and development goals were presented at the Beam Summit
in London [1].

We are looking for reviewers within the community. Please start with the
documentation [2] or the design document [3]. Our contribution is divided into
two modules: `org.apache.beam:beam-sdks-java-extensions-euphoria` and
`org.apache.beam:beam-sdks-java-extensions-kryo`. The rest of the code base
remains untouched.
All checks in the PR [5] pass except "Website PreCommit", which seems to be
broken; a little help here would be appreciated.

Thank you,
we are looking forward to your feedback.
{david.moravek,vaclav.plajt,marek.simunek}@firma.seznam.cz

Resources:
[1] Beam Summit London presentation: 
https://docs.google.com/presentation/d/1SagpmzJ-tUQki5VsQOEEEUyi_LXRJdG_3OBLdjBKoh4/edit?usp=sharing
[2] Documentation: 
https://github.com/seznam/beam/blob/dsl-euphoria/website/src/documentation/sdks/euphoria.md
[3] Design Document: https://s.apache.org/beam-euphoria
[4] ASF Jira Issue: https://issues.apache.org/jira/browse/BEAM-3900
[5] Pull Request: https://github.com/apache/beam/pull/6601
[6] Original proposal: 
http://mail-archives.apache.org/mod_mbox/beam-dev/201712.mbox/%3ccajjqkhnrp1z8atteogmpfkqxrcjeanb3ykowvvtnwyrvv_-...@mail.gmail.com%3e





Re: Splitting the repo

2018-10-10 Thread Romain Manni-Bucau
On the split point: a mono-repo works for me as well. The main point is "N
separate builds".

On the portability point: currently, each runner integrates with the
portable API. That impacts every runner. The needed code is the same
everywhere, since in the end it is mainly a DoFn (a bit of a caricature,
but that is the big picture), so the portable implementation could be
unique and built on top of any runner.
The gains are:

1. Don't pollute Java users
2. Single code base to maintain
3. Ability to upgrade the runner without changing this layer
(contract-based integration rather than a coupled one, so smoother
updates in all layers)
4. Simpler code (at least in design)
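One possible reading of this proposal, sketched in Python under stated assumptions: the portability layer is written once against a small runner contract instead of being integrated into every runner. `RunnerContract`, `PortableLayer`, and both toy runners are hypothetical names, not Beam classes, and a DoFn is reduced to a plain function from one element to zero or more outputs.

```python
from typing import Callable, Iterable, List, Protocol

# A "DoFn" reduced to its essence for this sketch: element -> output elements.
DoFn = Callable[[object], Iterable[object]]


class RunnerContract(Protocol):
    """Hypothetical contract: the only thing a runner must offer."""

    def execute(self, fn: DoFn, inputs: List[object]) -> List[object]:
        ...


class PortableLayer:
    """Written once; works with any runner that satisfies RunnerContract."""

    def __init__(self, runner: RunnerContract) -> None:
        self.runner = runner

    def run_stage(self, fn: DoFn, inputs: List[object]) -> List[object]:
        # Shared portable concerns would live here, instead of being
        # duplicated inside each runner; execution is delegated.
        return self.runner.execute(fn, inputs)


class LocalRunner:
    """Toy engine: applies the function element by element."""

    def execute(self, fn: DoFn, inputs: List[object]) -> List[object]:
        return [out for x in inputs for out in fn(x)]


class BatchingRunner:
    """A second toy engine with different internals but the same contract."""

    def __init__(self, batch_size: int = 2) -> None:
        self.batch_size = batch_size

    def execute(self, fn: DoFn, inputs: List[object]) -> List[object]:
        out: List[object] = []
        for start in range(0, len(inputs), self.batch_size):
            for x in inputs[start:start + self.batch_size]:
                out.extend(fn(x))
        return out
```

Both toy runners satisfy the same contract, so `PortableLayer(LocalRunner())` and `PortableLayer(BatchingRunner())` give the same output for the same function; swapping or upgrading the engine leaves the shared layer untouched, which is the idea behind gain 3.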

Hope it is clearer.

On Wed, Oct 10, 2018 at 11:18, Jean-Baptiste Onofré wrote:

> Hi,
>
> +1, I even think we could split the core further.
>
> I discussed with Luke and Reuven introducing core-sql, core-schema,
> core-sdf, ...
>
> It's not a huge effort, and it would allow us to move forward on Beam's
> "more API oriented" approach.
>
> Regards
> JB
>
> On 10/10/2018 10:12, Robert Bradshaw wrote:
> > Hi everyone,
> >
> > While IMHO it's too early to even be able to split the repo, it's not too
> > early to talk about it, and I wanted to spin this off to keep the other
> > thread focused.
> >
> > In particular, I am trying to figure out exactly what is hoped to be
> > gained by splitting things up. In my experience, a single project that
> > spans multiple repos has always come with excessive overhead and pain.
> > Of note, we recently merged the website and dataflow-worker into the
> > main repo *exactly* to avoid this pain (though the latter was
> > particularly bad due to one of the repos being private).
> >
> > If need be, I don't see any reason we can't have a single repo with
> > directories
> >
> > model/
> > website/
> > java/
> > go/
> > ...
> >
> > possibly even with their own build system (unified only through a
> > top-level "build everything" script that descends into each subdir and
> > runs the appropriate command). I'm not saying we should do this (there
> > is value in having a single consistent build system, etc.) but it's
> > possible. We could probably even make separate releases out of this
> > single repo (if we wanted, though given that our releases are time-based
> > rather than feature-based, I don't see much advantage here).
> >
> > Also, there was this comment:
> >
> > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> > <rmannibu...@gmail.com> wrote:
> >>
> >> Side note: Beam portability would be saner if it were added on top of
> >> the others rather than the opposite, which is what is done today.
> >
> > I think you brought this up before, Romain. I'm still trying to wrap my
> > head around what you mean here. Could you elaborate on what such a
> > structure would look like?
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Robert Bradshaw
On Wed, Oct 10, 2018 at 8:03 AM Jean-Baptiste Onofré 
wrote:

> Hi Robert,
>
> about your point that we never fully build the project: even if I
> agree, it's what we "sold" with Gradle, because with Maven you can also
> build a single module without problems.
>

Good incremental support for the edit/build/test cycle is critical, which
(for me at least) seems to have improved a lot.

One particularly painful scenario was "I want to test changes in module B
that depends on changes in module A."
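The "module B depends on changes in module A" case boils down to this: after editing A, every module that depends on A, directly or transitively, must be rebuilt and retested, and nothing else. A minimal Python sketch of that computation over a hypothetical module graph (module names are illustrative; a real build tool derives the graph from the project definition).

```python
from collections import deque
from typing import Dict, List, Set


def affected_modules(deps: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return `changed` plus every module that depends on it, directly or transitively.

    `deps` maps each module to the list of modules it depends on.
    """
    # Invert the graph: module -> modules that depend on it.
    dependents: Dict[str, List[str]] = {m: [] for m in deps}
    for module, requirements in deps.items():
        for req in requirements:
            dependents.setdefault(req, []).append(module)

    # Breadth-first walk over the inverted graph starting at the changed module.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, []):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


# Hypothetical graph: B uses A, C uses B, D is independent.
graph = {"A": [], "B": ["A"], "C": ["B"], "D": []}
print(sorted(affected_modules(graph, "A")))  # ['A', 'B', 'C']
```

Editing A touches A, B, and C but never D, which is exactly the set an incremental build should revisit.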

> So, unless I misunderstand the argument about build time for a single
> module, build speed was not a valid argument for the Maven-to-Gradle move.
>
> Again, I would like to emphasize contributor adoption.
>
> If you really think that documenting & improving the build system using
> Gradle is the way to go, that's fine.
> As for the effort to go back to Maven, I really think that it's doable.
>
> Probably this thread will lead nowhere and we will stay with Gradle;
> that's fair. I just hope that we won't have any "brake" on contributions.
>
> Again, don't get me wrong, I just wanted to bring the discussion to the
> table: no pressure, no harshness, just a fair discussion and a concern
> that I wanted to share ;)

I'm really glad you brought this up. Hopefully this will be a call to
action for those who sold Gradle to (finish) delivering on what was
promised. Or, even if we went back to Maven, that we would end up in a
better state than what we could have had with Gradle (if we determined it
wasn't fixable or wasn't going to be fixed). It seems like most IDEs
support Gradle in general, so I'd be inclined to agree with Kenn that it's
an issue with how we're using it rather than with the tool itself. I'm less
convinced it's up to each module owner to clean this up locally (unless
there's some good pattern to follow). Recognizing we have a problem is the
first step.


Re: Splitting the repo

2018-10-10 Thread Jean-Baptiste Onofré
Hi,

+1, I even think we could split the core further.

I discussed with Luke and Reuven introducing core-sql, core-schema,
core-sdf, ...

It's not a huge effort, and it would allow us to move forward on Beam's
"more API oriented" approach.

Regards
JB

On 10/10/2018 10:12, Robert Bradshaw wrote:
> Hi everyone,
> 
> While IMHO it's too early to even be able to split the repo, it's not too
> early to talk about it, and I wanted to spin this off to keep the other
> thread focused.
> 
> In particular, I am trying to figure out exactly what is hoped to be
> gained by splitting things up. In my experience, a single project that
> spans multiple repos has always come with excessive overhead and pain.
> Of note, we recently merged the website and dataflow-worker into the
> main repo *exactly* to avoid this pain (though the latter was
> particularly bad due to one of the repos being private).
> 
> If need be, I don't see any reason we can't have a single repo with
> directories
> 
> model/
> website/
> java/
> go/
> ...
> 
> possibly even with their own build system (unified only through a
> top-level "build everything" script that descends into each subdir and
> runs the appropriate command). I'm not saying we should do this (there
> is value in having a single consistent build system, etc.) but it's
> possible. We could probably even make separate releases out of this
> single repo (if we wanted, though given that our releases are time-based
> rather than feature-based, I don't see much advantage here).
> 
> Also, there was this comment:
> 
> On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> <rmannibu...@gmail.com> wrote:
>>
>> Side note: Beam portability would be saner if it were added on top of
>> the others rather than the opposite, which is what is done today.
> 
> I think you brought this up before, Romain. I'm still trying to wrap my
> head around what you mean here. Could you elaborate on what such a
> structure would look like?

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Splitting the repo

2018-10-10 Thread Robert Bradshaw
Hi everyone,

While IMHO it's too early to even be able to split the repo, it's not too
early to talk about it, and I wanted to spin this off to keep the other
thread focused.

In particular, I am trying to figure out exactly what is hoped to be gained
by splitting things up. In my experience, a single project that spans
multiple repos has always come with excessive overhead and pain. Of note,
we recently merged the website and dataflow-worker into the main repo
*exactly* to avoid this pain (though the latter was particularly bad due to
one of the repos being private).

If need be, I don't see any reason we can't have a single repo with
directories

model/
website/
java/
go/
...

possibly even with their own build system (unified only through a top-level
"build everything" script that descends into each subdir and runs the
appropriate command). I'm not saying we should do this (there is value in
having a single consistent build system, etc.) but it's possible. We could
probably even make separate releases out of this single repo (if we wanted,
though given that our releases are time-based rather than feature-based, I
don't see much advantage here).
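A top-level "build everything" script of that kind could be as small as the following Python sketch. The directory names are taken from the list above, but the per-directory commands are assumptions for illustration; each subproject would declare whatever build it actually uses.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Assumed per-subdirectory build commands (illustrative, not Beam's real ones).
BUILD_COMMANDS: Dict[str, List[str]] = {
    "model": ["./gradlew", "build"],
    "java": ["./gradlew", "build"],
    "go": ["go", "build", "./..."],
    "website": ["make"],
}


def build_all(root: Path, dry_run: bool = False) -> List[str]:
    """Descend into each known subdirectory and run its own build command.

    Returns the subdirectories that were (or, with dry_run, would be) built,
    in order. Raises CalledProcessError at the first failing build.
    """
    built: List[str] = []
    for subdir, command in BUILD_COMMANDS.items():
        path = root / subdir
        if not path.is_dir():
            continue  # skip subprojects absent from this checkout
        if not dry_run:
            subprocess.run(command, cwd=path, check=True)
        built.append(subdir)
    return built
```

With `dry_run=True`, the script only reports which subdirectories it would build, so the descent logic can be checked without running any builds.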

Also, there was this comment:

On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
wrote:
>
> Side note: Beam portability would be saner if it were added on top of the
> others rather than the opposite, which is what is done today.

I think you brought this up before, Romain. I'm still trying to wrap my
head around what you mean here. Could you elaborate on what such a structure
would look like?


Re: Log output from Dataflow tests

2018-10-10 Thread Maximilian Michels

It would be great to provide access to Dataflow build logs.

In the meantime, could someone with access send me the logs for the job 
below?


https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_14_41_03-9578125971484804239?project=apache-beam-testing

Thanks,
Max

On 09.10.18 13:45, Maximilian Michels wrote:

Hi,

I'm debugging a test failure in Dataflow PostCommit. There are logs
available which I can't access. Could I be added to the
apache-beam-testing project?


Thanks,
Max


Example:
==
FAIL: test_streaming_with_attributes (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)

--
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 175, in test_streaming_with_attributes
    self._test_streaming(with_attributes=True)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 167, in _test_streaming
    timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py", line 91, in run_pipeline
    result = p.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 416, in run
    return self.runner.run_pipeline(self)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 65, in run_pipeline
    hc_assert_that(self.result, pickler.loads(on_success_matcher))
AssertionError:
Expected: (Test pipeline expected terminated in state: RUNNING and Expected 2 messages.)
  but: Expected 2 messages. Got 0 messages. Diffs (item, count):
   Expected but not in actual: [(PubsubMessage(data001-seen, {'processed': 'IT'}), 1), (PubsubMessage(data002-seen, {'timestamp_out': '2018-07-11T02:02:50.149000Z', 'processed': 'IT'}), 1)]
   Unexpected: []
   Stripped attributes: ['id', 'timestamp']

 >> begin captured stdout << -
Found: 
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_14_41_03-9578125971484804239?project=apache-beam-testing. 
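The "Diffs (item, count)" section of this failure reads as a multiset comparison: each message is counted, and the report lists what was expected but missing and what arrived unexpectedly. A minimal Python sketch of that comparison with `collections.Counter`; it only mimics the shape of the report, is not the matcher code in `apache_beam`, and simplifies messages to strings.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Diff = List[Tuple[str, int]]


def diff_messages(expected: Iterable[str], actual: Iterable[str]) -> Tuple[Diff, Diff]:
    """Compare two multisets of messages.

    Returns (expected_but_not_in_actual, unexpected), each a sorted list of
    (item, count) pairs. Counter subtraction keeps only positive counts.
    """
    exp, act = Counter(expected), Counter(actual)
    missing = sorted((exp - act).items())
    unexpected = sorted((act - exp).items())
    return missing, unexpected


# With no messages received, everything expected is reported as missing.
missing, unexpected = diff_messages(["data001-seen", "data002-seen"], [])
print(missing)     # [('data001-seen', 1), ('data002-seen', 1)]
print(unexpected)  # []
```

Here nothing arrived (0 messages), so both expected messages are listed as missing and nothing as unexpected, which matches the shape of the failure above.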



Re: Java > 8 support

2018-10-10 Thread Arif Kasim
Thanks for the clarification Ismaël.

Arif Kasim
Strategic Cloud Engineer
Google, Inc.
arifka...@google.com

On Wed, Oct 10, 2018 at 9:41 AM Ismaël Mejía  wrote:

> Just wanted to clarify, there is already a JIRA for ongoing work on
> Java 11 support.
> https://issues.apache.org/jira/browse/BEAM-2530
>
> I led the initial work on supporting what at the time was Java 9/10;
> so far the biggest blockers were around the ApiSurface tests (not at
> all compatible with these versions), but at the time we were 5 tests
> away from getting sdks/core passing. Notice also that the scope of this
> JIRA evolved to support only the LTS version (Java 11), and
> specifically to support only sdks/core + direct runner. Supporting all
> IOs or runners is really more a question of the dependencies working
> nicely with Java 11, so this will probably take a long time. Also, the
> idea so far does NOT include supporting the Java module system at all.
>
> I stopped working on this during the move to Gradle because it was too
> hard to tackle both Java evolving and all the ongoing changes in the
> build system. If somebody in the community wants to contribute in this
> area, it will be greatly appreciated; note that all the work we did
> on the build system for this now needs to be implemented in Gradle
> too.
> On Sat, Oct 6, 2018 at 5:55 PM Romain Manni-Bucau 
> wrote:
> >
> > @Reuven: ByteBuddy by itself is not, but the way Beam tries to inject
> > the proxy class is. There are other strategies you can use in ByteBuddy
> > which work.
> >
> > Romain Manni-Bucau
> >
> >
> > On Sat, Oct 6, 2018 at 17:51, Reuven Lax wrote:
> >>
> >> Romain, do you have any more details on the ByteBuddy incompatibility?
> >> Is ByteBuddy incompatible with the Java 11 JRE, or just with new language
> >> features?
> >>
> >> On Fri, Oct 5, 2018 at 10:20 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
> >>>
> >>> Hi Arif,
> >>>
> >>> AFAIK the ByteBuddy code is not Java 11 friendly; otherwise it runs if
> >>> your engine supports it (but that means your pipeline is very, very
> >>> simple, since it does not have a DoFn ;)). Also note that since the
> >>> modules are not named, you may have to use some weird import names, or
> >>> even unstable ones, if you want to use modules (but there is no real
> >>> reason to do that yet in Java).
> >>>
> >>> Romain Manni-Bucau
> >>>
> >>>
> >>> On Fri, Oct 5, 2018 at 19:10, Arif Kasim wrote:
> 
>  Hello,
>  What's the status of Java version > 8 support for Beam? Thanks.
> 
>  -Arif.
>


Re: Java > 8 support

2018-10-10 Thread Ismaël Mejía
Just wanted to clarify, there is already a JIRA for ongoing work on
Java 11 support.
https://issues.apache.org/jira/browse/BEAM-2530

I led the initial work on supporting what at the time was Java 9/10;
so far the biggest blockers were around the ApiSurface tests (not at
all compatible with these versions), but at the time we were 5 tests
away from getting sdks/core passing. Notice also that the scope of this
JIRA evolved to support only the LTS version (Java 11), and
specifically to support only sdks/core + direct runner. Supporting all
IOs or runners is really more a question of the dependencies working
nicely with Java 11, so this will probably take a long time. Also, the
idea so far does NOT include supporting the Java module system at all.

I stopped working on this during the move to Gradle because it was too
hard to tackle both Java evolving and all the ongoing changes in the
build system. If somebody in the community wants to contribute in this
area, it will be greatly appreciated; note that all the work we did
on the build system for this now needs to be implemented in Gradle
too.
On Sat, Oct 6, 2018 at 5:55 PM Romain Manni-Bucau  wrote:
>
> @Reuven: ByteBuddy by itself is not, but the way Beam tries to inject the proxy
> class is. There are other strategies you can use in ByteBuddy which work.
>
> Romain Manni-Bucau
>
>
> On Sat, Oct 6, 2018 at 17:51, Reuven Lax wrote:
>>
>> Romain, do you have any more details on the ByteBuddy incompatibility? Is 
>> ByteBuddy incompatible with the Java 11 JRE, or just with new language 
>> features?
>>
>> On Fri, Oct 5, 2018 at 10:20 AM Romain Manni-Bucau  
>> wrote:
>>>
>>> Hi Arif,
>>>
>>> AFAIK the ByteBuddy code is not Java 11 friendly; otherwise it runs if
>>> your engine supports it (but that means your pipeline is very, very
>>> simple, since it does not have a DoFn ;)). Also note that since the
>>> modules are not named, you may have to use some weird import names, or
>>> even unstable ones, if you want to use modules (but there is no real
>>> reason to do that yet in Java).
>>>
>>> Romain Manni-Bucau
>>>
>>>
>>> On Fri, Oct 5, 2018 at 19:10, Arif Kasim wrote:

 Hello,
 What's the status of Java version > 8 support for Beam? Thanks.

 -Arif.


Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-10 Thread Ismaël Mejía
The simplest thing we can do is to pin all the dependencies of the LTS
and not move them in any maintenance release unless there is a strong reason to
do so.
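Pinning can also be checked mechanically: record the dependency versions when the LTS branch is cut, then flag any maintenance release whose versions drift from that record unless the change was explicitly approved. A minimal Python sketch, assuming dependencies are available as simple name-to-version mappings; the artifact names, versions, and the `allowed` exception set are hypothetical.

```python
from typing import AbstractSet, Dict, List


def pin_violations(
    pinned: Dict[str, str],
    release: Dict[str, str],
    allowed: AbstractSet[str] = frozenset(),
) -> List[str]:
    """List every dependency whose version differs from the LTS pin.

    Added or removed dependencies are reported too (shown as None).
    Names in `allowed` are exceptions approved for a strong reason.
    """
    problems: List[str] = []
    for name in sorted(set(pinned) | set(release)):
        old, new = pinned.get(name), release.get(name)
        if old != new and name not in allowed:
            problems.append(f"{name}: pinned {old}, release has {new}")
    return problems


# Hypothetical pins recorded when the LTS branch was cut.
pinned = {"guava": "20.0", "jackson-databind": "2.9.5"}
print(pin_violations(pinned, {"guava": "20.0", "jackson-databind": "2.9.7"}))
# ['jackson-databind: pinned 2.9.5, release has 2.9.7']
```

Passing `allowed={"jackson-databind"}` would accept that bump, which keeps the "unless there is a strong reason" escape hatch explicit and reviewable.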

The next step is to make maintainers aware in advance of which release
will be the LTS, so they can decide what to do with the dependency
versions. In my previous mail I mentioned all the possible cases that
can happen with dependencies, and it is clear that one unified policy
won’t satisfy everyone. So it is better to let the maintainers (who can
also ask for user feedback on the ML) decide on versions before the
release.

Alexey’s question is still a really important issue and has so far been
ignored: what happens to the ‘Experimental’ APIs in the LTS?
The options are:

(1) We stay consistent with Experimental, which means that they still come
with no guarantees (note that this does not mean that they will be
broken arbitrarily).
(2) We stay consistent with the LTS approach, which makes them
‘non-experimental’ for the LTS, so we guarantee that the functionality/API
is stable.

I personally have conflicted opinions: I would like to favor (1), but
this is not consistent with the whole idea of an LTS, so (2) is probably
wiser.

Finally, I also worry about Tim’s remarks on performance and quality,
even if some of these things can effectively be fixed in a subsequent
LTS release. Users will probably prefer an LTS when starting with Beam,
and if the performance/quality of the LTS is poor, this can hurt the
perception of the project.
On Wed, Oct 10, 2018 at 4:53 AM Kenneth Knowles  wrote:
>
> I've seen two mentions that "rushing" is contrary to the goals of LTS. But I 
> wouldn't worry about this. The fact is there is almost nothing you can do to 
> stabilize *prior* to cutting the LTS branch. Stability comes from the branch 
> being long-lived and having multiple releases.
>
> (I think this is pretty much my version of what JB is saying)
>
> What a conservative user will do if 2.8.x is declared LTS is to start using 
> the 2.8.x branch after it has had a couple bugfix releases. I don't think it 
> is useful or possible to try for an "extra stable" 2.x.0.
>
> The arguments about supporting the most widely used versions of runner 
> backends apply regardless of LTS. We should support them if we have the 
> resources to do so.
>
> Kenn
>
>> On Tue, Oct 9, 2018 at 4:57 PM Ahmet Altay wrote:
>>
>>
>>
>> On Fri, Oct 5, 2018 at 4:38 AM, Jean-Baptiste Onofré  
>> wrote:
>>>
>>> Hi,
>>>
>>> I think we have to remember what an LTS is. An LTS is clearly a branch
>>> on which we guarantee fixes for a long period of time.
>>>
>>>
>>> It doesn't mean that LTS == unique release. We can do a bunch of
>>> releases on an LTS branch; the only constraint is to avoid introducing
>>> breaking changes.
>>
>>
>> I agree with this perspective. Thank you for sharing this. However, the other
>> commenters also had a good point. Requiring users to upgrade their runner
>> version may be incompatible with the goals of an LTS branch. Ideally the
>> fixes here should be very minimal and targeted.
>>
>>>
>>>
>>> So, IMHO, the key part is not release, it's branch.
>>>
>>> The first thing to decide is the branch.
>>>
>>> Instead of talking about 2.8.0 or 2.9.0, I would prepare a 2.8.x LTS
>>> branch. It's a branch where we will cherry-pick some important fixes in
>>> the future and from which we will cut releases. It's the approach I use in
>>> other Apache projects (especially Karaf) and it works fine.
>>
>>
>> JB, does Karaf have a documented process that we can re-use? If not, could you
>> explain a bit more?
>>
>> Is the proposal here to prepare 2.8.x LTS branch and make a 2.8.0 release 
>> out of that?
>>
>>>
>>>
>>> Just my $0.01
>>>
>>> Regards
>>> JB
>>>
>>> On 05/10/2018 12:14, Robert Bradshaw wrote:
>>> > On Fri, Oct 5, 2018 at 3:59 AM Chamikara Jayalath wrote:
>>> >
>>> >
>>> > On Thu, Oct 4, 2018 at 9:39 AM Ahmet Altay wrote:
>>> >
>>> > I agree that LTS releases require more thought. Thank you for
>>> > raising these questions. What other open questions do we have
>>> > related to LTS releases?
>>> >
>>> > One way to do this would be to add them to a particular tracking
>>> > list (e.g. 2.9.0 blocking list) that way we would be ready for
>>> > an LTS release ahead of time.
>>> >
>>> > Related to dependencies, I agree with Thomas. If we block on
>>> > waiting for dependencies, we may end up taking a long time
>>> > before making any LTS release. And the reality of Beam releases
>>> > right now is that there are no supported releases today, and
>>> > in the long term that might hurt user trust. In my opinion, we
>>> > need to fix that sooner rather than later.
>>> >
>>> >
>>> > Agree on the idea of focussing on stability instead of feature set
>>> > when it comes to LTS releases. Based on the previous discussion on
>>> > this [1] looks like 

Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Ismaël Mejía
JB mentioned some factual points. I think most of the community
embraced the move to Gradle with the best hopes about the promised
improvements, but with the perspective of time, it is clear that we
have not delivered on many of them. I can easily understand the
growing bitterness about the current status. There are two aspects
that really bother me:

1. Integration with IDEs is BAD. Not only has this seriously affected
everybody’s daily work, but I think it also discourages new
contributors.

2. Gradle has a ridiculously steep learning curve: first you need to
understand Groovy and its intricacies, then Gradle and its magic
around caches, daemons, etc., and finally grasp a good chunk of our
hand-tailored BeamModulePlugin configuration. I cannot see how the
argument that this is friendly for contributors in other languages holds.

A consequence of (2) is that so far I have barely seen any build
improvements contributed by anyone apart from the people who worked on
the move to Gradle. So we also lost contributions in this area.

Now the question is how to solve this or improve the current
situation. I really have no idea how we can improve the IDE
situation (for me the most critical point), or how we can address all
the pending points of BEAM-4045, but I can understand the apathy on
the subject: improving the build system is not sexy, and previous
contributors don’t feel encouraged to fix what wasn’t broken in the
past. In any case, it is good that we are having an open discussion on
this.
On Wed, Oct 10, 2018 at 6:35 AM Kenneth Knowles wrote:
>
> Here are some things I hear a lot:
>
> - Beam technical decision making is too Java-centric
> - Gradle is a lot less Java-centric than Maven
> - Gradle is still a Java-centric tool, but at least it isn't so slow/wasteful
>
> I can understand long-time Java devs wanting to have their familiar and 
> dominant toolchain, with good tooling and integrations (ignoring any 
> objective merits of Gradle for Java dev). But for non-Java devs you are 
> meeting in the middle. Actually, you are not even in the middle - you are 
> still in Java land. For example, setup.py can do arbitrary things, so we 
> could use it to do the Java build! What do you think of trying this out? 
> (j/k).
>
> What I would like is to better support language-native development workflows, 
> and make sure Gradle is lightweight glue that is easy to use. Make the 
> configurations as obvious to read as we can, with as little hackery as 
> possible.
>
> --
>
> We can make Gradle a lot better (for the above, and for Java too). From my 
> work on the core of our Gradle support code, I can name a few issues that I'm 
> sure remain even after many months away from the code:
>
> 1. We were learning as we built it out. We don't always do things in the 
> natural way.
> 2. We made our own abstractions to streamline converting lots of modules 
> quickly. Clarity and efficiency of the build were not the primary concerns. 
> (to be clear: it was an amazing undertaking to get this done and the decision 
> was right for the moment)
> 2a. We tried to centralize a lot of policy, leading to the centralized bits 
> containing the union of all complexity of all modules (or maybe it is 
> multiplicative).
> 3. We tried very hard to match the mvn build exactly, rather than doing the 
> best thing in Gradle.
> 4. We've built a lot of imperative code for "telling Gradle what to do" and 
> that adds a lot of complexity compared with using Groovy as primarily a 
> configuration language.
> 5. We have turned on things like "always rebuild" when we don't know the 
> dependencies, rather than putting in the work to get the dependencies right.
>
> A lot of the above also makes it hard for IDEs to grok the config, since we 
> deviate from the "golden path" a lot.
>
> It would be awesome if module owners took on the task of making their modules 
> have an awesome incremental Gradle build.
>
> Kenn
>
> On Tue, Oct 9, 2018 at 3:38 AM Romain Manni-Bucau  
> wrote:
>>
>> For me the vendoring issue is OK because it should belong to another shaded
>> module released with Beam when needed. It is not an uncommon practice.
>>
>> Now the lack of IDE integration for tests/debug (using the Gradle runner is a
>> workaround and still hurts because it is slow compared to a native run) is a
>> clear showstopper for me.
>>
>> Also, from a community perspective, Gradle adoption is far from mainstream
>> (even Spark is built with Maven), so in the end it does not serve Beam.
>>
>> The Maven build didn't have any issue except its duration AFAIK; Gradle has 2
>> blockers plus several small drawbacks (custom build and no standard, no tooling
>> without script execution, bad integration in enterprise chains like
>> security auditing, etc.). Overall the Gradle build is close to the Maven one: last
>> time I tested, it was within 15%, so it is not worth it when you see the time you
>> lose when developing anything. It is key to keep in mind that Jenkins is
>> 

Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Romain Manni-Bucau
@Reuven: any idea what is missing? I don't expect it to be ready very
quickly, but having 2 repos does not hurt that much if both work
better than a single one, so it may be worth a try?

Romain Manni-Bucau



On Wed, Oct 10, 2018 at 08:31, Reuven Lax wrote:

> Unrelated to the Maven/Gradle discussion, but I do somewhat agree with
> Romain that Beam could be split into separate repos; however, I don't think
> it's quite ready yet. AFAIK the portability interfaces are still being
> modified, and until these interfaces are fixed it will be hard to split the
> project. Once those interfaces are fixed and mature, I completely agree we
> should then look into splitting the repos up.
>
> Reuven
>
> On Tue, Oct 9, 2018 at 10:35 PM Romain Manni-Bucau 
> wrote:
>
>>
>>
>>
>> On Wed, Oct 10, 2018 at 06:35, Kenneth Knowles wrote:
>>
>>> Here are some things I hear a lot:
>>>
>>> - Beam technical decision making is too Java-centric
>>> - Gradle is a lot less Java-centric than Maven
>>>
>> Hmm, isn't it the opposite, since it is Groovy-based? Maven tends to have the
>> same set of plugins as Gradle, so it is a kind of status quo if you check
>> technically and stick to the facts.
>>
>>
>>>
>>> - Gradle is still a Java-centric tool, but at least it isn't so
>>>   slow/wasteful
>>>
>>>
>> That's not what I experienced, and it seems JB had the same experience. Gradle is
>> comparable to Maven for a full build and is way slower in IDEA because
>> of the lack of integration and support (by design).
>>
>>
>>> I can understand long-time Java devs wanting to have their familiar and
>>> dominant toolchain, with good tooling and integrations (ignoring any
>>> objective merits of Gradle for Java dev). But for non-Java devs you are
>>> meeting in the middle. Actually, you are not even in the middle - you are
>>> still in Java land. For example, setup.py can do arbitrary things, so we
>>> could use it to do the Java build! What do you think of trying this out?
>>> (j/k).
>>>
>>> What I would like is to better support language-native development
>>> workflows, and make sure Gradle is lightweight glue that is easy to use.
>>> Make the configurations as obvious to read as we can, with as little
>>> hackery as possible.
>>>
>>
>> Agreed, and I think it may be time, now that Beam's Python/Go support has become
>> significant, to split into N repos. Release lifecycles are different, and the codebases
>> have no real link except the portability layer, which can get its own repo, so
>> it is probably worth a PoC:
>>
>> 1. beam-portability
>> 2. beam-java
>> 3. beam-go
>> 4. beam-python
>>
>> Side note: Beam portability would be saner if it were added on top of the others
>> rather than the opposite, which is what is done today.
>>
>>
>>>
>>> --
>>>
>>> We can make Gradle a lot better (for the above, and for Java too). From
>>> my work on the core of our Gradle support code, I can name a few issues
>>> that I'm sure remain even after many months away from the code:
>>>
>>> 1. We were learning as we built it out. We don't always do things in the
>>> natural way.
>>> 2. We made our own abstractions to streamline converting lots of modules
>>> quickly. Clarity and efficiency of the build were not the primary concerns.
>>> (to be clear: it was an amazing undertaking to get this done and the
>>> decision was right for the moment)
>>> 2a. We tried to centralize a lot of policy, leading to the centralized
>>> bits containing the union of all complexity of all modules (or maybe it is
>>> multiplicative).
>>> 3. We tried very hard to match the mvn build exactly, rather than doing
>>> the best thing in Gradle.
>>> 4. We've built a lot of imperative code for "telling Gradle what to do"
>>> and that adds a lot of complexity compared with using Groovy as primarily a
>>> configuration language.
>>> 5. We have turned on things like "always rebuild" when we don't know the
>>> dependencies, rather than putting in the work to get the dependencies right.
>>>
>>> A lot of the above also makes it hard for IDEs to grok the config, since
>>> we deviate from the "golden path" a lot.
>>>
>>> It would be awesome if module owners took on the task of making their
>>> modules have an awesome incremental Gradle build.
>>>
>>
>> Well, incremental builds only matter when you run a full subpart of a job,
>> and they are very fragile: look at how in months it didn't happen. In
>> practice, what's a dev workflow?
>>
>> 1. loop { dev in the IDE, run test }
>> 2. run the full module or build to validate nothing has been broken
>> 3. PR (+ back to 1 if comments)
>>
>> This means that incremental support is only relevant for Jenkins, where
>> the perf diff is not significant and where you can't use incremental
>> build 

Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Reuven Lax
Unrelated to the Maven/Gradle discussion, but I do somewhat agree with
Romain that Beam could be split into separate repos; however, I don't think
it's quite ready yet. AFAIK the portability interfaces are still being
modified, and until these interfaces are fixed it will be hard to split the
project. Once those interfaces are fixed and mature, I completely agree we
should then look into splitting the repos up.

Reuven

On Tue, Oct 9, 2018 at 10:35 PM Romain Manni-Bucau 
wrote:

>
>
>
> On Wed, Oct 10, 2018 at 06:35, Kenneth Knowles wrote:
>
>> Here are some things I hear a lot:
>>
>> - Beam technical decision making is too Java-centric
>> - Gradle is a lot less Java-centric than Maven
>>
> Hmm, isn't it the opposite, since it is Groovy-based? Maven tends to have the same
> set of plugins as Gradle, so it is a kind of status quo if you check
> technically and stick to the facts.
>
>
>>
>> - Gradle is still a Java-centric tool, but at least it isn't so
>>   slow/wasteful
>>
>>
> That's not what I experienced, and it seems JB had the same experience. Gradle is
> comparable to Maven for a full build and is way slower in IDEA because
> of the lack of integration and support (by design).
>
>
>> I can understand long-time Java devs wanting to have their familiar and
>> dominant toolchain, with good tooling and integrations (ignoring any
>> objective merits of Gradle for Java dev). But for non-Java devs you are
>> meeting in the middle. Actually, you are not even in the middle - you are
>> still in Java land. For example, setup.py can do arbitrary things, so we
>> could use it to do the Java build! What do you think of trying this out?
>> (j/k).
>>
>> What I would like is to better support language-native development
>> workflows, and make sure Gradle is lightweight glue that is easy to use.
>> Make the configurations as obvious to read as we can, with as little
>> hackery as possible.
>>
>
> Agreed, and I think it may be time, now that Beam's Python/Go support has become
> significant, to split into N repos. Release lifecycles are different, and the codebases
> have no real link except the portability layer, which can get its own repo, so
> it is probably worth a PoC:
>
> 1. beam-portability
> 2. beam-java
> 3. beam-go
> 4. beam-python
>
> Side note: Beam portability would be saner if it were added on top of the others
> rather than the opposite, which is what is done today.
>
>
>>
>> --
>>
>> We can make Gradle a lot better (for the above, and for Java too). From
>> my work on the core of our Gradle support code, I can name a few issues
>> that I'm sure remain even after many months away from the code:
>>
>> 1. We were learning as we built it out. We don't always do things in the
>> natural way.
>> 2. We made our own abstractions to streamline converting lots of modules
>> quickly. Clarity and efficiency of the build were not the primary concerns.
>> (to be clear: it was an amazing undertaking to get this done and the
>> decision was right for the moment)
>> 2a. We tried to centralize a lot of policy, leading to the centralized
>> bits containing the union of all complexity of all modules (or maybe it is
>> multiplicative).
>> 3. We tried very hard to match the mvn build exactly, rather than doing
>> the best thing in Gradle.
>> 4. We've built a lot of imperative code for "telling Gradle what to do"
>> and that adds a lot of complexity compared with using Groovy as primarily a
>> configuration language.
>> 5. We have turned on things like "always rebuild" when we don't know the
>> dependencies, rather than putting in the work to get the dependencies right.
>>
>> A lot of the above also makes it hard for IDEs to grok the config, since
>> we deviate from the "golden path" a lot.
>>
>> It would be awesome if module owners took on the task of making their
>> modules have an awesome incremental Gradle build.
>>
>
> Well, incremental builds only matter when you run a full subpart of a job,
> and they are very fragile: look at how in months it didn't happen. In
> practice, what's a dev workflow?
>
> 1. loop { dev in the IDE, run test }
> 2. run the full module or build to validate nothing has been broken
> 3. PR (+ back to 1 if comments)
>
> This means that incremental support is only relevant for Jenkins, where the
> perf diff is not significant and where you can't use incremental builds
> because you want a fully reliable build.
> So in the end, incremental build support is not that significant for
> end users and contributors. The cost of not having IDE support, however,
> is just a blocker.
>
> So my 2 cents would be to stop trying to be good theoretically at the cost of
> losing the users, and to try to embrace a community-driven approach to choices.
>
>
>>
>> Kenn
>>
>> On Tue, Oct 9, 2018 at 3:38 AM Romain Manni-Bucau 
>> wrote:
>>
>>> For me the vendoring issue is OK because it should belong to another shaded
>>> module released with Beam when needed. It is not an uncommon practice.
>>>
>>> Now the lack of IDE integration for tests/debug (using the Gradle runner is
>>> a 

Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Jean-Baptiste Onofré
Hi Robert,

About your point that we never fully build the project: even if I
agree, it's what we "sold" with Gradle, because with Maven you can also
build a single module without problems.

So, if I understand the argument about build time for a single module
correctly, then build speed was not a valid argument for the Maven to
Gradle move in the first place.

Again, I would like to emphasize the impact on contributor adoption.

If you really think we should document & improve the build system using
Gradle, that's fine.
As for the effort to go back to Maven, I really think that it's doable.

Probably this thread will end nowhere and we will stay with Gradle,
that's fair. I just hope that we won't have any "brake" on contribution.

Again, don't get me wrong, I just wanted to put the discussion on the
table: no pressure, nothing harsh, just a fair discussion and a concern
that I wanted to share ;)

Regards
JB

On 09/10/2018 12:22, Robert Bradshaw wrote:
> On Tue, Oct 9, 2018 at 10:04 AM Jean-Baptiste Onofré wrote:
> 
> Hi guys,
> 
> I know that's a hot topic, but I have to bring this discussion to
> the table.
> 
> 
> Thank you for bringing this up and revisiting it now that we have some
> experience. 
>  
> 
> Some months ago, we discussed migrating our build from Maven to
> Gradle. One of the key expected improvements was build time.
> We proposed to do a PoC to evaluate the impacts and improvements, but
> this PoC actually turned directly into a migration on master.
> 
> Now, I would like to bring facts here:
> 
> 1. Build time
> On my machine, the build time is roughly 1h15. It's pretty long, and
> given what the build is doing, I don't see a huge improvement provided
> by Gradle.
> 
> 
> I rarely, if ever, build from scratch so perhaps I have not been
> impacted by this nearly as much. (In particular, build and test times
> seem to have gone way down for me, probably due to better incremental
> support, but that's just anecdotal.) 
> 
> Is this worse than it was on maven, or just not as much better as was
> hoped? 
>  
> 
> 2. Build reliability
> Even worse, most of the time, we need to use --no-parallel and
> --no-daemon to have a reliable build (it's basically recommended for
> releases). It has an impact on build time, and we lose part of Gradle's
> benefits.
> 
> 
> I think this is a matter of incorrect dependency declarations (and is
> not unique to Gradle). I'd have loved to be able to go with a build
> system that simply didn't let you have incorrect dependency
> declarations, but that wasn't an option for other reasons. 
> 
> I wonder if there's some automatic tooling we could leverage to fix (and
> keep fixed) this. Regardless, this is unfinished work that remains to be
> done so we can realize the full benefits. 
>  
> 
> 3. Release and repositories
> Even if a couple of releases have been performed with Gradle, it's not
> easy to see improvements in artifact handling. My local repository got
> polluted twice (that's part of the trick Gradle plays with the
> repository to speed up the build).
> 
> 
> Could you clarify what improvements we were expecting here? I thought
> the goal was that we could publish the same artifacts, with no regression. 
>  
> 
> 4. IDE integration
> We already had some comments on the mailing lists about the IDE
> integration. Clearly, the situation is not good on that front either. The
> integration in IDEs (especially IntelliJ) is not good enough right now.
> 
> 
> This is important. To be honest, I also had issues back in the day
> getting the Maven setup working well out of the box in IntelliJ and
> Eclipse (mostly with respect to things like shadowing and protobufs), so
> we shouldn't fall prey to the golden age fallacy. 
> 
> It seems the recent move to vendoring has caused more issues here; I'm
> not sure that would be fixed just moving back to maven (or how to
> resolve it going forward). 
> 
> On the other hand, just last week I set up a new computer according
> to https://cwiki.apache.org/confluence/display/BEAM/IntelliJ+Tips and
> that seems to be working fine. 
>  
> 
> We are working hard to grow the community, and from a contributor
> perspective, our build system is not good today IMHO.
> As a contributor, I resumed my work on some PRs, and I'm spending so
> much time on the build, far more than on the PRs' code
> itself.
> 
> So, obviously, the situation is not perfect, at least from a contributor
> perspective.
> 
> The purpose of this thread is not, again, to have a bunch of replies
> ending nowhere. I would like to be more "pushy" and try to be
> concrete. So basically, we only have two options:
> 
> 1. Improve the build, working hard on the Gradle front. Not sure it makes
> much sense from a contributor perspective, as Maven is really well known
> by most of