date:20181010

Re: Python 3: final step

2018-10-10 Thread Manu Zhang

Does anyone know how to set the Python version on Jenkins? It's currently
Python 3.5.2.

Thanks,
Manu Zhang
On Oct 5, 2018, 9:24 AM +0800, Valentyn Tymofieiev , wrote:
> I have put together a guide [1] to help get started with investigating Python 
> 3-related test failures that may be helpful for new folks joining the effort.
>
> Comments and improvements welcome!
>
> Thanks,
> Valentyn
>
> [1] 
> https://docs.google.com/document/d/1s1BJVCY65LB_SYK1SU1u7NbZiFANoq-nEYaEvzRbYlA
>
>
> > On Thu, Oct 4, 2018 at 11:26 AM Valentyn Tymofieiev  
> > wrote:
> > > > I agree there is some overlap between JIRAs that track individual 
> > > > failures and module-level JIRAs. We originally wanted to do the 
> > > > conversion on a module-by-module basis. However, we learned that test 
> > > > failures in some modules require changes in other modules, so it may be 
> > > > a little easier to slice the problem if we focus on classes of failures.
> > >
> > > Module-level JIRAs can still be useful for tracking the end result: tox 
> > > suites cover all tests in the module in Py3 environment, and there are no 
> > > disabled tests in the module that don't have individual JIRAs tracking 
> > > them.
> > >
> > > I suggest that folks who are working on module-level JIRAs assign to 
> > > themselves the JIRAs that track individual failures if/when they are 
> > > actively addressing them. This way, unassigned problem-specific JIRAs can 
> > > use help from the community.
> > >
> > > Thanks,
> > > Valentyn
> > >
> > >
> > > > On Wed, Oct 3, 2018 at 8:14 PM Manu Zhang  
> > > > wrote:
> > > > > > Thanks Valentyn. Note that some test failures are already covered by
> > > > > > the “Finish Python 3 porting for *** module” JIRAs, e.g.
> > > > > > https://issues.apache.org/jira/browse/BEAM-5315.
> > > > >
> > > > > Manu
> > > > > > On Oct 3, 2018, 4:18 PM +0800, Valentyn Tymofieiev wrote:
> > > > > > Hi Rakesh and Manu,
> > > > > >
> > > > > > Thanks to both of you for offering help (in different threads). 
> > > > > > It's great to see that more and more people get involved with 
> > > > > > helping to make Beam Python 3 compatible!
> > > > > >
> > > > > > > There are a few PRs in flight, and several people in the community 
> > > > > > > are actively working on Python 3 support now. I would be happy to 
> > > > > > > coordinate the work so that we don't step on each other's toes and 
> > > > > > > can avoid duplication of effort.
> > > > > > >
> > > > > > > I recently looked at unit tests that are still failing in the Python 3 
> > > > > > > environment and filed a few issues (in the range BEAM-5615 - 
> > > > > > > BEAM-5629) to track similar classes of errors. You can also find 
> > > > > > > them on the Kanban board [1].
> > > > > > > In particular, BEAM-5620 and BEAM-5627 should be easy issues to get 
> > > > > > > started with.
> > > > > >
> > > > > > There are multiple ways you can help:
> > > > > > > - Helping to root-cause errors. Even a comment on why a test is 
> > > > > > > failing and a suggestion for how to fix it will be helpful for 
> > > > > > > others when you don't have time to do the fix.
> > > > > > > - Helping with code reviews.
> > > > > > > - Reporting new issues (as subtasks of BEAM-1251), and deduplicating 
> > > > > > > or splitting the existing issues. We probably don't want to file a 
> > > > > > > Jira for each of the 250+ currently failing tests at this point, but 
> > > > > > > it may make sense to track errors that occur repeatedly and share 
> > > > > > > a root cause.
> > > > > > > - Fixing the issues. Feel free to assign an issue to yourself if 
> > > > > > > you have a fix in mind and plan to actively work on it. Due to the 
> > > > > > > nature of the problem, it may occasionally happen that two issues 
> > > > > > > share a root cause, or that fixing one issue is a prerequisite for 
> > > > > > > fixing another, so sync to master often to make sure the 
> > > > > > > issue you are working on is not already fixed.
> > > > > >
> > > > > > I'll also keep an eye on the PRs and will try to keep the list of 
> > > > > > open issues up to date.
> > > > > >
> > > > > > Thanks,
> > > > > > Valentyn
> > > > > >
> > > > > > [1]: 
> > > > > > https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail
> > > > > >
> > > > > >
> > > > > > > On Tue, Oct 2, 2018 at 9:38 AM Pablo Estrada  
> > > > > > > wrote:
> > > > > > > > Very cool : ) I'm also available to review / merge if you need 
> > > > > > > > help from my side.
> > > > > > > > Best
> > > > > > > > -P.
> > > > > > > >
> > > > > > > > > On Tue, Oct 2, 2018 at 7:45 AM Rakesh Kumar 
> > > > > > > > >  wrote:
> > > > > > > > > > Hi Rob,
> > > > > > > > > >
> > > > > > > > > > > I am Rakesh Kumar, and I am using the Beam SDK for one of my 
> > > > > > > > > > > projects at Lyft. I have been working closely with Thomas 
> > > > > > > > > > > Weise, and I have already met a couple of Python SDK 
> > > > > > > > > > > developers in person.
> > > > > > > > > > > I am interested in helping migrate to Python 3. You can assign 
> > > > > > > > > > > me PRs for review. I am also more than happy

BEAM-2953, Timeseries library

2018-10-10 Thread rarokni

RE: Pull Request : https://github.com/apache/beam/pull/6540

I have been doing some work on a generalized set of timeseries transforms, with
the goal of shielding the user from some of the common problems that come up
when working with timeseries in Beam batch / stream mode. I would love to get
feedback, comments and ideas and, I hope, collaborators once things flesh out
more! Of course it will not cover every issue in the timeseries problem space,
but from many interactions and discussions over the last couple of years, I
feel it has the potential to help with a large enough set of use cases to make
it a worthwhile endeavor.

Primary goals:
- Remove as much "boilerplate" as possible from common timeseries
  pre-processing tasks.
- Deal with a couple of the harder problems with timeseries when processed as
  a stream in a distributed system. Some example use cases (which we solve
  using the state API and timers):
  - IoT: A device sends signals when something changes, but sends nothing if
    there has been no update, to save battery. The absence of data downstream
    does not mean that there is no information; it has just not been observed.
    (Of course it could be that the IoT device went boom, but in the absence
    of new data, the last known value is assumed until some TTL is reached.)
  - Finance: Ticks in FX data arrive with Ask and Bid prices as they change;
    if no Ask or Bid price is seen, the last known value is assumed.
- Provide some common sinks as a reference, for example output of TensorFlow
  Sequence Examples onto storage systems. The initial sinks in the pull request
  are based on Google Cloud sinks, but I hope this can be expanded to other
  platforms with the help of some of the good folks on this thread!
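
To illustrate the "last known value until some TTL is reached" idea from the
IoT and finance use cases, here is a minimal sketch in plain Python. It is not
the library's implementation (which, as noted, relies on the state API and
timers in a streaming pipeline). The function name `fill_gaps` and the
parameters `step` and `ttl` are assumptions invented for this example, and
timestamps are plain integers.

```python
from typing import Dict, List, Optional, Tuple


def fill_gaps(
    observations: Dict[int, float],
    start: int,
    end: int,
    step: int,
    ttl: int,
) -> List[Tuple[int, Optional[float]]]:
    """Emit one value per `step` in [start, end), carrying the last value forward.

    `fill_gaps`, `step` and `ttl` are hypothetical names for this sketch.
    A slot with no observation repeats the last known value as long as that
    value is at most `ttl` time units old; after that the slot is None.
    """
    filled: List[Tuple[int, Optional[float]]] = []
    last_value: Optional[float] = None
    last_seen: Optional[int] = None
    for ts in range(start, end, step):
        if ts in observations:
            # A real observation: remember it and emit it as-is.
            last_value = observations[ts]
            last_seen = ts
            filled.append((ts, last_value))
        elif last_seen is not None and ts - last_seen <= ttl:
            # No new data, but the last known value is still within its TTL.
            filled.append((ts, last_value))
        else:
            # Nothing observed yet, or the last known value has expired.
            filled.append((ts, None))
    return filled


# Ticks observed at t=0 and t=20 only; slots every 10 units, TTL of 20 units.
ticks = {0: 1.10, 20: 1.12}
series = fill_gaps(ticks, start=0, end=60, step=10, ttl=20)
```

With these inputs, the slots at 10, 30 and 40 repeat the last observed value,
and the slot at 50 is left empty because the TTL has expired.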

In order to make this a tractable problem, some fundamental assumptions have
been made.

The raw timeseries data will translate to a common representation. The first
pass of this is linked below. Users' main 'coding task' will be to convert
their objects to:

Single property
https://github.com/rezarokni/beam/blob/timeseries/sdks/java/extensions/timeseries/src/main/proto/TimeSeriesData.proto#L66

Multivariate:
https://github.com/rezarokni/beam/blob/timeseries/sdks/java/extensions/timeseries/src/main/proto/TimeSeriesData.proto#L75

The primary use of this library is stream processing. While it will work fine
in batch mode, there are many established tools for dealing with timeseries
data that has already landed in a data store.
This library is not intended as a data analytics tool. The output of the
library has the potential to be very useful within analytics tools, but that
is a side benefit.

It would be great to get feedback, and if you are interested in helping more
directly, please ping me.

Cheers

Reza

Re: Beam Samza Runner status update

2018-10-10 Thread Jesse Anderson

Interesting

On Wed, Oct 10, 2018, 3:49 PM Kenneth Knowles  wrote:

> Welcome, Hai!
>
> On Wed, Oct 10, 2018 at 3:46 PM Hai Lu  wrote:
>
>> Hi, all
>>
>> This is Hai from LinkedIn. As Xinyu mentioned, I have been working on
>> the portable API for the Samza runner and made some solid progress. It's been a
>> very smooth process (although not effortless for sure) and I'm really
>> grateful for the great platform that you all have built. I'm very
>> impressed. Bravo!
>>
>> Excited to work with everyone on Beam. Do expect more questions from me
>> down the road.
>>
>> Thanks,
>> Hai
>>
>> On Wed, Oct 10, 2018 at 12:36 PM Kenneth Knowles  wrote:
>>
>>> Clarification: Thomas Groh wrote the fuser, not me!
>>>
>>> Thanks for sharing all this. Really cool.
>>>
>>> Kenn
>>>
>>> On Wed, Oct 10, 2018 at 11:17 AM Rui Wang  wrote:
>>>
>>>> Thanks for sharing! It's so exciting to hear that Beam is being used on
>>>> Samza in production @LinkedIn! Your feedback will be helpful to the Beam
>>>> community!
>>>>
>>>> Besides, Beam supports SQL right now, and hopefully the Beam community
>>>> will also receive feedback on BeamSQL in the future.
>>>>
>>>> -Rui
>>>>
>>>> On Wed, Oct 10, 2018 at 11:10 AM Jean-Baptiste Onofré wrote:
>>>>
>>>>> Thanks for sharing and congrats on this great work!
>>>>>
>>>>> Regards
>>>>> JB
>>>>> On Oct 10, 2018, at 20:23, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>>>>>>
>>>>>> Hi, All,
>>>>>>
>>>>>> It's been over four months since we added the Samza Runner to Beam,
>>>>>> and we've made a lot of progress since then. I would like to give
>>>>>> you an update and share some really good news from LinkedIn:
>>>>>>
>>>>>> 1) First Beam job in production @LinkedIn!
>>>>>> After a few rounds of testing and benchmarking, we finally rolled out
>>>>>> our first Beam job here! The job uses quite a few features, such as
>>>>>> event time, fixed/session windowing, early triggering, and stateful
>>>>>> processing. Our first customer is very happy and highly praises the
>>>>>> easy-to-use Beam API as well as the powerful processing model. Because
>>>>>> of our limited resources, we put our full trust in the work you are
>>>>>> doing, and we didn't run into any surprises. We see extreme attention
>>>>>> to detail and an uncompromising approach to user experience everywhere
>>>>>> in the code base. We would like to thank everyone in the Beam
>>>>>> community for contributing to such an amazing framework!
>>>>>>
>>>>>> 2) A portable Samza Runner prototype
>>>>>> We are also starting work on making the Samza Runner portable. So far
>>>>>> we have the Python word count example working with the portable Samza
>>>>>> Runner. Please look out for the PR for this very soon :). Again, this
>>>>>> work would not be possible without the great Beam portability
>>>>>> framework and the developers behind it, such as Luke and Ahmet, to
>>>>>> name just a few. The ReferenceRunner has been extremely useful in
>>>>>> helping us figure out what's needed and how it works. Kudos to Thomas
>>>>>> Groh, Ben Sidhom and all the others who made this available to us.
>>>>>> And to Kenn, your fuse work rocks.
>>>>>>
>>>>>> 3) More contributors in Samza Runner
>>>>>> The runner was a personal project of Chris's and mine for a while, but
>>>>>> that is no longer the case. Hai Lu and Boris Shkolnik from the Samza
>>>>>> team have started contributing. Hai has been focusing on the
>>>>>> portability work mentioned in #2, and Boris will work mostly on
>>>>>> supporting our use cases. We will send more emails discussing our use
>>>>>> cases, like the "Update state after firing" email I sent out earlier.
>>>>>>
>>>>>> Finally, a shout-out to our very own Chris Pettitt. Without you, none
>>>>>> of the above would have happened!
>>>>>>
>>>>>> Thanks,
>>>>>> Xinyu
>>>>>>
>>>>>

Re: Log output from Dataflow tests

2018-10-10 Thread Ankur Goenka

Hi Max, I don't have edit privileges for the project, so I can't modify user
access.

On Wed, Oct 10, 2018 at 9:02 AM Maximilian Michels  wrote:

> Thank you Scott! Ismael also sent me the logs and I could fix the error.
>
> It seems we have granted read-only access to project members in the
> past. I just checked back with Ankur, he might be able to grant access
> for my GCP account.
>
> -Max
>
> On 10.10.18 17:26, Scott Wegner wrote:
> > I'm not sure how apache-beam-testing permissions are managed; Kenn,
> > could we grant read-access for contributors who need it for testing?
> >
> > Here are two logs from the job that seem relevant:
> >
> > 2018-10-08 14:44:45.381 PDT
> > Parsing unknown args:
> > [u'--dataflowJobId=2018-10-08_14_41_03-9578125971484804239',
> > u'--autoscalingAlgorithm=NONE', u'--direct_runner_use_stacked_bundle',
> > u'--maxNumWorkers=0', u'--style=scrambled', u'--sleep_secs=20',
> > u'--pipeline_type_check',
> > u'--gcpTempLocation=gs://temp-storage-for-end-to-end-tests/temp-it/beamapp-jenkins-1008214058-522436.1539034858.522554',
> > u'--numWorkers=1',
> > u'--beam_plugins=apache_beam.io.filesystem.FileSystem',
> > u'--beam_plugins=apache_beam.io.hadoopfilesystem.HadoopFileSystem',
> > u'--beam_plugins=apache_beam.io.localfilesystem.LocalFileSystem',
> > u'--beam_plugins=apache_beam.io.gcp.gcsfilesystem.GCSFileSystem',
> > u'--beam_plugins=apache_beam.io.filesystem_test.TestingFileSystem',
> > u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PipelineGraphRenderer',
> > u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.MuteRenderer',
> > u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.TextRenderer',
> > u'--beam_plugins=apache_beam.runners.interactive.display.pipeline_graph_renderer.PydotRenderer',
> > u'--pipelineUrl=gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1008214058-522436.1539034858.522554/pipeline.pb']
> >
> > 2018-10-08 14:44:45.382 PDT
> > Python sdk harness failed:
> > Traceback (most recent call last):
> >   File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker_main.py", line 133, in main
> >     sdk_pipeline_options.get_all_options(drop_default=True))
> >   File "/usr/local/lib/python2.7/dist-packages/apache_beam/options/pipeline_options.py", line 227, in get_all_options
> >     action='append' if num_times > 1 else 'store')
> >   File "/usr/lib/python2.7/argparse.py", line 1308, in add_argument
> >     return self._add_action(action)
> >   File "/usr/lib/python2.7/argparse.py", line 1682, in _add_action
> >     self._optionals._add_action(action)
> >   File "/usr/lib/python2.7/argparse.py", line 1509, in _add_action
> >     action = super(_ArgumentGroup, self)._add_action(action)
> >   File "/usr/lib/python2.7/argparse.py", line 1322, in _add_action
> >     self._check_conflict(action)
> >   File "/usr/lib/python2.7/argparse.py", line 1460, in _check_conflict
> >     conflict_handler(action, confl_optionals)
> >   File "/usr/lib/python2.7/argparse.py", line 1467, in _handle_conflict_error
> >     raise ArgumentError(action, message % conflict_string)
> > ArgumentError: argument --beam_plugins: conflicting option string(s): --beam_plugins
> >
> > On Wed, Oct 10, 2018 at 1:05 AM Maximilian Michels wrote:
> >
> > Would be great to provide access to Dataflow build logs.
> >
> > In the meantime, could someone with access send me the logs for the job
> > below?
> >
> > https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-08_14_41_03-9578125971484804239?project=apache-beam-testing
> >
> > Thanks,
> > Max
> >
> > On 09.10.18 13:45, Maximilian Michels wrote:
> >  > Hi,
> >  >
> >  > I'm debugging a test failure in Dataflow PostCommit. There are logs
> >  > available that I can't access. Could I be added to the
> >  > apache-beam-testing project?
> >  >
> >  > Thanks,
> >  > Max
> >  >
> >  >
> >  > Example:
> >  >
> >  > ======================================================================
> >  > FAIL: test_streaming_with_attributes
> >  > (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
> >  > ----------------------------------------------------------------------
> >  > Traceback (most recent call last):
> >  >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 175, in test_streaming_with_attributes
> >  >     self._test_streaming(with_attributes=True)
> >  >   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", line 167, in _test_streaming
> >  >     timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
> >
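
For readers unfamiliar with the ArgumentError in the quoted worker log: it is
the error argparse raises when the same option string is added to one parser
twice. The sketch below reproduces that class of error in isolation and shows
one way to avoid it. It is not the Beam code path and is not a diagnosis of the
failure discussed in this thread; `register_options` and `reproduce_conflict`
are hypothetical helpers written for this example.

```python
import argparse


def reproduce_conflict():
    """Add the same option twice and return the resulting error text."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--beam_plugins", action="append")
    try:
        # The second registration of the same option string is rejected.
        parser.add_argument("--beam_plugins", action="append")
    except argparse.ArgumentError as err:
        return str(err)
    return None


def register_options(parser, names):
    """Register each option name once, even if `names` contains repeats.

    Hypothetical helper: repeated flags are handled by action="append",
    so the option itself only needs to be declared a single time.
    """
    seen = set()
    for name in names:
        if name in seen:
            continue
        seen.add(name)
        parser.add_argument(name, action="append")


conflict_message = reproduce_conflict()

parser = argparse.ArgumentParser()
register_options(parser, ["--beam_plugins", "--beam_plugins"])
args = parser.parse_args(["--beam_plugins=a.B", "--beam_plugins=c.D"])
```

Here `conflict_message` holds the "conflicting option string" error text, while
`args.beam_plugins` collects both repeated values into one list.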

Re: Java > 8 support

2018-10-10 Thread Pablo Estrada

Hello all,
If I understand you correctly Ismael, a good amount of
'beam-sdks-java-core' tests are already passing with Java 11, so the amount
of work necessary on the core module should be relatively small. Is this
correct? Are there improvements that may be missing in terms of
modularization?

There is also the work necessary to build/run tests with Gradle.

I am also curious... how much work do you estimate is necessary to support
Java 11 with some of the existing sources? I understand that we have many,
many sources, but perhaps some of the more popular ones (e.g. TextIO)?

Thanks!
-P.

On Wed, Oct 10, 2018 at 12:59 AM Arif Kasim  wrote:

> Thanks for the clarification Ismaël.
>
> Arif Kasim
> Strategic Cloud Engineer
> Google, Inc.
> arifka...@google.com
>
> On Wed, Oct 10, 2018 at 9:41 AM Ismaël Mejía  wrote:
>
>> Just wanted to clarify, there is already a JIRA for ongoing work on
>> Java 11 support.
>> https://issues.apache.org/jira/browse/BEAM-2530
>>
>> I led the initial work on supporting what at the time was Java 9/10.
>> So far the biggest blockers have been around the ApiSurface tests (not
>> at all compatible with these versions), but at the time we were 5 tests
>> away from getting sdks/core passing. Note also that the scope of this
>> JIRA evolved to support only the LTS version (Java 11), and
>> specifically to support only sdks/core + the direct runner. Supporting
>> all IOs or runners is really more a question of the dependencies
>> working nicely with Java 11, so this will probably take a long time.
>> Also, the idea so far does NOT include supporting the Java module
>> system at all.
>>
>> I stopped working on this during the move to Gradle because it was too
>> hard to tackle both Java evolving and all the ongoing changes in the
>> build system. If somebody in the community wants to contribute in this
>> area, it will be greatly appreciated. Note that all the work we did
>> on the build system for this now needs to be reimplemented in Gradle
>> too.
>> On Sat, Oct 6, 2018 at 5:55 PM Romain Manni-Bucau 
>> wrote:
>> >
>> > @Reuven: ByteBuddy by itself is not incompatible, but the way Beam
>> > tries to inject the proxy class is. There are other strategies you can
>> > use in ByteBuddy which work.
>> >
>> > Romain Manni-Bucau
>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>> >
>> >
>> > On Sat, Oct 6, 2018 at 17:51, Reuven Lax wrote:
>> >>
>> >> Romain, do you have any more details on the ByteBuddy incompatibility?
>> >> Is ByteBuddy incompatible with the Java 11 JRE, or just with new
>> >> language features?
>> >>
>> >> On Fri, Oct 5, 2018 at 10:20 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>> >>>
>> >>> Hi Arif,
>> >>>
>> >>> AFAIK the ByteBuddy code is not Java 11 friendly; otherwise it runs
>> >>> if your engine supports it (but that means your pipeline is very,
>> >>> very simple, since it does not have a DoFn ;)). Also note that,
>> >>> because the modules are not named, you may have to use some weird,
>> >>> or even unstable, import names if you want to use modules (but there
>> >>> is no real reason to do that yet in Java).
>> >>>
>> >>> Romain Manni-Bucau
>> >>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>> >>>
>> >>>
>> >>> On Fri, Oct 5, 2018 at 19:10, Arif Kasim wrote:
>> >>>>
>> >>>> Hello,
>> >>>> What's the status of Java version > 8 support for Beam? Thanks.
>> >>>>
>> >>>> -Arif.
>>
>

Re: Beam Samza Runner status update

2018-10-10 Thread Kenneth Knowles

Welcome, Hai!

On Wed, Oct 10, 2018 at 3:46 PM Hai Lu  wrote:

> Hi, all
>
> This is Hai from LinkedIn. As Xinyu mentioned, I have been working on
> the portable API for the Samza runner and made some solid progress. It's been a
> very smooth process (although not effortless for sure) and I'm really
> grateful for the great platform that you all have built. I'm very
> impressed. Bravo!
>
> Excited to work with everyone on Beam. Do expect more questions from me
> down the road.
>
> Thanks,
> Hai
>
> On Wed, Oct 10, 2018 at 12:36 PM Kenneth Knowles  wrote:
>
>> Clarification: Thomas Groh wrote the fuser, not me!
>>
>> Thanks for sharing all this. Really cool.
>>
>> Kenn
>>
>> On Wed, Oct 10, 2018 at 11:17 AM Rui Wang  wrote:
>>
>>> Thanks for sharing! It's so exciting to hear that Beam is being used on
>>> Samza in production @LinkedIn! Your feedback will be helpful to the Beam
>>> community!
>>>
>>> Besides, Beam supports SQL right now, and hopefully the Beam community
>>> will also receive feedback on BeamSQL in the future.
>>>
>>> -Rui
>>>
>>> On Wed, Oct 10, 2018 at 11:10 AM Jean-Baptiste Onofré wrote:
>>>
>>>> Thanks for sharing and congrats on this great work!
>>>>
>>>> Regards
>>>> JB
>>>> On Oct 10, 2018, at 20:23, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>>>>>
>>>>> Hi, All,
>>>>>
>>>>> It's been over four months since we added the Samza Runner to Beam,
>>>>> and we've made a lot of progress since then. I would like to give
>>>>> you an update and share some really good news from LinkedIn:
>>>>>
>>>>> 1) First Beam job in production @LinkedIn!
>>>>> After a few rounds of testing and benchmarking, we finally rolled out
>>>>> our first Beam job here! The job uses quite a few features, such as
>>>>> event time, fixed/session windowing, early triggering, and stateful
>>>>> processing. Our first customer is very happy and highly praises the
>>>>> easy-to-use Beam API as well as the powerful processing model. Because
>>>>> of our limited resources, we put our full trust in the work you are
>>>>> doing, and we didn't run into any surprises. We see extreme attention
>>>>> to detail and an uncompromising approach to user experience everywhere
>>>>> in the code base. We would like to thank everyone in the Beam
>>>>> community for contributing to such an amazing framework!
>>>>>
>>>>> 2) A portable Samza Runner prototype
>>>>> We are also starting work on making the Samza Runner portable. So far
>>>>> we have the Python word count example working with the portable Samza
>>>>> Runner. Please look out for the PR for this very soon :). Again, this
>>>>> work would not be possible without the great Beam portability
>>>>> framework and the developers behind it, such as Luke and Ahmet, to
>>>>> name just a few. The ReferenceRunner has been extremely useful in
>>>>> helping us figure out what's needed and how it works. Kudos to Thomas
>>>>> Groh, Ben Sidhom and all the others who made this available to us.
>>>>> And to Kenn, your fuse work rocks.
>>>>>
>>>>> 3) More contributors in Samza Runner
>>>>> The runner was a personal project of Chris's and mine for a while, but
>>>>> that is no longer the case. Hai Lu and Boris Shkolnik from the Samza
>>>>> team have started contributing. Hai has been focusing on the
>>>>> portability work mentioned in #2, and Boris will work mostly on
>>>>> supporting our use cases. We will send more emails discussing our use
>>>>> cases, like the "Update state after firing" email I sent out earlier.
>>>>>
>>>>> Finally, a shout-out to our very own Chris Pettitt. Without you, none
>>>>> of the above would have happened!
>>>>>
>>>>> Thanks,
>>>>> Xinyu
>>>>>

Re: Beam Samza Runner status update

2018-10-10 Thread Hai Lu

Hi, all

This is Hai from LinkedIn. As Xinyu mentioned, I have been working on
the portable API for the Samza runner and made some solid progress. It's been a
very smooth process (although not effortless for sure) and I'm really
grateful for the great platform that you all have built. I'm very
impressed. Bravo!

Excited to work with everyone on Beam. Do expect more questions from me
down the road.

Thanks,
Hai

On Wed, Oct 10, 2018 at 12:36 PM Kenneth Knowles  wrote:

> Clarification: Thomas Groh wrote the fuser, not me!
>
> Thanks for sharing all this. Really cool.
>
> Kenn
>
> On Wed, Oct 10, 2018 at 11:17 AM Rui Wang  wrote:
>
>> Thanks for sharing! It's so exciting to hear that Beam is being used on
>> Samza in production @LinkedIn! Your feedback will be helpful to the Beam
>> community!
>>
>> Besides, Beam supports SQL right now, and hopefully the Beam community
>> will also receive feedback on BeamSQL in the future.
>>
>> -Rui
>>
>> On Wed, Oct 10, 2018 at 11:10 AM Jean-Baptiste Onofré wrote:
>>
>>> Thanks for sharing and congrats on this great work!
>>>
>>> Regards
>>> JB
>>> On Oct 10, 2018, at 20:23, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>>>>
>>>> Hi, All,
>>>>
>>>> It's been over four months since we added the Samza Runner to Beam,
>>>> and we've made a lot of progress since then. I would like to give
>>>> you an update and share some really good news from LinkedIn:
>>>>
>>>> 1) First Beam job in production @LinkedIn!
>>>> After a few rounds of testing and benchmarking, we finally rolled out
>>>> our first Beam job here! The job uses quite a few features, such as
>>>> event time, fixed/session windowing, early triggering, and stateful
>>>> processing. Our first customer is very happy and highly praises the
>>>> easy-to-use Beam API as well as the powerful processing model. Because
>>>> of our limited resources, we put our full trust in the work you are
>>>> doing, and we didn't run into any surprises. We see extreme attention
>>>> to detail and an uncompromising approach to user experience everywhere
>>>> in the code base. We would like to thank everyone in the Beam
>>>> community for contributing to such an amazing framework!
>>>>
>>>> 2) A portable Samza Runner prototype
>>>> We are also starting work on making the Samza Runner portable. So far
>>>> we have the Python word count example working with the portable Samza
>>>> Runner. Please look out for the PR for this very soon :). Again, this
>>>> work would not be possible without the great Beam portability
>>>> framework and the developers behind it, such as Luke and Ahmet, to
>>>> name just a few. The ReferenceRunner has been extremely useful in
>>>> helping us figure out what's needed and how it works. Kudos to Thomas
>>>> Groh, Ben Sidhom and all the others who made this available to us.
>>>> And to Kenn, your fuse work rocks.
>>>>
>>>> 3) More contributors in Samza Runner
>>>> The runner was a personal project of Chris's and mine for a while, but
>>>> that is no longer the case. Hai Lu and Boris Shkolnik from the Samza
>>>> team have started contributing. Hai has been focusing on the
>>>> portability work mentioned in #2, and Boris will work mostly on
>>>> supporting our use cases. We will send more emails discussing our use
>>>> cases, like the "Update state after firing" email I sent out earlier.
>>>>
>>>> Finally, a shout-out to our very own Chris Pettitt. Without you, none
>>>> of the above would have happened!
>>>>
>>>> Thanks,
>>>> Xinyu
>>>>
>>>

Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread David Morávek

Anton:
All of the points are correct, with one minor exception. We are currently
moving our production workloads from Euphoria to Beam (using the DSL), but we
are hitting scalability issues with the current Spark runner, so it is not
technically used in production yet. Everything behaves correctly in the
staging environment, where the runner can handle the workload.

Kenn:
here is the IP Clearance document:
https://gist.github.com/dmvk/80acb0579f196e18c02a4e280978d445

Thanks,
David

On Wed, Oct 10, 2018 at 11:30 PM Kenneth Knowles  wrote:

> I just glanced through it to make sure things are in the right place and
> the build is set up right, and it all LGTM.
>
> We need to file the IP Clearance to finish the process that Davor started.
> Please fill in the XML template at
> http://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/ip-clearance-template.xml
> then I will review and file it in SVN.
>
> Kenn
>
> On Wed, Oct 10, 2018 at 2:15 PM Anton Kedin  wrote:
>
>> I think the code looks good and we should probably just merge it (unless
>> there are other blockers, e.g. formal approvals), considering:
>>  - it has been reviewed;
>>  - it is tested and used in production;
>>  - it was discussed on the list and there were no objections to having it
>> as part of Beam;
>>  - it is a standalone extension and doesn't interfere with Beam Java SDK,
>> if I didn't miss anything;
>>  - it has people working on it and supporting it;
>>
>> All other issues can probably be sorted out in normal Beam process.
>>
>> Regards,
>> Anton
>>
>> On Wed, Oct 10, 2018 at 5:57 AM David Morávek 
>> wrote:
>>
>>> Hello Max,
>>>
>>> It would be great if you could do more of a "general" review. The code
>>> base is fairly large and well tested, and it has already been reviewed
>>> internally by several people.
>>>
>>> We would like to have the overall approach and design decisions
>>> validated by the community and get some input on what could be improved
>>> and whether we are headed in the right direction.
>>>
>>> Thanks,
>>> David
>>>
>>> On Wed, Oct 10, 2018 at 2:21 PM Maximilian Michels 
>>> wrote:
>>>
>>>> That is a huge PR! :) Euphoria looks great. Especially for people
>>>> coming from Flink/Spark. I'll check out the documentation.
>>>>
>>>> Do you have any specific code parts which you want to have reviewed?
>>>>
>>>> Thanks,
>>>> Max
>>>>
>>>> On 10.10.18 10:30, Jean-Baptiste Onofré wrote:
>>>> > Hi,
>>>> >
>>>> > Thanks for all the work you are doing on this DSL!
>>>> >
>>>> > I tried to follow the feature branch for a while. I'm still committed
>>>> > to moving forward on that front, but more reviewers would be great.
>>>> >
>>>> > Regards
>>>> > JB
>>>> >
>>>> > On 10/10/2018 10:26, Plajt, Vaclav wrote:
>>>> >> Hello Beam devs,
>>>> >> we have reached our main goals in the development of the Euphoria
>>>> >> DSL. It is an easy-to-use Java 8 API built on top of Beam's Java
>>>> >> SDK. The API provides a high-level abstraction of data
>>>> >> transformations, with a focus on Java 8 language features (e.g.
>>>> >> lambdas and streams). It is fully interoperable with the existing
>>>> >> Beam SDK and convertible back and forth. It allows fast prototyping
>>>> >> through the use of (optional) Kryo-based coders and can be
>>>> >> seamlessly integrated into existing Beam pipelines.
>>>> >>
>>>> >> We believe now is the time to start a discussion about it with the
>>>> >> community, which will hopefully lead to a vote on adopting it into
>>>> >> the Apache Beam project. Most of the main ideas and development
>>>> >> goals were presented at the Beam Summit in London [1].
>>>> >>
>>>> >> We are looking for reviewers within the community. Please start with
>>>> >> the documentation [2] or the design document [3]. Our contribution
>>>> >> is divided into two modules:
>>>> >> `org.apache.beam:beam-sdks-java-extensions-euphoria` and
>>>> >> `org.apache.beam:beam-sdks-java-extensions-kryo`. The rest of the
>>>> >> code base remains untouched.
>>>> >> All the checks in the PR [5] are passing, with the exception of
>>>> >> "Website PreCommit", which seems to be broken. A little help here
>>>> >> would be appreciated.
>>>> >>
>>>> >> Thank you.
>>>> >> We are looking forward to your feedback.
>>>> >> {david.moravek,vaclav.plajt,marek.simunek}@firma.seznam.cz
>>>> >>
>>>> >> Resources:
>>>> >> [1] Beam Summit London presentation:
>>>> >> https://docs.google.com/presentation/d/1SagpmzJ-tUQki5VsQOEEEUyi_LXRJdG_3OBLdjBKoh4/edit?usp=sharing
>>>> >> [2] Documentation:
>>>> >> https://github.com/seznam/beam/blob/dsl-euphoria/website/src/documentation/sdks/euphoria.md
>>>> >> [3] Design Document: https://s.apache.org/beam-euphoria
>>>> >> [4] ASF Jira Issue: https://issues.apache.org/jira/browse/BEAM-3900
>>>> >> [5] Pull Request: https://github.com/apache/beam/pull/6601
>>>> >> [6] Original proposal:
>>>> >>

Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread Kenneth Knowles

I just glanced through it to make sure things are in the right place and
the build is set up right, and it all LGTM.

We need to file the IP Clearance to finish the process that Davor started.
Please fill in the XML template at
http://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/ip-clearance-template.xml
then I will review and file it in SVN.

Kenn

On Wed, Oct 10, 2018 at 2:15 PM Anton Kedin  wrote:

> I think the code looks good and we should probably just merge it (unless
> there are other blockers, e.g. formal approvals), considering:
>  - it has been reviewed;
>  - it is tested and used in production;
>  - it was discussed on the list and there were no objections to having it
> as part of Beam;
>  - it is a standalone extension and doesn't interfere with Beam Java SDK,
> if I didn't miss anything;
>  - it has people working on it and supporting it;
>
> All other issues can probably be sorted out in normal Beam process.
>
> Regards,
> Anton
>
> On Wed, Oct 10, 2018 at 5:57 AM David Morávek 
> wrote:
>
>> Hello Max,
>>
>> It would be great if you can do more of a "general" review; the code base
>> is fairly large, well tested and it was already reviewed internally by
>> several people.
>>
>> We would like to have the overall approach and design decisions validated
>> by the community and get some input on what could be improved and whether
>> we are headed in the right direction.
>>
>> Thanks,
>> David
>>
>> On Wed, Oct 10, 2018 at 2:21 PM Maximilian Michels 
>> wrote:
>>
>>> That is a huge PR! :) Euphoria looks great. Especially for people coming
>>> from Flink/Spark. I'll check out the documentation.
>>>
>>> Do you have any specific code parts which you want to have reviewed?
>>>
>>> Thanks,
>>> Max
>>>
>>> On 10.10.18 10:30, Jean-Baptiste Onofré wrote:
>>> > Hi,
>>> >
>>> > Thanks for all the work you are doing on this DSL !
>>> >
>>> > I tried to follow the features branch for a while. I'm still committed
>>> > to move forward on that front, but more reviewers would be great.
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 10/10/2018 10:26, Plajt, Vaclav wrote:
>>> >> Hello Beam devs,
>>> >> we finished our main goals in development of Euphoria DSL. It is an
>>> >> easy-to-use Java 8 API built on top of Beam's Java SDK. The API
>>> >> provides a high-level abstraction of data transformations, with a
>>> >> focus on Java 8 language features (e.g. lambdas and streams). It is
>>> >> fully interoperable with the existing Beam SDK and convertible back
>>> >> and forth. It allows fast prototyping through the use of (optional)
>>> >> Kryo-based coders and can be seamlessly integrated into existing Beam
>>> >> Pipelines.
>>> >>
>>> >> Now we believe that it is time to start a discussion about it with the
>>> >> community, which will hopefully lead to a vote about adopting it into
>>> >> the Apache Beam project. Most of the main ideas and development goals
>>> >> were presented at the Beam Summit in London [1].
>>> >>
>>> >> We are looking for reviewers within the community. Please start with
>>> >> the documentation [2] or design document [3]. Our contribution is
>>> >> divided into two modules:
>>> >> `org.apache.beam:beam-sdks-java-extensions-euphoria` and
>>> >> `org.apache.beam:beam-sdks-java-extensions-kryo`. The rest of the code
>>> >> base remains untouched.
>>> >> All the checks in MR [5] are passing with the exception of "Website
>>> >> PreCommit", which seems to be broken; a little help here would be
>>> >> appreciated.
>>> >>
>>> >> Thank you
>>> >> We are looking forward to your feedback.
>>> >> {david.moravek,vaclav.plajt,marek.simunek}@firma.seznam.cz
>>> >>
>>> >> Resources:
>>> >> [1] Beam Summit London presentation:
>>> >> https://docs.google.com/presentation/d/1SagpmzJ-tUQki5VsQOEEEUyi_LXRJdG_3OBLdjBKoh4/edit?usp=sharing
>>> >> [2] Documentation:
>>> >> https://github.com/seznam/beam/blob/dsl-euphoria/website/src/documentation/sdks/euphoria.md
>>> >> [3] Design Document: https://s.apache.org/beam-euphoria
>>> >> [4] ASF Jira Issue: https://issues.apache.org/jira/browse/BEAM-3900
>>> >> [5] Pull Request: https://github.com/apache/beam/pull/6601
>>> >> [6] Original proposal:
>>> >> http://mail-archives.apache.org/mod_mbox/beam-dev/201712.mbox/%3ccajjqkhnrp1z8atteogmpfkqxrcjeanb3ykowvvtnwyrvv_-...@mail.gmail.com%3e
>>> >>
>>> >>
>>> >>
>>> >> You should know that this e-mail and its attachments are confidential.
>>> >> If we are negotiating on the conclusion of a transaction, we reserve
>>> >> the right to terminate the negotiations at any time. For fans

Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Tim Robertson

Thank you JB for starting this discussion.

Others comment on many of these points far better than I can, but my
experience is similar to JB's.

1. IDEA integration (and laptop slowing like crazy) being the biggest
contributor to my feeling of being unproductive
2. Not knowing the correct way to modify the build scripts which I put down
to my own limitations

> It seems we also need to help build Gradle expertise in our community, so
> that those that are motivated are empowered to contribute.


Nicely phrased. +1



On Wed, Oct 10, 2018 at 7:15 PM Scott Wegner  wrote:

> > Perhaps we should go through and prioritize (and add missing items to)
> BEAM-4045
>
> +1. It's hard to know where to start when there's such a laundry list of
> tasks. If you're having build issues, will you make sure it is represented
> in BEAM-4045, and "Vote" for the issues that you believe are the highest
> priority?
>
> I agree that the Gradle build is far from perfect (my top gripes are IDE
> integration and parallel/incremental build support). I believe that we're
> capable of making our build great, and continuing our investment in Gradle
> would be a shorter path than changing course again. Remember that our Maven
> build also had its share of issues, which is why we as a community voted
> to replace it [1][2].
>
> It seems we also need to help build Gradle expertise in our community, so
> that those that are motivated are empowered to contribute. Does anybody
> have a good "Getting Started with Gradle" guide they recommend? Perhaps we
> could also link to it from the website/wiki.
>
> [1]
> https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
> [2]
> https://lists.apache.org/thread.html/bd399ecb17cd211be7c6089b562c09ba9116649c9eabe3b609606a3b@%3Cdev.beam.apache.org%3E
>
> On Wed, Oct 10, 2018 at 2:40 AM Robert Bradshaw 
> wrote:
>
>> Some rough stats (because I was curious): The gradle files have been
>> edited by ~79 unique contributors over 696 distinct commits, whereas the
>> maven ones were edited (over a longer time period) by ~130 unique
>> contributors over 1389 commits [1]. This doesn't capture how much effort
>> was put into these edits, but neither is restricted to a small set of
>> experts.
>>
>> Regarding "friendly for other languages" I don't think either is
>> necessarily easy to learn, but my impression is that the maven learning
>> curve is shallower for those already firmly embedded in the Java ecosystem
>> (perhaps due to leveraging existing familiarity, and perhaps some due to
>> the implicit java-centric conventions that maven assumed about your
>> project), whereas with gradle at least I could keep pulling on the string
>> to unwind things to the bottom. The "I just want to build/test X without
>> editing/viewing the build files" seemed more natural with Gradle (e.g. I
>> can easily list all tasks).
>>
>> That being said, I don't think everyone needs to understand the full
>> build system. It's important that there be a critical mass that do (we have
>> that for both, and if we can simplify to improve this that'd be great),
>> it's easy enough to do basic changes (e.g. add a dependency, again I don't
>> think the barrier is sufficiently different for either), and works well out
>> of the box for someone who just wants to look up a command on the website
>> and edit code (the CLI is an improvement with Gradle, but it's clear that
>> (java) IDE support is a significant regression).
>>
>> Personally, I don't know much about IDE configuration (admittedly the
>> larger issue), but one action item I can take on is trying to eliminate the
>> need to do a "git clean" after building certain targets (assuming I can
>> reproduce this).
>>
>> Perhaps we should go through and prioritize (and add missing items to)
>> BEAM-4045
>> https://issues.apache.org/jira/issues/?jql=parent%20%3D%20BEAM-4045%20ORDER%20BY%20priority%20DESC
>> ? There's always a long tail with this kind of thing, and looking at the
>> whole list can be daunting, but putting it in the correct order and
>> knocking off the top N items could possibly go a long way.
>>
>> - Robert
>>
>> [1] The commands I ran were (with and without the uniq)
>>
>> $ find . -name 'build.gradle' | xargs git log | grep Author: | grep -o '[^< ]*@' | sort | uniq | wc
>> $ find . -name 'pom.xml' | xargs git log | grep Author: | grep -o '[^< ]*@' | sort | uniq | wc
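
For anyone who wants to reproduce the counts without the shell pipeline, here is a rough Python sketch of the same counting step. It assumes you already have the `git log` text in a string; the function name `author_stats` and the sample log are made up for illustration and are not part of the original commands. Like the `grep` version, it keys on the part of the address before the "@", so two people with the same local part at different domains would be counted once.

```python
import re

# Same pattern as the `grep -o '[^< ]*@'` step: the part of the address before "@".
AUTHOR_LOCAL_PART = re.compile(r"[^< ]*@")


def author_stats(git_log_text):
    """Return (unique_authors, author_lines) for the text of a `git log`."""
    matches = []
    for line in git_log_text.splitlines():
        if "Author:" in line:  # same filter as `grep Author:`
            matches.extend(AUTHOR_LOCAL_PART.findall(line))
    return len(set(matches)), len(matches)


# Made-up sample standing in for real `git log` output.
SAMPLE_LOG = """\
commit 1111111
Author: Jane Doe <jane@example.com>
Date:   Mon Oct 1 09:00:00 2018 +0000

    Tweak build.gradle

commit 2222222
Author: John Roe <john@example.org>
Date:   Tue Oct 2 09:00:00 2018 +0000

    Add dependency

commit 3333333
Author: Jane Doe <jane@example.com>
Date:   Wed Oct 3 09:00:00 2018 +0000

    Fix task wiring
"""

unique_authors, author_lines = author_stats(SAMPLE_LOG)
print(unique_authors, author_lines)  # 2 3
```

The first number corresponds to the run with `uniq` (distinct contributors), the second to the run without it (one line per commit).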
>>
>> On Wed, Oct 10, 2018 at 10:31 AM Etienne Chauchot 
>> wrote:
>>
>>> Hi all,
>>> I must admit that I agree on the status especially regarding 2 points:
>>> 1. new contributors obstacles: the gradle learning curve might be too long
>>> for spare-time contributors, and a complex scripted build takes time to
>>> understand compared to a self-descriptive one.
>>> 2. IDE integration kind of slows down development.
>>>
>>> Now, regarding how we improve the situation, I think we need to discuss
>>> and identify tasks and tackle them all together even if

Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-10 Thread Anton Kedin

I think the code looks good and we should probably just merge it (unless
there are other blockers, e.g. formal approvals), considering:
 - it has been reviewed;
 - it is tested and used in production;
 - it was discussed on the list and there were no objections to having it
as part of Beam;
 - it is a standalone extension and doesn't interfere with Beam Java SDK,
if I didn't miss anything;
 - it has people working on it and supporting it;

All other issues can probably be sorted out in normal Beam process.

Regards,
Anton

On Wed, Oct 10, 2018 at 5:57 AM David Morávek 
wrote:

> Hello Max,
>
> It would be great if you can do more of a "general" review; the code base
> is fairly large, well tested and it was already reviewed internally by
> several people.
>
> We would like to have the overall approach and design decisions validated
> by the community and get some input on what could be improved and whether
> we are headed in the right direction.
>
> Thanks,
> David
>
> On Wed, Oct 10, 2018 at 2:21 PM Maximilian Michels  wrote:
>
>> That is a huge PR! :) Euphoria looks great. Especially for people coming
>> from Flink/Spark. I'll check out the documentation.
>>
>> Do you have any specific code parts which you want to have reviewed?
>>
>> Thanks,
>> Max
>>
>> On 10.10.18 10:30, Jean-Baptiste Onofré wrote:
>> > Hi,
>> >
>> > Thanks for all the work you are doing on this DSL !
>> >
>> > I tried to follow the features branch for a while. I'm still committed
>> > to move forward on that front, but more reviewers would be great.
>> >
>> > Regards
>> > JB
>> >
>> > On 10/10/2018 10:26, Plajt, Vaclav wrote:
>> >> Hello Beam devs,
>> >> we finished our main goals in development of Euphoria DSL. It is an
>> >> easy-to-use Java 8 API built on top of Beam's Java SDK. The API
>> >> provides a high-level abstraction of data transformations, with a
>> >> focus on Java 8 language features (e.g. lambdas and streams). It is
>> >> fully interoperable with the existing Beam SDK and convertible back
>> >> and forth. It allows fast prototyping through the use of (optional)
>> >> Kryo-based coders and can be seamlessly integrated into existing Beam
>> >> Pipelines.
>> >>
>> >> Now we believe that it is time to start a discussion about it with the
>> >> community, which will hopefully lead to a vote about adopting it into
>> >> the Apache Beam project. Most of the main ideas and development goals
>> >> were presented at the Beam Summit in London [1].
>> >>
>> >> We are looking for reviewers within the community. Please start with
>> >> the documentation [2] or design document [3]. Our contribution is
>> >> divided into two modules:
>> >> `org.apache.beam:beam-sdks-java-extensions-euphoria` and
>> >> `org.apache.beam:beam-sdks-java-extensions-kryo`. The rest of the code
>> >> base remains untouched.
>> >> All the checks in MR [5] are passing with the exception of "Website
>> >> PreCommit", which seems to be broken; a little help here would be
>> >> appreciated.
>> >>
>> >> Thank you
>> >> We are looking forward to your feedback.
>> >> {david.moravek,vaclav.plajt,marek.simunek}@firma.seznam.cz
>> >>
>> >> Resources:
>> >> [1] Beam Summit London presentation:
>> >> https://docs.google.com/presentation/d/1SagpmzJ-tUQki5VsQOEEEUyi_LXRJdG_3OBLdjBKoh4/edit?usp=sharing
>> >> [2] Documentation:
>> >> https://github.com/seznam/beam/blob/dsl-euphoria/website/src/documentation/sdks/euphoria.md
>> >> [3] Design Document: https://s.apache.org/beam-euphoria
>> >> [4] ASF Jira Issue: https://issues.apache.org/jira/browse/BEAM-3900
>> >> [5] Pull Request: https://github.com/apache/beam/pull/6601
>> >> [6] Original proposal:
>> >> http://mail-archives.apache.org/mod_mbox/beam-dev/201712.mbox/%3ccajjqkhnrp1z8atteogmpfkqxrcjeanb3ykowvvtnwyrvv_-...@mail.gmail.com%3e
>> >>
>> >>
>> >>
>> >> You should know that this e-mail and its attachments are confidential.
>> >> If we are negotiating on the conclusion of a transaction, we reserve
>> >> the right to terminate the negotiations at any time. For fans of
>> >> legalese - we hereby exclude the provisions of the Civil Code on
>> >> pre-contractual liability. The rules about who and how may act for the
>> >> company and what are the signing procedures can be found here.
>> >
>>
>

Re: Fwd: Slack invitation

2018-10-10 Thread Filip Popić

I got it, thank you!

On Wed, 10 Oct 2018 at 16:17, Jean-Baptiste Onofré  wrote:

> You didn't receive it ?
>
> Let me try another time.
>
> Regards
> JB
> On Oct 10, 2018, at 17:15, "Filip Popić" wrote:
>>
>> Any news regarding invitation?
>>
>> On Mon, 8 Oct 2018 at 17:24, Jean-Baptiste Onofré < j...@nanthrax.net>
>> wrote:
>>
>>> Ok I will send it to you as well.
>>>
>>> Regards
>>> JB
>>> On Oct 8, 2018, at 18:23, Emmanuel Bastien <o...@ebastien.name> wrote:

 Hello,
 I would like to join the Beam Slack channel. Could someone send me an
 invitation?
 Thanks in advance!
 Emmanuel

Re: [PROPOSAL] Prepare Beam 2.8.0 release

2018-10-10 Thread Ahmet Altay

Given the number of open issues, I will re-cut the release branch once the
blocking issues are resolved. Don't worry about cherry picking changes
directly to the release branch for now.

I will continue to update this thread.

On Wed, Oct 10, 2018 at 12:12 PM, Niel Markwick  wrote:

> The 3 spannerio issues (5445, 3516, 4796) are waiting for one last LGTM
> before the PRs can be merged, but are otherwise ready for 2.8...

Please work with the reviewers to get them in. I moved those issues to
2.9.0 already.

>
> On Wed, 10 Oct 2018, 19:51 Ahmet Altay,  wrote:
>
>> Thank you JB.
>>
>> It turns out there are 2 more blocker issues. I will look at them now
>> first. (So, I am not rushing towards cutting RC1 yet.)
>>
>> On Wed, Oct 10, 2018 at 11:42 AM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hey
>>>
>>> Etienne should do a new pass soon. I do my best to cherry pick
>>> RabbitMQIO.
>>>
>>> Thanks
>>> Regards
>>> JB
>>> On Oct 10, 2018, at 21:25, Ahmet Altay wrote:

 Update:

 I started cutting the branch. There are 2 open issues:
 - RabbitMQIO - JB, if you plan to complete this soon I can cherry pick
 to the branch.
 - One new issue related to release process changes with respect to
 beam-site deprecation.

 On Tue, Oct 9, 2018 at 11:38 AM, Jean-Baptiste Onofré 
 wrote:

> Ok. Gonna move forward on RabbitMQIO asap.
>
> Thanks
> Regards
> JB
> On Oct 9, 2018, at 21:00, Ahmet Altay wrote:
>>
>> Hi all,
>>
>> Reminder, I will cut the release branch tomorrow. If you have not
>> done so please take a look at the 2.8.0 issues assigned to you [1].
>>
>> Thank you!
>> Ahmet
>>
>> [1]
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.8.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>
>> On Thu, Oct 4, 2018 at 9:27 AM, Ahmet Altay  wrote:
>>
>>> Thank you all for the feedback. I will continue with 2.8.0 as a
>>> regular release and separate the LTS discussion to a new thread.
>>>
>>> On Thu, Oct 4, 2018 at 7:58 AM, Thomas Weise  wrote:
>>>
 Given the feedback so far, we should probably decouple LTS and
 2.8.0 discussions. In case both converge before 10/10 then fine, but
 not necessary. I also agree that we should not jump the gun on LTS and
 minimum 72 hours feedback window for the topic looks appropriate.

 The issues raised by Tim look like blockers and unless we are
 confident that they can be addressed as a patch release may warrant to
 defer LTS? Can we start to tag such JIRAs with an LTS label?

 On the other hand, I think we could allow for a bit of
 experimentation error for the first LTS attempt and feed
 guidelines/policies from learnings/feedback.

 Dependency updates for LTS: I don't think we should block LTS
 because there is a newer version of a dependency out there or we should
 rush updates. If we prioritize stability, then the latest usually
 isn't the best. In the case of Flink, 1.5.x is probably what most users
 have at this time and it has seen 4 patch releases. If Flink community
 continues to support last two minor (X.Y) versions, then 1.5.x support
 may drop when 1.7 comes out, but that does not mean we cannot use it if
 we were to cut a Beam LTS release today. I generally think that LTS
 needs to focus more on the stability of Beam itself.

 Thanks,
 Thomas

 On Thu, Oct 4, 2018 at 6:59 AM Alexey Romanenko <aromanenko@gmail.com> wrote:

> Regarding LTS release - I agree that we need to have clear view
> what kind of support will be provided for such releases.
>
> Besides the concerns mentioned before, I have another one about
> APIs labeled as "@Experimental". I think most of the IOs, SQL,
> PCollection with Schema, etc. are labeled with this annotation.
> By definition, such APIs should be considered unstable, in the
> sense that they can be changed or removed in upcoming releases.
>
> So, the question is: how does the "@Experimental" API affect LTS
> releases (if it does)? What kind of support should be provided in this
> case, especially if the API continues evolving after an LTS has been
> issued? Do we need to provide a guarantee (another annotation, for
> example) that the API won't be changed between two LTS releases?
>
> And one more related question, which probably deserves another
> discussion (or was already discussed) - what is criteria to remove
>

Re: [DISCUSS] Beam public roadmap

2018-10-10 Thread Romain Manni-Bucau

What about a link in the menu? It should contain a list of features and an
estimated date with a probable error (like "in 5 months +- 1 month"),
otherwise it does not bring much IMHO.
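
To make the idea concrete, here is a rough sketch of what one such entry could look like as data, with a renderer that prints the "estimate +- error" form. The class name, field names and feature names are hypothetical placeholders for illustration only, not an agreed format and not real roadmap items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoadmapItem:
    """One roadmap entry; the field names are illustrative, not an agreed format."""
    feature: str
    eta_months: int    # estimated months until delivery
    error_months: int  # probable error on that estimate


def _months(n: int) -> str:
    """Pluralize 'month' for the given count."""
    return f"{n} month" if n == 1 else f"{n} months"


def render(item: RoadmapItem) -> str:
    """Format an entry as '<feature>: in N months +- M month(s)'."""
    return f"{item.feature}: in {_months(item.eta_months)} +- {_months(item.error_months)}"


# Placeholder entries, sorted so the nearest estimate comes first.
items = [
    RoadmapItem("Hypothetical feature A", eta_months=5, error_months=1),
    RoadmapItem("Hypothetical feature B", eta_months=2, error_months=2),
]
lines = [render(i) for i in sorted(items, key=lambda i: i.eta_months)]
print("\n".join(lines))
```

Keeping the estimate and its error as separate fields would let the page sort or filter by date while still showing how uncertain each entry is.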

On Wed, Oct 10, 2018 at 23:32, Kenneth Knowles wrote:

> Hi all,
>
> We made an attempt at putting together a sort of roadmap [1] in the past
> and also had some wide-ranging threads about what could be on it [2], and I
> think we should pick it up again. The description I really liked was
> "strategic and user impacting initiatives (ongoing and future) in an easy
> to consume format" [3]. It seems that we had feedback asking for a Roadmap
> at the London summit [4].
>
> I would like to first focus on meta-questions rather than what would be on
> it:
>
>  - What style / format should it have to be most useful for users?
>  - Where should it be presented?
>
> I asked a couple people to try to find the roadmap on the web site, as a
> test, and they didn't really know which tab to click on first, so that's a
> starting problem. They didn't even find Works In Progress [5] after
> clicking Contribute. The level of detail of that list varies widely.
>
> I'd also love to see hypothetical formats for it, to see how to balance
> pithiness with crucial details.
>
> Kenn
>
> [1]
> https://lists.apache.org/thread.html/4e1fffa2fde8e750c6d769bf4335853ad05b360b8bd248ad119cc185@%3Cdev.beam.apache.org%3E
> [2]
> https://lists.apache.org/thread.html/f750f288af8dab3f468b869bf5a3f473094f4764db419567f33805d0@%3Cdev.beam.apache.org%3E
> [3]
> https://lists.apache.org/thread.html/60d0333fd9e2c7be2f55e33b0d145f2908e3fe645c008636c86e1133@%3Cdev.beam.apache.org%3E
> [4]
> https://lists.apache.org/thread.html/aa1306da25029dff12a49ba3ce63f2caf6a5f8ba73eda879c8403f3f@%3Cdev.beam.apache.org%3E
>
> [5] https://beam.apache.org/contribute/#works-in-progress
>

[DISCUSS] Beam public roadmap

2018-10-10 Thread Kenneth Knowles

Hi all,

We made an attempt at putting together a sort of roadmap [1] in the past
and also had some wide-ranging threads about what could be on it [2], and I
think we should pick it up again. The description I really liked was
"strategic and user impacting initiatives (ongoing and future) in an easy
to consume format" [3]. It seems that we had feedback asking for a Roadmap
at the London summit [4].

I would like to first focus on meta-questions rather than what would be on
it:

- What style / format should it have to be most useful for users?
- Where should it be presented?

I asked a couple people to try to find the roadmap on the web site, as a
test, and they didn't really know which tab to click on first, so that's a
starting problem. They didn't even find Works In Progress [5] after
clicking Contribute. The level of detail of that list varies widely.

I'd also love to see hypothetical formats for it, to see how to balance
pithiness with crucial details.

Kenn

[1]
https://lists.apache.org/thread.html/4e1fffa2fde8e750c6d769bf4335853ad05b360b8bd248ad119cc185@%3Cdev.beam.apache.org%3E
[2]
https://lists.apache.org/thread.html/f750f288af8dab3f468b869bf5a3f473094f4764db419567f33805d0@%3Cdev.beam.apache.org%3E
[3]
https://lists.apache.org/thread.html/60d0333fd9e2c7be2f55e33b0d145f2908e3fe645c008636c86e1133@%3Cdev.beam.apache.org%3E
[4]
https://lists.apache.org/thread.html/aa1306da25029dff12a49ba3ce63f2caf6a5f8ba73eda879c8403f3f@%3Cdev.beam.apache.org%3E

[5] https://beam.apache.org/contribute/#works-in-progress

Re: Portable Flink runner: Generator source for testing

2018-10-10 Thread Micah Wylde

I've opened a JIRA for adding the generator source (BEAM-5707) and sent out
a very rough PR (https://github.com/apache/beam/pull/6637). Would
appreciate any feedback.

On Mon, Oct 8, 2018 at 9:43 AM, Thomas Weise  wrote:

> The portable runner does not support metrics yet:
> https://s.apache.org/apache-beam-portability-support-table
>
> There is also no JIRA referenced in the table, would be good to
> locate/create it.
>
> On Mon, Oct 8, 2018 at 9:11 AM Łukasz Gajowy 
> wrote:
>
>> Does anyone know what is the status of metrics support for Flink Portable
>> Runner? I think we need them to be used in such tests to at least collect
>> time metric that does not contain cluster warm up time, staging resources
>> time and other things that can disturb the actual run time metric. We
>> probably should use the metrics API in some other places (as described in
>> the above-mentioned proposal).
>>
>>
>>
>> On Mon, Oct 8, 2018 at 12:12 Maximilian Michels wrote:
>>
>>> This is correct. However, the example code is only part of Lyft's code
>>> base. Until timer support is done, we would have to do something similar
>>> in our code base.
>>>
>>> On 08.10.18 02:34, Łukasz Gajowy wrote:
>>> > Hi,
>>> >
>>> > just to clarify, judging from the above snippets: it seems that we are
>>> > able now to run tests that use a native source for data generation and
>>> > use them in this form until the Timers are supported. When Timers are
>>> > there, we should consider switching to the Impulse + PTransform based
>>> > solution (described above) because it's more portable - the current is
>>> > dedicated to Flink only (which still is really cool). Is this
>>> > correct or am I missing something?
>>> >
>>> > Łukasz
>>> >
>>> > On Fri, Oct 5, 2018 at 14:04 Maximilian Michels wrote:
>>> >
>>> > Thanks for sharing your setup. You're right that we need timers to
>>> > continuously ingest data to the testing pipeline.
>>> >
>>> > Here is the Flink source which generates the data:
>>> > https://github.com/mwylde/beam/commit/09c62991773c749bc037cc2b6044896e2d34988a#diff-b2fc8d680d9c1da86ba23345f3bc83d4R42
>>> >
>>> > On 04.10.18 19:31, Thomas Weise wrote:
>>> >  > FYI here is an example with native generator for portable Flink
>>> >  > runner:
>>> >  >
>>> >  > https://github.com/mwylde/beam/tree/micah_memory_leak
>>> >  >
>>> >  > https://github.com/mwylde/beam/blob/22f7099b071e65a76110ecc5beda0636ca07e101/sdks/python/apache_beam/examples/streaming_leak.py
>>> >  >
>>> >  > You can use it to run the portable Flink runner in streaming mode
>>> >  > continuously for testing purposes.
>>> >  >
>>> >  >
>>> >  > On Mon, Oct 1, 2018 at 9:50 AM Thomas Weise wrote:
>>> >  >
>>> >  > On Mon, Oct 1, 2018 at 8:29 AM Maximilian Michels
>>> >  > <m...@apache.org> wrote:
>>> >  >
>>> >  >  > and then have Flink manage the parallelism for stages
>>> >  >  > downstream from that?
>>> >  >
>>> >  > @Pablo Can you clarify what you mean by that?
>>> >  >
>>> >  > Let me paraphrase this just to get a clear understanding. There
>>> >  > are two approaches to test portable streaming pipelines:
>>> >  >
>>> >  > a) Use an Impulse followed by a test PTransform which generates
>>> >  > testing data. This is similar to how streaming sources work which
>>> >  > don't use the Read Transform. For basic testing this should work,
>>> >  > even without support for Timers.
>>> >  >
>>> >  >
>>> >  > AFAIK this works for bounded sources and batch mode of the Flink
>>> >  > runner (staged execution).
>>> >  >
>>> >  > For streaming we need small bundles, we cannot have a Python ParDo
>>> >  > block to emit records periodically.
>>> >  >
>>> >  > (With timers, the ParDo wouldn't block but instead schedule itself
>>> >  > as needed.)
>>> >  >
>>> >  > b) Introduce a new URN which gets translated to a native
>>> >  > Flink/Spark/xy testing transform.
>>> >  >
>>> >  > We should go for a) as this will make testing easier across
>>> >  > portable runners. We previously discussed native transforms will
>>> >  > be an option in Beam, but it would be preferable to leave them out
>>> >  > of testing for now.
>>> >  >
>>> >  > Thanks,
>>> >  > Max
>>> >  >
>>>

Re: Beam Samza Runner status update

2018-10-10 Thread Kenneth Knowles

Clarification: Thomas Groh wrote the fuser, not me!

Thanks for sharing all this. Really cool.

Kenn

On Wed, Oct 10, 2018 at 11:17 AM Rui Wang  wrote:

> Thanks for sharing! it's so exciting to hear that Beam is being used on
> Samza in production @LinkedIn! Your feedback will be helpful to Beam
> community!
>
> Besides, Beam supports SQL right now and hopefully Beam community could
> also receive feedback on BeamSQL in the future.
>
> -Rui
>
> On Wed, Oct 10, 2018 at 11:10 AM Jean-Baptiste Onofré 
> wrote:
>
>> Thanks for sharing and congrats for this great work !
>>
>> Regards
>> JB
>> On Oct 10, 2018, at 20:23, Xinyu Liu <xinyuliu...@gmail.com> wrote:
>>>
>>> Hi, All,
>>>
>>> It's been over four months since we added the Samza Runner to Beam, and
>>> we've been making a lot of progress since then. Here I would like to
>>> update you guys and share some really good news happening here at
>>> LinkedIn:
>>>
>>> 1) First Beam job in production @LinkedIn!
>>> After a few rounds of testing and benchmarking, we finally rolled out
>>> our first Beam job here! The job uses quite a few features, such as event
>>> time, fixed/session windowing, early triggering, and stateful processing.
>>> Our first customer is very happy and they highly praise the easy-to-use
>>> Beam API as well as the powerful processing model. Due to the limited
>>> resources here, we put our full trust in the work you guys are doing, and
>>> we didn't run into any surprises. We see extreme attention to detail as
>>> well as no compromise on user experience everywhere in the code base. We
>>> would like to thank everyone in the Beam community for contributing to
>>> such an amazing framework!
>>>
>>> 2) A portable Samza Runner prototype
>>> We are also starting the work on making the Samza Runner portable. So far
>>> we just got the Python word count example working using the portable
>>> Samza Runner. Please look out for the PR for this very soon :). Again,
>>> this work would not be possible without the great Beam portability
>>> framework, and the developers like Luke and Ahmet, just to name a few,
>>> behind it. The ReferenceRunner has been extremely useful to us to figure
>>> out what's needed and how it works. Kudos to Thomas Groh, Ben Sidhom and
>>> all the others who make this available to us. And to Kenn, your fuse work
>>> rocks.
>>>
>>> 3) More contributors in Samza Runner
>>> The runner has been Chris and my personal project for a while and now
>>> it's not the case. We got Hai Lu and Boris Shkolnik from Samza team to
>>> contribute. Hai has been focusing on the portability work as mentioned in
>>> #2, and Boris will work mostly on supporting our use cases. We will send
>>> more emails discussing our use cases, like the "Update state after firing"
>>> email I sent out earlier.
>>>
>>> Finally, a shout-out to our very own Chris Pettitt. Without you, none of
>>> the above would have happened!
>>>
>>> Thanks,
>>> Xinyu
>>>
>>
