Re: Apache Beam Newsletter - September 2018

2018-09-12 Thread Etienne Chauchot
Hi, I just wanted to add some points on subjects I took part in.
On the Elasticsearch IO side, in addition to ES 6 support, there was also:
- the addition of an exponential retry backoff on writes, for all ES
versions;
- metadata support for all ES versions.
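For readers unfamiliar with the retry pattern mentioned above, here is a minimal, generic sketch of exponential retry backoff around a write. It is written in Python only for illustration and is not ElasticsearchIO's actual (Java) code; the names `write_with_backoff`, `backoff_delays` and `RetriesExhausted`, and the default delays, are hypothetical.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """Raised when every write attempt has failed (hypothetical name)."""


def backoff_delays(initial: float, factor: float, max_delay: float,
                   attempts: int) -> Iterator[float]:
    """Yield the wait before each retry: initial, initial*factor, ..., capped at max_delay."""
    delay = initial
    for _ in range(attempts - 1):  # no wait is needed after the final attempt
        yield min(delay, max_delay)
        delay *= factor


def write_with_backoff(write: Callable[[], T], max_attempts: int = 5,
                       initial: float = 0.1, factor: float = 2.0,
                       max_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call write(); on OSError, wait exponentially longer and try again."""
    delays = backoff_delays(initial, factor, max_delay, max_attempts)
    for attempt in range(1, max_attempts + 1):
        try:
            return write()
        except OSError as err:
            if attempt == max_attempts:
                raise RetriesExhausted(f"gave up after {attempt} attempts") from err
            sleep(next(delays))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` in as a parameter keeps the sketch easy to test; a production client would usually also add random jitter to the delays.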
Best,
Etienne


On Monday, 10 September 2018 at 19:56 -0700, Rose Nguyen wrote:
> Events, Talks & Meetups
> 
> [Coming Up] Beam Summit @ London, England
> Organized by: Matthias Baetens, Victor Kotai, Alex Van Boxel & Gris Cuevas
> The Beam Summit London 2018 will take place on October 1 and 2 in London.
> If you’re interested in speaking, reach out to gris@apache.org. More info
> can be found in the blog post, and you can get your tickets on Eventbrite.
> [Coming Up] ApacheCon @ Montréal, Canada
> Will take place Sep 24-27.
> Etienne Chauchot will give a talk on Universal Metrics with Beam.
> Alexey Romanenko and Ismaël Mejía will give a talk on Building portable
> and evolvable data-intensive applications with Apache
> Ismaël Mejía and Eugene Kirpichov will give a talk on Robust, performant
> and modular APIs for data ingestion with Apache Beam.
> Gris Cuevas will host a Birds of a Feather session on 9/26: Design
> Thinking to manage online communities in 

Re: Apache Beam Newsletter - September 2018

2018-09-10 Thread Griselda Cuevas
Thank you so much for helping us put this together Rose!

And thanks to everyone who contributed. It's so good to see our
contributions grow and how many more events we're representing Beam at!




On Mon, 10 Sep 2018 at 19:56, Rose Nguyen  wrote:


Apache Beam Newsletter - September 2018

2018-09-10 Thread Rose Nguyen

September 2018 | Newsletter


What’s been done


CI improvement (by: Etienne Chauchot)

   - For each new commit on master, the Nexmark suite is run in both batch
     and streaming mode on Spark, Flink and Cloud Dataflow (thanks to
     Andrew), and dashboard graphs are produced to track functional and
     performance regressions.


Elasticsearch IO Supports Version 6 (by: Dat Tran)

   - Elasticsearch IO now supports version 6.x in addition to versions 2.x
     and 5.x.
   - See the merged PR for more details.


KuduIO Added (by: Tim Robertson)

   - Apache Beam master now has KuduIO, which will be released with Beam
     2.7.0.
   - See BEAM-2661 for more details.



What we’re working on...


Flink Portable Runner (by: Ankur Goenka, Maximilian Michels, Thomas Weise,
Ryan Williams)

   - Support for streaming side inputs has been merged
   - Portable Compatibility Matrix tests pass in streaming mode
   - Many more ValidatesRunner tests pass (ValidatesRunner is a
     comprehensive suite of Beam test pipelines)
   - Python pipelines can be tested without first bringing up a JobServer
     (it is started in a container)
   - Experimental support for executing the SDK harnesses in a process
     instead of a Docker container
   - Bug fixes to Beam discovered while working on portability

State and Timer Support in Python SDK (by: Charles Chen, Robert Bradshaw)

   - This change adds the reference DirectRunner implementation of the
     Python User State and Timers API. With this change, a user can execute
     DoFns with state and timers on the DirectRunner.
   - See the design doc and PR for more details.
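As a rough picture of what "state and timers" lets a DoFn do: user state keeps per-key data across elements, and an event-time timer lets the DoFn act once the watermark passes a chosen time. The pure-Python toy below models only those semantics; it does not use `apache_beam`, ignores windows and processing-time timers, and the class `BufferingDoFnSim` and its methods are hypothetical. In the real Python SDK, state and timers are declared with specs from `apache_beam.transforms.userstate` (for example `BagStateSpec` and `TimerSpec`).

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class BufferingDoFnSim:
    """Toy model of a stateful DoFn: per-key bag state plus one event-time timer per key."""

    def __init__(self, flush_every: int) -> None:
        self.flush_every = flush_every  # hypothetical: timers target the next multiple of this
        self.bag: DefaultDict[str, List[int]] = defaultdict(list)  # per-key "bag state"
        self.timers: Dict[str, int] = {}  # per-key event-time timer target
        self.output: List[Tuple[str, int]] = []

    def process(self, key: str, value: int, timestamp: int) -> None:
        """Buffer the element in state and (re)set the key's timer."""
        self.bag[key].append(value)
        # Re-setting the timer for a key replaces its previous target time.
        self.timers[key] = (timestamp // self.flush_every + 1) * self.flush_every

    def advance_watermark(self, watermark: int) -> None:
        """Fire, in time order, every timer whose target the watermark has reached."""
        due = sorted((t, k) for k, t in self.timers.items() if t <= watermark)
        for _, key in due:
            del self.timers[key]
            self.on_timer(key)

    def on_timer(self, key: str) -> None:
        """Emit the buffered sum for the key and clear its state."""
        self.output.append((key, sum(self.bag.pop(key, []))))
```

Elements for a key accumulate silently until the watermark reaches the key's timer, at which point the buffered values are emitted once and the state is cleared.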


New IO - HadoopOutputFormatIO (by: Alexey Romanenko)

   - Adding support for MapReduce OutputFormat.
   - See BEAM-5310 for more details.


High-level Java 8 DSL (by: David Moravek, Vaclav Plajt, Marek Simunek)

   - Adding a high-level Java 8 DSL based on the Euphoria API project.
   - See BEAM-3900 for more details.

Performance improvements for HDFS file writing operations (by: Tim
Robertson)

   - Automatically create directories when doing an HDFS rename
   - See the PR for more details.
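The idea of creating missing destination directories as part of a rename can be sketched on a local filesystem, where a plain `os.rename` into a directory that does not exist yet raises `FileNotFoundError`. This is only an analogue using Python's `os` module, not Beam's HDFS filesystem code, and the function name `rename_creating_parents` is hypothetical.

```python
import os


def rename_creating_parents(src: str, dst: str) -> None:
    """Rename src to dst, creating any missing parent directories of dst first."""
    parent = os.path.dirname(dst)
    if parent:
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    os.rename(src, dst)
```

Doing the directory creation inside the rename helper means callers that move temporary output into its final location do not have to pre-create every destination directory themselves.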


Recognition of non-code contributions (by: Gris Cuevas)

   - Reached consensus on recognizing non-code contributions
   - See the discussion for more details.
   - Planned launch date: Beam Summit London (October 2nd)


Weekly Community Updates (by: Gris Cuevas)

   - Some of the project’s subcomponents post weekly updates to the mailing
     list. We’ll be consolidating best practices to share a single weekly
     community update with all the project-related must-knows in a nutshell.



What’s planned


Beam Cookbook (by: Austin Bennett, David Cavazos, Gris Cuevas, Andrea
Foegler, Rose Nguyen, Connell O'Callaghan, and you!)

   - We are creating a cookbook for common data science tasks in Beam and
     have started brainstorming.
   - We want to hold a hackathon after the London Summit to generate content
     from the community.
   - There will be a session at the summit to gather more ideas and input.
     Watch the dev and users mailing lists for a call for contributions soon!


Beam 2.7.0 release (by: Charles Chen)

Beam Mascot (by: Gris Cuevas & Community!)

   - We got approval to launch a contest to create a new Apache Beam mascot.
   - See the discussion for more details. If you’re interested in driving
     this, reach out in the thread!
   - Planned launch date: last week of September



New Members


New Contributors

   - Đạt Trần, Ho Chi Minh City, Vietnam
      - See BEAM-5107 for more details on “Support ES-6.x for
        ElasticsearchIO”
   - Ravi Pathak, Copenhagen, Denmark
      - Using Beam for indexing open data on species at GBIF.org
      - Improving robustness of SolrIO


New Committers

   - Tim Robertson, Copenhagen, Denmark



Events, Talks & Meetups


[Coming Up] Beam Summit @ London, England

   - Organized by: Matthias Baetens, Victor Kotai, Alex Van Boxel & Gris
     Cuevas
   - The Beam Summit London 2018 will take place on October 1 and 2 in
     London.
   - If you’re interested in speaking, reach out to gris@apache.org
   - More info can be found in the blog post and