Re: [ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-13 Thread Connell O'Callaghan
Excellent, thank you Chamikara and all who contributed to this release!

On Thu, Dec 13, 2018 at 7:42 PM Chamikara Jayalath 
wrote:

> The Apache Beam team is pleased to announce the release of version 2.9.0!
>
> Apache Beam is an open source unified programming model to define and
> execute data processing pipelines, including ETL, batch and stream
> (continuous) processing. See https://beam.apache.org
>
> You can download the release here:
>
> https://beam.apache.org/get-started/downloads/
>
> This release includes major new features and improvements.
> Please see the blog post for more details:
> https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html
>
> Thanks to everyone who contributed to this release, and we hope you enjoy
> using Beam 2.9.0.
> -- Chamikara Jayalath, on behalf of The Apache Beam team
>


[ANNOUNCE] Apache Beam 2.9.0 released!

2018-12-13 Thread Chamikara Jayalath
The Apache Beam team is pleased to announce the release of version 2.9.0!

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes major new features and improvements.
Please see the blog post for more details:
https://beam.apache.org/blog/2018/12/13/beam-2.9.0.html

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.9.0.
-- Chamikara Jayalath, on behalf of The Apache Beam team


Re: [ANNOUNCEMENT] [SQL] [BEAM-6133] Support for user defined table functions (UDTF)

2018-12-13 Thread Kenneth Knowles
Sorry for the slow reply & review. Having UDTF support in Beam SQL is
extremely useful. Are both table functions and table macros part of
"standard" SQL or is this a distinction between different Calcite concepts?

Kenn

On Wed, Nov 28, 2018 at 10:36 AM Gleb Kanterov  wrote:

> At the moment we support only ScalarFunction UDFs, that is, functions that
> operate on row fields. In Calcite, there are 3 other kinds of UDFs: aggregate
> functions (which we already support), table macros and table functions. The
> difference between table functions and macros is that macros expand to
> relations, and table functions can refer to anything queryable, e.g.,
> enumerables. But in the case of Beam SQL, given everything translates to
> PTransforms, only table macros are relevant.
>
> UDTFs are in a way similar to external tables but don't require specifying
> a schema explicitly. Instead, they can derive the schema from their arguments.
> One of the use cases would be querying ranges of dataset partitions using a
> helper function like:
>
> SELECT COUNT(*) FROM table(readAvro(id => 'dataset', start =>
> '2017-01-01', end => '2018-01-01'))
>
> With BEAM-6133 [1] (apache/beam#7141 [2]) we would
> have support for UDTFs in Beam SQL.
>
> [1] https://issues.apache.org/jira/browse/BEAM-6133
> [2] https://github.com/apache/beam/pull/7141
>
> Gleb
>


Re: SplittableDoFn for zipWithIndex for a large file

2018-12-13 Thread Scott Wegner
I previously responded to your post on user@:
https://lists.apache.org/thread.html/5c10b7edf982ef63d1d1d70545e3fe2716d00628ff5c2a7854383413@%3Cuser.beam.apache.org%3E

I've also mirrored my response on StackOverflow:
https://stackoverflow.com/a/53771980/33791

On Thu, Dec 13, 2018 at 4:21 PM Chak-Pong Chung  wrote:

> Hello everyone!
>
> I asked the following question and hope to get some suggestions on
> whether what I want is doable.
>
>
> https://stackoverflow.com/questions/53746046/how-can-i-implement-zipwithindex-like-spark-in-apache-beam/53747612#53747612
>
> If I can get the `PCollection` id and the number of (contiguous) lines in each
> `PCollection`, then I can calculate the row order within each
> partition/`PCollection` first and then do a prefix sum to compute the offset
> for each partition. This is doable in MPI or OpenMP since I can get the
> id/rank of each processor/thread.
>
> Best,
> Chak-Pong
>


--
Got feedback? tinyurl.com/swegner-feedback


SplittableDoFn for zipWithIndex for a large file

2018-12-13 Thread Chak-Pong Chung
Hello everyone!

I asked the following question and hope to get some suggestions on
whether what I want is doable.

https://stackoverflow.com/questions/53746046/how-can-i-implement-zipwithindex-like-spark-in-apache-beam/53747612#53747612

If I can get the `PCollection` id and the number of (contiguous) lines in each
`PCollection`, then I can calculate the row order within each
partition/`PCollection` first and then do a prefix sum to compute the offset
for each partition. This is doable in MPI or OpenMP since I can get the
id/rank of each processor/thread.

Best,
Chak-Pong
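
A minimal sketch of the indexing arithmetic described above, in plain Python
rather than Beam. It assumes the input is already split into ordered
partitions (lists of lines) whose position is known, which is exactly the
information the question asks about. The function name `zip_with_index` and
the list-of-lists representation are illustrative, not Beam API.

```python
from itertools import accumulate


def zip_with_index(partitions):
    """Assign a global, contiguous index to every line.

    `partitions` is an ordered list of lists of lines (an assumption of
    this sketch). Two passes: count per partition, prefix-sum the counts
    into offsets, then index locally and add the offset.
    """
    # Pass 1: number of lines in each partition.
    counts = [len(part) for part in partitions]
    # Exclusive prefix sum: the offset of partition i is the total size
    # of partitions 0..i-1.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: local row order plus the partition offset.
    return [
        [(offset + local, line) for local, line in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]
```

Only the per-partition counts have to be gathered in one place for the prefix
sum; the lines themselves stay where they are. Whether a runner gives user
code a stable partition id and ordering is the open question of this thread,
so the sketch shows the arithmetic only.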


Apache Beam Newsletter - December 2018

2018-12-13 Thread Rose Nguyen
December 2018 | Newsletter

What’s been done

--

KafkaIO (by: Alexey Romanenko, Raghu Angadi)

   - Added writing support with ProducerRecord
   - See BEAM-6063 for more details


HadoopFormatIO (by: Alexey Romanenko, David Moravek, David Hrbacek)

   - Added writing support (batching/streaming) for HadoopFormatIO.
   - See BEAM-5310 for more details


Flink Runner (by: Ankur Goenka, Maximilian Michels, Thomas Weise, Ryan
Williams, Robert Bradshaw)

   - Portability: Integration of timers and state in user functions for
     streaming and batch execution
   - Portability: Combiner lifting (performance optimization)
   - Portability: More easily readable operator names
   - Support for Beam in Flink 1.5 and 1.6
   - Support for scaling up Beam applications on Flink
   - Bug fixes and improved testing


TableMacroUDF (by: Gleb Kanterov)

   - Support for user defined table functions (UDTF).
   - See BEAM-6133 for more details.


What we’re working on...

--

Flink Runner (by: Ankur Goenka, Maximilian Michels, Thomas Weise, Ryan
Williams, Robert Bradshaw)

   - Portability: Integration of metrics
   - Better support for Flink savepoints
   - Performance tuning of portable pipelines
   - Portability: Integration with Kubernetes
   - Improving the testing infrastructure on Jenkins


Load tests of Core Apache Beam Operations (by: Łukasz Gajowy, Katarzyna
Kucharczyk)

   - Test operations such as GroupByKey, ParDo, Combine etc in stressful
     conditions.
   - See https://s.apache.org/load-test-basic-operations for more details on
     how it works.



New committers
--

   - Matthias Baetens, London, England



Talks & meetups
--

DevFest 2018 @ Warsaw

   - “Apache Beam - what do I Gain?” with speaker Łukasz Gajowy
   - Link to description here and slides here


Bay Area Apache Beam Meetup @ San Francisco

   - Kicking off the first Bay Area Apache Beam Meetup in SF, December 12th.
   - Speakers are Andrew Pilloud and Kenn Knowles
   - To be recorded and posted here.


Open Source Summit @ Paris

   - “Apache Beam: portable and evolutive data-intensive applications”,
     December 5th.
   - Speaker: Alexey Romanenko
   - Slides: link


DevFest 2018 @ London / Bucharest

   - “Large scale stream analytics with Apache Beam” with speaker Matthias
     Baetens
   - Slides: here



Until Next Time!

*This edition was curated by our community of contributors, committers and
PMCs. It contains work done in December 2018 and ongoing efforts. We hope
to provide visibility to what's going on in the community, so if you have
questions, feel free to ask in this thread.*

--
Rose Thị Nguyễn


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-13 Thread Chamikara Jayalath
Thanks all for voting.

This vote has passed with 9 +1 votes (4 binding) and no -1 votes.
I'll complete the remaining work and finalize the release.

Thanks,
Cham


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-13 Thread Jean-Baptiste Onofré
+1 (binding)

Regards
JB


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-13 Thread Reuven Lax
+1 (binding)


Re: Contributor for Beam Jira tickets

2018-12-13 Thread Kenneth Knowles
Done. Welcome!

Kenn

On Thu, Dec 13, 2018 at 10:54 AM ter...@teresato.com 
wrote:

> Hi, I've been using Beam and Dataflow for one of my work projects, and
> would like to be granted contributor status for Beam JIRA tickets. My user
> id is teresato.
>
> Thanks,
> Teresa
>


Contributor for Beam Jira tickets

2018-12-13 Thread teresa
Hi, I've been using Beam and Dataflow for one of my work projects, and would 
like to be granted contributor status for Beam JIRA tickets. My user id is 
teresato.

Thanks, 
Teresa


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-13 Thread Scott Wegner
I've opened BEAM-6228 for the website build issue. Thanks for noting it,
Kenn.

On Thu, Dec 13, 2018 at 8:39 AM Kenneth Knowles  wrote:

> +1 (binding)
>
> A new feature request (https://issues.apache.org/jira/browse/BEAM-6212)
> had been filed against 2.9.0 release (
> https://issues.apache.org/jira/projects/BEAM/versions/12344258). I moved
> it to 2.10.0.
>
> I additionally built [some targets in] the source release. The website
> build does not work since it apparently depends on having a git repo
> defined. We should fix that but no reason to block the release.
>
> Kenn
>

Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-13 Thread Kenneth Knowles
+1 (binding)

A new feature request (https://issues.apache.org/jira/browse/BEAM-6212) had
been filed against 2.9.0 release (
https://issues.apache.org/jira/projects/BEAM/versions/12344258). I moved it
to 2.10.0.

I additionally built [some targets in] the source release. The website
build does not work since it apparently depends on having a git repo
defined. We should fix that but no reason to block the release.

Kenn

On Wed, Dec 12, 2018 at 4:54 PM Andrew Pilloud  wrote:

> +1
>
> Turns out we broke DOUBLE on purpose. Updated the demo to use DECIMAL and
> it doesn't hard fail. This is a docs bug.
>
> On Wed, Dec 12, 2018 at 3:55 PM Scott Wegner  wrote:
>
>> +1
>>
>> I verified the Java examples succeed on DirectRunner.
>>
>> On Wed, Dec 12, 2018 at 3:30 PM Chamikara Jayalath 
>> wrote:
>>
>>> Thanks Andrew. Please make this a blocker and -1 the thread if you think
>>> we need a new RC.
>>>
>>> - Cham
>>>
>>> On Wed, Dec 12, 2018 at 3:27 PM Andrew Pilloud 
>>> wrote:
>>>
>>>> I was just running the Beam SQL demo. I found one query fails with "the
>>>> keyCoder of a GroupByKey must be deterministic" and another just hangs. I
>>>> opened an issue: https://issues.apache.org/jira/browse/BEAM-6224 Not
>>>> sure if this calls for canceling the release or just a release note (SQL is
>>>> still experimental). I'm continuing to track down the root cause, but am
>>>> tied up with the Beam Meetup in SFO today.
>>>>
>>>> Andrew
>>>>
>>>> On Tue, Dec 11, 2018 at 3:32 PM Ruoyun Huang  wrote:
>>>>
>>>>> +1,  Looking forward to the release!
>>>>>
>>>>> On Tue, Dec 11, 2018 at 11:09 AM Chamikara Jayalath <
>>>>> chamik...@google.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I ran Beam RC verification script [1] and updated the validation
>>>>>> spreadsheet [2]. I think the current release candidate looks good.
>>>>>>
>>>>>> So +1 for the release.
>>>>>>
>>>>>> Thanks,
>>>>>> Cham
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh
>>>>>> [2]
>>>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>>>>>
>>>>>> On Fri, Dec 7, 2018 at 7:19 AM Ismaël Mejía 
>>>>>> wrote:
>>>>>>
>>>>>>> Looking at the dates on the Spark runner git log there was a PR
>>>>>>> merged to change Spark translation from classes to URNs. I cannot see
>>>>>>> how this can impact performance. Looking at the other queries in the
>>>>>>> dashboards, there seems to be a great variability in the executions of
>>>>>>> the Spark runner to the point of feeling we don't have guarantees
>>>>>>> anymore. I wonder if this was because of other loads shared in the
>>>>>>> server(s), or because our sample is too small for the standard deviation.
>>>>>>>
>>>>>>> I would proceed with the release, the real question is if we can
>>>>>>> somehow constrain the execution of these tests to have a more consistent
>>>>>>> output.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot <
>>>>>>> echauc...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>> Regarding query7 in spark:
>>>>>>>> - there doesn't seem to be a functional regression: query passes
>>>>>>>> and output size is still the same
>>>>>>>>
>>>>>>>> - Also the performance degradation seems to be only on spark, the
>>>>>>>> other runners do not seem to suffer from it.
>>>>>>>>
>>>>>>>> - performance degradation seems to be constant from 11/12 so we can
>>>>>>>> eliminate temporary load on the jenkins server that would generate
>>>>>>>> delays in Max transform.
>>>>>>>>
>>>>>>>> => query7 uses Max transform, fanout and side inputs, has one of
>>>>>>>> these parts recently (11/12/18) changed in spark?
>>>>>>>>
>>>>>>>> Etienne
>>>>>>>>
>>>>>>>> On Thursday, December 6, 2018 at 21:32 -0800, Chamikara Jayalath
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Udi or anybody else who is familiar with Nexmark,  please -1 the
>>>>>>>> vote thread if you think this particular performance regression for
>>>>>>>> Spark/Direct runners is a blocker. Otherwise I think we can continue
>>>>>>>> the vote.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Cham
>>>>>>>>
>>>>>>>> On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath <
>>>>>>>> chamik...@google.com> wrote:
>>>>>>>>
>>>>>>>> Are either of these regressions due to known issues ? If not should
>>>>>>>> they be considered release blockers ?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Cham
>>>>>>>>
>>>>>>>> On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:
>>>>>>>>
>>>>>>>> For DirectRunner there are regressions in query 7 sql direct
>>>>>>>> runner batch mode (2x) and streaming mode (5x).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:
>>>>>>>>
>>>>>>>> I see a regression for query 7 spark runner batch mode
>>>>>>>>

Re: Issue with publishing maven artefacts locally

2018-12-13 Thread Alexey Romanenko
Scott and Garrett, thank you for the fix; it works fine for me now.

Ismaël, this is a very good question. I think we still don’t have a well-defined 
way of installing custom artifacts and using them for testing with custom 
pipelines. I’d very much appreciate it if someone could share their experience 
with that.
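
A small sketch of the distinction Cham points out in the quoted message
below: a timestamped snapshot file name such as
beam-sdks-java-core-2.10.0-20181212.232426-1.jar versus the plain -SNAPSHOT
form. The regular expressions and the function name `classify_snapshot_jar`
are illustrative assumptions, not part of Maven or Gradle, and the thread does
not confirm that this naming difference is what makes Maven pick up the old
jars.

```python
import re

# Timestamped snapshot:
#   <artifact>-<base version>-<yyyyMMdd.HHmmss>-<build number>.jar
_TIMESTAMPED = re.compile(
    r"^(?P<artifact>.+)-(?P<base>\d+(?:\.\d+)*)"
    r"-(?P<stamp>\d{8}\.\d{6})-(?P<build>\d+)\.jar$"
)
# Plain snapshot: <artifact>-<base version>-SNAPSHOT.jar
_PLAIN = re.compile(r"^(?P<artifact>.+)-(?P<base>\d+(?:\.\d+)*)-SNAPSHOT\.jar$")


def classify_snapshot_jar(file_name):
    """Return (kind, artifact, base_version) for a jar file name.

    kind is "timestamped", "plain", or "other" (hypothetical labels
    chosen for this sketch).
    """
    match = _TIMESTAMPED.match(file_name)
    if match:
        return ("timestamped", match.group("artifact"), match.group("base"))
    match = _PLAIN.match(file_name)
    if match:
        return ("plain", match.group("artifact"), match.group("base"))
    return ("other", None, None)
```

Running this over the files under the local repository directory for
beam-sdks-java-core would show which naming form the publish step actually
produced, which may help narrow down the question before looking at how
SNAPSHOT resolution treats each form.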


> On 13 Dec 2018, at 00:26, Chamikara Jayalath  wrote:
> 
> Not exactly sure if this is the reason but I noticed that Ismaël's command 
> above results in a beam-sdks-java-core-2.10.0-20181212.232426-1.jar instead of 
> a beam-sdks-java-core-2.10.0-SNAPSHOT.jar.
> 
> - Cham
> 
> On Wed, Dec 12, 2018 at 1:16 PM Ismaël Mejía wrote:
> Thanks Garrett for the quick fix. I just tested and it is working now.
> 
> I found a second issue (not related to Garrett's PR). It was the reason
> why I detected that local artifacts were not updated in our Jenkins
> (in the other thread).
> 
> To validate that our daily snapshots don't break existing code we have
> a maven project that takes these snapshots from the apache repository.
> In maven speak:
> 
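> <repositories>
>   <repository>
>     <id>snapshots</id>
>     <name>Apache Development Snapshot Repository</name>
>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
>     <releases>
>       <enabled>false</enabled>
>     </releases>
>     <snapshots>
>       <enabled>true</enabled>
>     </snapshots>
>   </repository>
> </repositories>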
> 
> If we do 'mvn clean verify' in our project, it brings the SNAPSHOTS from 
> Apache.
> 
> Now if locally I fix something in Beam and deploy locally via:
> 
> ./gradlew -Ppublishing --no-parallel
> -PdistMgmtSnapshotsUrl=file:///home/ismael/.m2/repository -p
> sdks/java/core publish -x spotlessCheck -x test -x rat
> 
> It puts the generated, more recent jars in the .m2 directory.
> However, if you re-execute the Maven project, it still detects and
> imports the old jars.
> 
> I think that something is missing in the way we are generating the
> files for the .m2 directory via publishing.
> But I don't clearly understand how SNAPSHOT resolution works.
> Does anyone have an idea, or can someone contribute a fix for this one?
> 
> Thanks.
> 
> ps. if someone wants to check this out of the box you can reproduce
> the case by building this project (same case):
> https://github.com/jbonofre/beam-samples/ 
> 
> 
> On Wed, Dec 12, 2018 at 8:55 PM Garrett Jones wrote:
> >
> > Never mind, I found a much easier fix (delete two characters): 
> > https://github.com/apache/beam/pull/7265 
> >
> >
> > On Wed, Dec 12, 2018 at 11:03 AM Garrett Jones wrote:
> >>
> >> I'm inclined to undo a particular modification I made in my PR and 
> >> re-duplicate the repositories declaration between the Gradle plugin and 
> >> the new BOM module. Scott, what do you think?
> >>
> >>
> >> On Wed, Dec 12, 2018 at 11:00 AM Scott Wegner wrote:
> >>>
> >>> Thanks for pointing this out, Alexey. It seems we unintentionally 
> >>> broke something in PR#7197 [1].
> >>>
> >>> +Garrett Jones, who authored the change. Garrett can you help investigate?
> >>>
> >>> I went to check to see if we have any existing Jenkins jobs that would've 
> >>> caught this break. It seems the beam_Release_Gradle_NightlySnapshot job 
> >>> [2] has been failing for the last 10 days. Has anybody looked into this?
> >>>
> >>> [1] https://github.com/apache/beam/pull/7197 
> >>> 
> >>> [2] https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/ 
> >>> 
> >>>
> >>> On Wed, Dec 12, 2018 at 5:57 AM Alexey Romanenko wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I used to publish Maven artifacts into a local repository using this kind 
> >>>> of command, for example:
> >>>>
> >>>> ./gradlew -Ppublishing --no-parallel 
> >>>> -PdistMgmtSnapshotsUrl=file:///path/to/.m2/repository/ -p 
> >>>> sdks/java/io/kafka/ publish
> >>>>
> >>>> It worked fine until today. It seems (according to "git bisect") that this 
> >>>> recent commit [1] introduced new functionality, and now it fails with an 
> >>>> error:
> >>>>
> >>>> * What went wrong:
> >>>> A problem occurred configuring project ':beam-sdks-java-io-kafka'.
> >>>> > Exception thrown while executing model rule: 
> >>>> > PublishingPluginRules#publishing(ExtensionContainer)
> >>>>    > Cannot set the value of read-only property 'repositories' for 
> >>>>    object of type 
> >>>>    org.gradle.api.publish.internal.DeferredConfigurablePublishingExtension.
> >>>>
> >>>> Does anyone know if this is a bug, or whether I should use another command 
> >>>> for the same purpose?
> >>>>
> >>>> [1] 
> >>>> https://github.com/apache/beam/commit/bfd1be9ae22d1ae7e732f590c448e9e5ed2894b9

Re: Java performance tests dashboard

2018-12-13 Thread Łukasz Gajowy
On Wed, Dec 12, 2018 at 19:24 Udi Meiri wrote:

> Hi Lukasz,
> I was looking for statistics on I/O performance for writes of many files
> (~10k) on GCS.
>
> I found this dashboard and I have some questions.
> 1. The tests that are "local filesystem" seem to be running on Dataflow
> and writing to GCS - is it okay to rename them to be officially GCS tests?
>

OK, that seems reasonable. Such a name would indeed be more informative.


> 2. Is it okay if I add additional GCS tests to this dashboard?
>

Sure! :) What tests do you have in mind? Feel free to add me as a reviewer
for any PRs in this area.


Re: 2019 Beam Events

2018-12-13 Thread Etienne Chauchot
Great work! Thanks for sharing, Gris!
Etienne
On Wednesday, 05 December 2018 at 07:47 +, Matthias Baetens wrote:
> Great stuff, Gris! Looking forward to what 2019 will bring!
> The Beam meetup in London will have a new get together early next year as 
> well :-) 
> https://www.meetup.com/London-Apache-Beam-Meetup/ 
> 
> 
> On Tue, 4 Dec 2018 at 23:50 Austin Bennett wrote:
> > Already got that process kicked off with the NY and LA meetups; now that 
> > SF is about to be inaugurated, the goal will be
> > to get these moving as well.
> > For anyone that is in (or goes to) those areas:
> > https://www.meetup.com/New-York-Apache-Beam/
> > https://www.meetup.com/Los-Angeles-Apache-Beam/
> > 
> > Please reach out to get involved!  
> > 
> > 
> > 
> > 
> > On Tue, Dec 4, 2018 at 3:13 PM Griselda Cuevas wrote:
> > > +1 to Pablo's suggestion: if there's interest in founding a Meetup group 
> > > in a particular city, let's create the
> > > Meetup page and start getting sign-ups. Joana will be reaching out with a 
> > > comprehensive list of how to get started,
> > > and we're hoping to compile a high-level calendar of 
> > > launches/announcements to feed into your meetup. 
> > > G 
> > > 
> > > On Tue, 4 Dec 2018 at 12:04, Daniel Salerno wrote:
> > > > =) What good news! Okay, I'll set up the group and try to get people 
> > > > interested. Thank you
> > > > 
> > > > On Tue, Dec 4, 2018 at 17:19, Pablo Estrada wrote:
> > > > > FWIW, for some of these places that have interest (e.g. Brazil, 
> > > > > Israel), it's possible to create a group in
> > > > > meetup.com, and start gauging interest, and looking for 
> > > > > organizers. Once a group of people with interest
> > > > > exists, it's easier to get interest / sponsorship to bring speakers.
> > > > > So if you are willing to create the group in meetup, Daniel, we can 
> > > > > monitor it and try to plan something as it
> > > > > grows : )
> > > > > Best
> > > > > -P.
> > > > > 
> > > > > On Tue, Dec 4, 2018 at 10:55 AM Daniel Salerno wrote:
> > > > > > It's a shame that there are no events in Brazil ...
> > > > > > 
> > > > > > =(
> > > > > > 
> > > > > > On Tue, Dec 4, 2018 at 13:12, OrielResearch Eila Arich-Landkof wrote:
> > > > > > > agree 
> > > > > > > 
> > > > > > > On Tue, Dec 4, 2018 at 5:41 AM Chaim Turkel  
> > > > > > > wrote:
> > > > > > > > Israel would be nice to have one
> > > > > > > > 
> > > > > > > > chaim
> > > > > > > > 
> > > > > > > > On Tue, Dec 4, 2018 at 12:33 AM Griselda Cuevas wrote:
> > > > > > > > >
> > > > > > > > > Hi Beam Community,
> > > > > > > > >
> > > > > > > > > I started curating industry conferences, meetups and events 
> > > > > > > > > that are relevant for Beam; this is the initial
> > > > > > > > > list I came up with. I'd love your help adding others that I 
> > > > > > > > > might have overlooked. Once we're satisfied
> > > > > > > > > with the list, let's re-share so we can coordinate proposal 
> > > > > > > > > submissions, attendance and community
> > > > > > > > > meetups there.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > G


Re: Beam snapshots broken

2018-12-13 Thread Mark Liu
Looks like the recent failure (like this job) is related to the
':beam-sdks-python:test' change introduced in this PR.
`./gradlew :beam-sdks-python:test` can reproduce the error.

Testing a fix in PR 7273.

On Wed, Dec 12, 2018 at 8:31 AM Yifan Zou  wrote:

> Beam9 is offline right now. But the job also failed on beam4 and beam13 with
> "Could not determine the dependencies of task ':beam-sdks-python:test'."
> It seems the task dependency was not set up properly.
>
>
>
> On Wed, Dec 12, 2018 at 2:03 AM Ismaël Mejía  wrote:
>
>> You are right, it seems that it was related to beam9 (I wonder whether it
>> was bad luck that it was always assigned to beam9, or whether we can improve
>> that poor balancing).
>> However, it failed again today on beam13. Maybe this time it is just a
>> build issue, but it seems related to Python too.
>>
>> On Tue, Dec 11, 2018 at 7:33 PM Boyuan Zhang  wrote:
>> >
>> > It seems that the failed jobs are not all due to a single task failure.
>> > The failed tasks were executed on beam9, which was rebooted yesterday
>> > because Python tests failed continuously. +Yifan Zou may have more useful
>> > context here.
>> >
>> > On Tue, Dec 11, 2018 at 9:10 AM Ismaël Mejía  wrote:
>> >>
>> >> It seems that Beam snapshots have been broken since Dec. 2
>> >>
>> >> https://builds.apache.org/view/A-D/view/Beam/job/beam_Release_Gradle_NightlySnapshot/
>> >>
>> >> It seems "The :beam-website:startDockerContainer task failed."
>> >> Can somebody please take a look?
>>
>