2019 Beam Events

2018-12-03 Thread Griselda Cuevas
Hi Beam Community,

I started curating industry conferences, meetups and events that are
relevant for Beam; this is the initial list I came up with.
*I'd love your help adding others that I might have overlooked.* Once we're
satisfied with the list, let's re-share so we can coordinate proposal
submissions, attendance and community meetups there.


Cheers,

G


beam9 failing most of the python tests

2018-12-03 Thread Ankur Goenka
Hi,

I see that beam9 is failing a significantly higher number of Python-related
builds [1].
This also results in more failures of beam_PreCommit_Portable_Python_Commit
[2] on beam9.
Can someone with access to beam9 take a look?

Thanks,
Ankur


[1] https://builds.apache.org/computer/beam9/builds
[2]
https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/buildTimeTrend


Re: Graceful shutdown of long-running Beam pipeline on Flink

2018-12-03 Thread Thomas Weise
As noted, there is currently no support for Flink savepoints through the
Beam API.

However, it is now possible to restore from a savepoint with a Flink runner
specific pipeline option:

https://issues.apache.org/jira/browse/BEAM-5396
https://github.com/apache/beam/pull/7169#issuecomment-443283332

This was just merged - we are going to use it for the Python pipelines.

Thomas
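For Python pipelines, passing the new runner-specific restore option would look roughly like the sketch below. Note this is a hypothetical sketch: the flag spelling `--savepoint_path` and the job-endpoint value are assumptions inferred from the Java `savepointPath` option in the linked PR, so check FlinkPipelineOptions for the exact names.

```python
# Hypothetical sketch only: the flag name below is an assumption, not a
# verified Beam API. The Java runner option added by BEAM-5396 is
# savepointPath; the Python-side spelling should be confirmed against
# FlinkPipelineOptions.
pipeline_args = [
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",  # assumed job-server address
    "--savepoint_path=hdfs:///flink/savepoints/savepoint-abc123",  # assumed flag
]

# These args would then be handed to the pipeline, e.g.:
# with apache_beam.Pipeline(options=PipelineOptions(pipeline_args)) as p: ...
```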


On Mon, Dec 3, 2018 at 8:54 AM Lukasz Cwik  wrote:

> There are proposals for pipeline drain[1] and also for snapshot and
> update[2] for Apache Beam. We would love contributions in this space.
>
> 1:
> https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8
> 2:
> https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY
>
> On Mon, Dec 3, 2018 at 7:05 AM Wayne Collins  wrote:
>
>> Hi JC,
>>
>> Thanks for the quick response!
>> I had hoped for an in-pipeline solution for runner portability but it is
>> nice to know we're not the only ones stepping outside to interact with
>> runner management. :-)
>>
>> Wayne
>>
>>
>> On 2018-12-03 01:23, Juan Carlos Garcia wrote:
>>
>> Hi Wayne,
>>
>> We have the same setup and we do daily updates to our pipeline.
>>
>> The way we do it is using the flink CLI via a Jenkins job.
>>
>> Basically our deployment job does the following:
>>
>> 1. Detect if the pipeline is running (it matches via job name)
>>
>> 2. If found, do a flink cancel with a savepoint (we use HDFS for
>> checkpoints / savepoints) under a given directory.
>>
>> 3. It uses the flink run command for the new job and specifies the
>> savepoint from step 2.
>>
>> I don't think there is any support to achieve the same from within the
>> pipeline. You need to do this externally as explained above.
>>
>> Best regards,
>> JC
>>
>>
>> On Mon, Dec 3, 2018, 00:46 Wayne Collins
>> wrote:
>>
>>> Hi all,
>>> We have a number of Beam pipelines processing unbounded streams sourced
>>> from Kafka on the Flink runner and are very happy with both the platform
>>> and performance!
>>>
>>> The problem is with shutting down the pipelines...for version upgrades,
>>> system maintenance, load management, etc. it would be nice to be able to
>>> gracefully shut these down under software control but haven't been able to
>>> find a way to do so. We're in good shape on checkpointing and then cleanly
>>> recovering but shutdowns are all destructive to Flink or the Flink
>>> TaskManager.
>>>
>>> Methods tried:
>>>
>>> 1) Calling cancel on FlinkRunnerResult returned from pipeline.run()
>>> This would be our preferred method but p.run() doesn't return until
>>> termination and even if it did, the runner code simply throws:
>>> "throw new UnsupportedOperationException("FlinkRunnerResult does not
>>> support cancel.");"
>>> so this doesn't appear to be a near-term option.
>>>
>>> 2) Inject a "termination" message into the pipeline via Kafka
>>> This does get through, but calling exit() from a stage in the pipeline
>>> also terminates the Flink TaskManager.
>>>
>>> 3) Inject a "sleep" message, then manually restart the cluster
>>> This is our current method: we pause the data at the source, flood all
>>> branches of the pipeline with a "we're going down" msg so the stages can do
>>> a bit of housekeeping, then hard-stop the entire environment and re-launch
>>> with the new version.
>>>
>>> Is there a "Best Practice" method for gracefully terminating an
>>> unbounded pipeline from within the pipeline or from the mainline that
>>> launches it?
>>>
>>> Thanks!
>>> Wayne
>>>
>>> --
>>> Wayne Collins, dades.ca Inc.
>>> wayn...@dades.ca | cell: 416-898-5137
>>>
>>>
>> --
>> Wayne Collins, dades.ca Inc.
>> wayn...@dades.ca | cell: 416-898-5137
>>
>>
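The deploy flow Juan Carlos outlines above (detect by job name, cancel with a savepoint, run with restore) can be sketched as a small wrapper around the flink CLI. The job name, savepoint directory, and output-parsing regexes below are assumptions for illustration; `flink list -r`, `flink cancel -s`, and `flink run -s` are standard Flink 1.x CLI usage.

```python
import re
import subprocess

def find_job_id(listing: str, job_name: str):
    """Pick the 32-hex job id out of `flink list -r` output for a matching job name."""
    for line in listing.splitlines():
        if job_name in line:
            m = re.search(r"\b[0-9a-f]{32}\b", line)
            if m:
                return m.group(0)
    return None

def parse_savepoint_path(cancel_output: str):
    """Extract the savepoint URI that `flink cancel -s <dir> <id>` prints."""
    m = re.search(r"(?:hdfs|file|s3)://\S+", cancel_output)
    return m.group(0) if m else None

def redeploy(job_name: str, savepoint_dir: str, jar: str, run=subprocess.run):
    # 1. Detect if the pipeline is running (match via job name).
    listing = run(["flink", "list", "-r"], capture_output=True, text=True).stdout
    job_id = find_job_id(listing, job_name)
    savepoint = None
    # 2. If found, cancel with a savepoint under the given directory.
    if job_id:
        out = run(["flink", "cancel", "-s", savepoint_dir, job_id],
                  capture_output=True, text=True).stdout
        savepoint = parse_savepoint_path(out)
    # 3. Run the new jar, restoring from the savepoint taken in step 2.
    cmd = ["flink", "run", "-d"]
    if savepoint:
        cmd += ["-s", savepoint]
    run(cmd + [jar])
```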


Re: Reviews for a few changes to the Go SDK

2018-12-03 Thread Ahmet Altay
Hi Andrew,

 +Robert Burke (assignee for both JIRAs) would be a
better reviewer but he is out of office this week. I was helping him with a
few reviews recently and I would be happy to review your changes too in his
absence.

Since you are not in a rush, I will try to review your changes before the
end of the week.

Ahmet

On Sun, Dec 2, 2018 at 2:05 PM Andrew Brampton  wrote:

> Hi,
>
> I've been making a few changes to the experimental Go SDK. I'm in no rush,
> but per the contributors guide I'm sharing my intent, and looking for a
> reviewer.
>
> Specifically:
> [BEAM-6144] Add support for the autoscalingAlgorithm flag
> 
> [BEAM-6155] Migrate the Go SDK to the modern GCS library
> 
>
> thanks
> Andrew
>


Re: [DISCUSS] Structuring Java based DSLs

2018-12-03 Thread Kenneth Knowles
To be honest, I don't think there's much worth doing right now. I think
more self-contained is better for Beam SQL, generally. Two things I have on
my mind are (1) SQL as an inline transform in every SDK and (2) supporting
pure SQL like the CLI and JDBC driver, where the underlying language is an
implementation detail.

Big picture / long term, I would envision pure SQL, embedded SQL transform,
and a DataFrame-like API in ~each SDK all desugaring to relational algebra
nodes, sharing an optimizer, sharing some amount of mapping the physical
plan to Beam transforms. The necessarily SDK-specific parts are the
embedded transform API and UDFs in the host language. The rest should
remain an implementation detail that we can change.

 - For example, it is easy to imagine a customized columnar element/bundle
encoding and SDK harness that only works for SQL to remove overhead of
being general purpose. It could be written in C/C++/Go if we wanted to
squeeze it for perf. Such things are made harder by having an elaborate
end-user API between SQL and the core Beam model.
 - Conversely, for whatever is chosen to underlie SQL's execution,
stability is paramount. Ideally the simplest and least likely to change
transforms would be the foundation. And I wouldn't want to have to design a
user-friendly API for Euphoria or the join library just to enable a
different join algorithm in SQL.

So my take is keep SQL flexible, implement SQL on low-level and stable
APIs, use join library, Euphoria, etc, if it looks like a big win, but
don't build any policy here or do big refactors right now.

Kenn

On Mon, Dec 3, 2018 at 9:31 AM Jan Lukavský  wrote:

> Hi Robert,
>
> currently there is no actual proposal, I was just trying to gather
> feedback from the community. But my original thoughts would be [1]. I
> actually don't see much need for restructuring the code by nesting
> directories. If the community sees that it would make sense to structure
> the dependencies, the second step would probably be to figure out how to
> accomplish this. I don't have any exact solution in mind so far, it
> would be probably needed to first identify features that are needed by
> SQL and not supported by Euphoria currently. Then we can actually
> identify costs and see if this still makes sense.
>
>   Jan
>
> On 12/3/18 6:17 PM, Robert Bradshaw wrote:
> > Taking a step back, what exactly is the proposal. Looking at the
> > original message, I see
> >
> > (1) Letting SQL take a dependency on Euphoria, sharing more code and
> > taking advantage of the logical nesting of levels of abstraction. This
> > makes sense to me.
> > (2) Nesting the directories (but not the gradle targets or module
> > names?). Here I'm not so sure about the benefit, especially vs. the
> > cost.
> > On Sat, Dec 1, 2018 at 8:38 AM Jan Lukavský  wrote:
> >> I think that the fact that SQL uses some other internal dependency
> >> should remain a hidden implementation detail. I absolutely agree that the
> >> dependency should of course remain sdks-java-sql in all cases.
> >>
> >> Jan
> >>
> >> On 12/1/18 12:54 AM, Robert Bradshaw wrote:
> >>> I suppose what I'm trying to say is that I see this module structure
> >>> as a tool for discoverability and enumerating end-user endpoints. In
> >>> other words, if one wants to use SQL, it would seem odd to have to
> >>> depend on sdks-java-euphoria-sql rather than just sdks-java-sql if
> >>> sdks-java-euphoria is also a DSL one might use. A sibling relationship
> >>> does not prohibit the layered approach to implementation that sounds
> >>> like it makes sense.
> >>>
> >>> (As for merging Euphoria into core, my initial impression is that's
> >>> probably a good idea, and something we should consider for 3.0 at the
> >>> very least.)
> >>>
> >>> On Fri, Nov 30, 2018 at 11:06 PM Jan Lukavský  wrote:
>  Hi Rui,
> 
>  yes, there are optimizations that could be added by each layer. The
> purpose of Euphoria layer actually is not to reorder or modify any user
> operators that are present in the pipeline (because it might not have
> enough information to do this), but it can for instance choose between
> various join implementations (shuffle join, broadcast join, ...) - so the
> optimizations it can do are more low level. But this plays nicely with the
> DSL hierarchy - each layer adds a little more restrictions, but can
> therefore do more optimizations. And I think that the layer between SDK and
> SQL wouldn't have to support SQL optimizations, it would only have to
> support a way for SQL to express these optimizations.
> 
>  Jan -- Original e-mail --
>  From: Rui Wang 
>  To: dev@beam.apache.org
>  Date: 30. 11. 2018 22:43:04
>  Subject: Re: [DISCUSS] Structuring Java based DSLs
> 
>  SQL's optimization is another area to consider for integration. SQL
> optimization includes pushing down filters/projections, merging or removing
> or swapping plan nodes and comparing plan costs to choose the best plan.
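The kind of low-level choice Jan describes above (e.g. picking a shuffle vs broadcast join from input characteristics, without reordering user operators) can be illustrated with a toy sketch. The threshold and names here are made up for the example and are not Euphoria's actual API:

```python
# Toy strategy picker; the threshold is an assumed "fits in memory" cutoff,
# not Euphoria's actual logic.
BROADCAST_LIMIT_BYTES = 64 * 1024 * 1024

def choose_join_strategy(left_bytes: int, right_bytes: int) -> str:
    """Broadcast the smaller side when it is small enough; otherwise shuffle both."""
    if min(left_bytes, right_bytes) <= BROADCAST_LIMIT_BYTES:
        return "broadcast"
    return "shuffle"
```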

Re: [DISCUSS] Structuring Java based DSLs

2018-12-03 Thread Jan Lukavský

Hi Robert,

currently there is no actual proposal, I was just trying to gather 
feedback from the community. But my original thoughts would be [1]. I 
actually don't see much need for restructuring the code by nesting 
directories. If the community sees that it would make sense to structure 
the dependencies, the second step would probably be to figure out how to 
accomplish this. I don't have any exact solution in mind so far, it 
would be probably needed to first identify features that are needed by 
SQL and not supported by Euphoria currently. Then we can actually 
identify costs and see if this still makes sense.


 Jan

On 12/3/18 6:17 PM, Robert Bradshaw wrote:

Taking a step back, what exactly is the proposal. Looking at the
original message, I see

(1) Letting SQL take a dependency on Euphoria, sharing more code and
taking advantage of the logical nesting of levels of abstraction. This
makes sense to me.
(2) Nesting the directories (but not the gradle targets or module
names?). Here I'm not so sure about the benefit, especially vs. the
cost.
On Sat, Dec 1, 2018 at 8:38 AM Jan Lukavský  wrote:

I think that the fact that SQL uses some other internal dependency
should remain a hidden implementation detail. I absolutely agree that the
dependency should of course remain sdks-java-sql in all cases.

Jan

On 12/1/18 12:54 AM, Robert Bradshaw wrote:

I suppose what I'm trying to say is that I see this module structure
as a tool for discoverability and enumerating end-user endpoints. In
other words, if one wants to use SQL, it would seem odd to have to
depend on sdks-java-euphoria-sql rather than just sdks-java-sql if
sdks-java-euphoria is also a DSL one might use. A sibling relationship
does not prohibit the layered approach to implementation that sounds
like it makes sense.

(As for merging Euphoria into core, my initial impression is that's
probably a good idea, and something we should consider for 3.0 at the
very least.)

On Fri, Nov 30, 2018 at 11:06 PM Jan Lukavský  wrote:

Hi Rui,

yes, there are optimizations that could be added by each layer. The purpose of 
Euphoria layer actually is not to reorder or modify any user operators that are 
present in the pipeline (because it might not have enough information to do 
this), but it can for instance choose between various join implementations 
(shuffle join, broadcast join, ...) - so the optimizations it can do are more 
low level. But this plays nicely with the DSL hierarchy - each layer adds a 
little more restrictions, but can therefore do more optimizations. And I think 
that the layer between SDK and SQL wouldn't have to support SQL optimizations, 
it would only have to support a way for SQL to express these optimizations.

Jan -- Original e-mail --
From: Rui Wang 
To: dev@beam.apache.org
Date: 30. 11. 2018 22:43:04
Subject: Re: [DISCUSS] Structuring Java based DSLs

SQL's optimization is another area to consider for integration. SQL 
optimization includes pushing down filters/projections, merging or removing or 
swapping plan nodes and comparing plan costs to choose the best plan. Adding
another layer between SQL and the Java core might require that layer to support
SQL optimizations if there is a need.

I don't have a clear picture of what SQL needs from Euphoria for
optimization (best case: nothing). As those optimizations are happening or
will happen, we might start to have a sense of it.

-Rui
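The filter pushdown Rui mentions can be shown on toy relational-plan nodes. This is purely illustrative Python, not Beam SQL or Calcite code; the node names and the pushdown condition are assumptions for the example:

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Toy relational-plan nodes (illustrative only).
@dataclass
class Scan:
    table: str

@dataclass
class Project:
    child: object
    columns: Tuple[str, ...]

@dataclass
class Filter:
    child: object
    predicate_cols: Tuple[str, ...]  # columns the predicate reads
    predicate: Callable

def push_filter_below_project(node):
    """Swap Filter-over-Project into Project-over-Filter when the projection
    keeps every column the predicate needs, so filtering happens earlier."""
    if (isinstance(node, Filter) and isinstance(node.child, Project)
            and set(node.predicate_cols) <= set(node.child.columns)):
        proj = node.child
        return Project(Filter(proj.child, node.predicate_cols, node.predicate),
                       proj.columns)
    return node

plan = Filter(Project(Scan("events"), ("user", "ts")), ("user",),
              lambda row: row["user"] != "")
optimized = push_filter_below_project(plan)
```

After the rewrite, the filter sits directly on the scan, which is the shape a cost-based planner would prefer since fewer rows reach the projection.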

On Fri, Nov 30, 2018 at 12:38 PM Robert Bradshaw  wrote:

I don't really see Euphoria as a subset of SQL or the other way
around, and I think it makes sense to use either without the other, so
by this criterion keeping them as siblings makes more sense than nesting them.

That said, I think it's really good to have a bunch of shared code,
e.g. a join library that could be used by both. One could even depend
on the other without having to abandon the sibling relationship.
Something like retractions belong in the core SDK itself. Deeper than
that, actually, it should be part of the model.

- Robert

On Fri, Nov 30, 2018 at 7:20 PM David Morávek  wrote:

Jan, we made Kryo optional recently (it is a separate module and is used only in tests). 
From a quick look it seems that we forgot to remove compile time dependency from 
euphoria's build.gradle. Only "strong" dependencies I'm aware of are core SDK 
and guava. We'll probably be adding a sketching extension dependency soon.

D.

On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský  wrote:

Hi Anton,
reactions inline.

-- Original e-mail --
From: Anton Kedin 
To: dev@beam.apache.org
Date: 30. 11. 2018 18:17:06
Subject: Re: [DISCUSS] Structuring Java based DSLs

I think this approach makes sense in general, Euphoria can be the 
implementation detail of SQL, similar to Join Library or core SDK Schemas.

I wonder though whether it would be better to bring Euphoria closer to core SDK 
first, maybe even merge them together. If you look at Reuven's recent work 
around schemas it seems like there are already 

Re: [DISCUSS] Structuring Java based DSLs

2018-12-03 Thread Robert Bradshaw
Taking a step back, what exactly is the proposal. Looking at the
original message, I see

(1) Letting SQL take a dependency on Euphoria, sharing more code and
taking advantage of the logical nesting of levels of abstraction. This
makes sense to me.
(2) Nesting the directories (but not the gradle targets or module
names?). Here I'm not so sure about the benefit, especially vs. the
cost.
On Sat, Dec 1, 2018 at 8:38 AM Jan Lukavský  wrote:
>
> I think that the fact that SQL uses some other internal dependency
> should remain a hidden implementation detail. I absolutely agree that the
> dependency should of course remain sdks-java-sql in all cases.
>
>Jan
>
> On 12/1/18 12:54 AM, Robert Bradshaw wrote:
> > I suppose what I'm trying to say is that I see this module structure
> > as a tool for discoverability and enumerating end-user endpoints. In
> > other words, if one wants to use SQL, it would seem odd to have to
> > depend on sdks-java-euphoria-sql rather than just sdks-java-sql if
> > sdks-java-euphoria is also a DSL one might use. A sibling relationship
> > does not prohibit the layered approach to implementation that sounds
> > like it makes sense.
> >
> > (As for merging Euphoria into core, my initial impression is that's
> > probably a good idea, and something we should consider for 3.0 at the
> > very least.)
> >
> > On Fri, Nov 30, 2018 at 11:06 PM Jan Lukavský  wrote:
> >> Hi Rui,
> >>
> >> yes, there are optimizations that could be added by each layer. The 
> >> purpose of Euphoria layer actually is not to reorder or modify any user 
> >> operators that are present in the pipeline (because it might not have 
> >> enough information to do this), but it can for instance choose between 
> >> various join implementations (shuffle join, broadcast join, ...) - so the 
> >> optimizations it can do are more low level. But this plays nicely with the 
> >> DSL hierarchy - each layer adds a little more restrictions, but can 
> >> therefore do more optimizations. And I think that the layer between SDK 
> >> and SQL wouldn't have to support SQL optimizations, it would only have to 
> >> support a way for SQL to express these optimizations.
> >>
> >>Jan -- Original e-mail --
> >> From: Rui Wang 
> >> To: dev@beam.apache.org
> >> Date: 30. 11. 2018 22:43:04
> >> Subject: Re: [DISCUSS] Structuring Java based DSLs
> >>
> >> SQL's optimization is another area to consider for integration. SQL 
> >> optimization includes pushing down filters/projections, merging or 
> >> removing or swapping plan nodes and comparing plan costs to choose the
> >> best plan. Adding another layer between SQL and the Java core might require
> >> that layer to support SQL optimizations if there is a need.
> >>
> >> I don't have a clear picture of what SQL needs from Euphoria for
> >> optimization (best case: nothing). As those optimizations are happening
> >> or will happen, we might start to have a sense of it.
> >>
> >> -Rui
> >>
> >> On Fri, Nov 30, 2018 at 12:38 PM Robert Bradshaw  
> >> wrote:
> >>
> >> I don't really see Euphoria as a subset of SQL or the other way
> >> around, and I think it makes sense to use either without the other, so
> >> by this criterion keeping them as siblings makes more sense than nesting
> >> them.
> >>
> >> That said, I think it's really good to have a bunch of shared code,
> >> e.g. a join library that could be used by both. One could even depend
> >> on the other without having to abandon the sibling relationship.
> >> Something like retractions belong in the core SDK itself. Deeper than
> >> that, actually, it should be part of the model.
> >>
> >> - Robert
> >>
> >> On Fri, Nov 30, 2018 at 7:20 PM David Morávek  wrote:
> >>> Jan, we made Kryo optional recently (it is a separate module and is used 
> >>> only in tests). From a quick look it seems that we forgot to remove 
> >>> compile time dependency from euphoria's build.gradle. Only "strong" 
> >> dependencies I'm aware of are core SDK and guava. We'll probably be
> >> adding a sketching extension dependency soon.
> >>>
> >>> D.
> >>>
> >>> On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský  wrote:
>  Hi Anton,
>  reactions inline.
> 
>  -- Original e-mail --
>  From: Anton Kedin 
>  To: dev@beam.apache.org
>  Date: 30. 11. 2018 18:17:06
>  Subject: Re: [DISCUSS] Structuring Java based DSLs
> 
>  I think this approach makes sense in general, Euphoria can be the 
>  implementation detail of SQL, similar to Join Library or core SDK 
>  Schemas.
> 
>  I wonder though whether it would be better to bring Euphoria closer to 
>  core SDK first, maybe even merge them together. If you look at Reuven's 
>  recent work around schemas it seems like there are already similarities 
>  between that and Euphoria's approach, unless I'm missing the point (e.g. 
>  Filter transforms, FullJoin vs CoGroup... see [2]). And we're already 
>  switching parts of SQL to those transforms (e.g. SQL 

Re: Graceful shutdown of long-running Beam pipeline on Flink

2018-12-03 Thread Lukasz Cwik
There are proposals for pipeline drain[1] and also for snapshot and
update[2] for Apache Beam. We would love contributions in this space.

1:
https://docs.google.com/document/d/1NExwHlj-2q2WUGhSO4jTu8XGhDPmm3cllSN8IMmWci8
2:
https://docs.google.com/document/d/1UWhnYPgui0gUYOsuGcCjLuoOUlGA4QaY91n8p3wz9MY

On Mon, Dec 3, 2018 at 7:05 AM Wayne Collins  wrote:

> Hi JC,
>
> Thanks for the quick response!
> I had hoped for an in-pipeline solution for runner portability but it is
> nice to know we're not the only ones stepping outside to interact with
> runner management. :-)
>
> Wayne
>
>
> On 2018-12-03 01:23, Juan Carlos Garcia wrote:
>
> Hi Wayne,
>
> We have the same setup and we do daily updates to our pipeline.
>
> The way we do it is using the flink CLI via a Jenkins job.
>
> Basically our deployment job does the following:
>
> 1. Detect if the pipeline is running (it matches via job name)
>
> 2. If found, do a flink cancel with a savepoint (we use HDFS for
> checkpoints / savepoints) under a given directory.
>
> 3. It uses the flink run command for the new job and specifies the savepoint
> from step 2.
>
> I don't think there is any support to achieve the same from within the
> pipeline. You need to do this externally as explained above.
>
> Best regards,
> JC
>
>
> On Mon, Dec 3, 2018, 00:46 Wayne Collins
> wrote:
>
>> Hi all,
>> We have a number of Beam pipelines processing unbounded streams sourced
>> from Kafka on the Flink runner and are very happy with both the platform
>> and performance!
>>
>> The problem is with shutting down the pipelines...for version upgrades,
>> system maintenance, load management, etc. it would be nice to be able to
>> gracefully shut these down under software control but haven't been able to
>> find a way to do so. We're in good shape on checkpointing and then cleanly
>> recovering but shutdowns are all destructive to Flink or the Flink
>> TaskManager.
>>
>> Methods tried:
>>
>> 1) Calling cancel on FlinkRunnerResult returned from pipeline.run()
>> This would be our preferred method but p.run() doesn't return until
>> termination and even if it did, the runner code simply throws:
>> "throw new UnsupportedOperationException("FlinkRunnerResult does not
>> support cancel.");"
>> so this doesn't appear to be a near-term option.
>>
>> 2) Inject a "termination" message into the pipeline via Kafka
>> This does get through, but calling exit() from a stage in the pipeline
>> also terminates the Flink TaskManager.
>>
>> 3) Inject a "sleep" message, then manually restart the cluster
>> This is our current method: we pause the data at the source, flood all
>> branches of the pipeline with a "we're going down" msg so the stages can do
>> a bit of housekeeping, then hard-stop the entire environment and re-launch
>> with the new version.
>>
>> Is there a "Best Practice" method for gracefully terminating an unbounded
>> pipeline from within the pipeline or from the mainline that launches it?
>>
>> Thanks!
>> Wayne
>>
>> --
>> Wayne Collins, dades.ca Inc.
>> wayn...@dades.ca | cell: 416-898-5137
>>
>>
> --
> Wayne Collins, dades.ca Inc.
> wayn...@dades.ca | cell: 416-898-5137
>
>


Re: [PROPOSAL] Prepare Beam 2.9.0 release

2018-12-03 Thread Chamikara Jayalath
I've been running tests on the existing branch. Will try to cut a candidate
today.

Seems like this is a small fix that fixes a flake. So feel free to send a
cherry-pick.

Thanks,
Cham

On Mon, Dec 3, 2018 at 4:28 AM Maximilian Michels  wrote:

> How far are we with the release? If the release branch hasn't been
> frozen, I'd like to cherry-pick
> https://github.com/apache/beam/pull/7171/files
>
> Thanks,
> Max
>
> On 30.11.18 04:17, Lukasz Cwik wrote:
> > I got to thank Steve Niemitz for double checking my work and pointing
> > out an error which helped narrow down the BEAM-6102 issue.
> >
> > On Thu, Nov 29, 2018 at 2:05 PM Chamikara Jayalath wrote:
> >
> > Blockers were resolved and fixes were cherry-picked to the release
> > branch. I'll continue the release process.
> >
> > Thanks,
> > Cham
> >
> > On Mon, Nov 26, 2018 at 10:50 AM Lukasz Cwik wrote:
> >
> > I'm working on BEAM-6102 and after 12 hours on the issue I have
> > not made much real progress. I initially suspected it's a shading
> > issue with the Dataflow worker jar but can't reproduce the issue
> > without running a full Dataflow pipeline. Any help would
> > be appreciated, context of what I have tried is on the JIRA and
> > you can reach out to me on Slack.
> >
> > On Mon, Nov 26, 2018 at 9:50 AM Chamikara Jayalath
> > <chamik...@google.com> wrote:
> >
> > Hi All,
> >
> > Currently there are two blockers for the 2.9.0 release.
> >
> > * Dataflow cannot deserialize DoFns -
> > https://issues.apache.org/jira/browse/BEAM-6102
> > * [SQL] Nexmark 5, 7 time out -
> > https://issues.apache.org/jira/browse/BEAM-6082
> >
> > We'll postpone cutting the release candidate till these
> > issues are resolved.
> >
> > Thanks,
> > Cham
> >
> >
> > On Wed, Nov 21, 2018 at 1:22 PM Kenneth Knowles
> > <k...@apache.org> wrote:
> >
> > You could `git checkout -b release-2.9.0
> > `. But cherry-picking fixes is also easy.
> >
> > Kenn
> >
> > On Wed, Nov 21, 2018 at 1:06 PM Chamikara Jayalath
> > <chamik...@google.com>
> wrote:
> >
> > I went through Jenkins test suites and failures
> > seems to be known issues with JIRAs that are release
> > blockers. So we'll cherry-pick fixes to these.
> > In general though I think it might be hard to pick
> > an exact "green" time for cutting the release just
> > by eyeballing since different test suites run at
> > different times.
> >
> > - Cham
> >
> >
> >
> > On Wed, Nov 21, 2018 at 12:59 PM Valentyn Tymofieiev
> > <valen...@google.com>
> > wrote:
> >
> > It looks like 2.9.0 branch includes commits from
> > https://github.com/apache/beam/pull/7029, which
> > break Python Postcommit Test suite. Rollback is
> > in flight:
> > https://github.com/apache/beam/pull/7107, and
> > will need to be cherry-picked to release branch.
> >
> > I think we should try to adjust release branch
> > cutting process so that all relevant test suites
> > pass on the release branch when we cut it.
> >
> > On Wed, Nov 21, 2018 at 11:31 AM Chamikara
> > Jayalath wrote:
> >
> > Release branch was cut:
> >
> https://github.com/apache/beam/tree/release-2.9.0
> > Please cherry-pick fixes to 2.9.0 blockers
> > to this branch.
> >
> > Thanks,
> > Cham
> >
> > On Tue, Nov 20, 2018 at 9:00 PM
> > Jean-Baptiste Onofré wrote:
> >
> > Hi Cham,
> >
> > it sounds good to me.
> >
> > I'm resuming some works on IOs but
> > nothing blocker.
> >
> > Regards
> > JB
> >
> > On 21/11/2018 03:59, Chamikara Jayalath
> > wrote:
> >  > Hi All,
> >  >
> >  

Re: Nexmark Phrase Triggering

2018-12-03 Thread Łukasz Gajowy
(sorry, I missed this somehow)

The documentation for Nexmark was updated on Confluence:
https://cwiki.apache.org/confluence/x/sZCzBQ

For Performance tests to behave the same way, we need to add separate jobs
and simply use a different dataset for storing the results (same as in
Nexmark). Besides a JIRA ticket I created a while ago (
https://issues.apache.org/jira/browse/BEAM-6012) I don't think more
documentation is needed for this now.



On Mon, Nov 26, 2018 at 21:05 Chamikara Jayalath
wrote:

> Thanks Łukasz.
>
> Should the solution be documented (in the Beam testing guide?) so that other
> performance tests can support manual triggering without affecting benchmark
> results in a similar manner?
>
> - Cham
>
> On Thu, Nov 22, 2018 at 4:03 AM Łukasz Gajowy 
> wrote:
>
>> Hi all,
>>
>> BEAM-6011 is now resolved. If any of you think your changes require such
>> additional build/performance checks, now you can run them by posting a
>> comment on Github (eg. "Run Direct Runner Nexmark Tests") to see if
>> everything is fine before merging a PR.
>>
>> Łukasz
>>
>> On Wed, Nov 7, 2018 at 17:38 Andrew Pilloud wrote:
>>
>>> My concern is that PR-triggered runs should not publish results to the
>>> same table as runs on master. Looks like you already covered that in the
>>> bug.
>>>
>>> There is also the issue that there is only one Jenkins executor that can
>>> run the local runner jobs. This will only be a problem if the manual runs
>>> become frequent.
>>>
>>> Andrew
>>>
>>> On Wed, Nov 7, 2018, 8:32 AM Łukasz Gajowy wrote:
 Hi,

 recent experience with Nexmark crashes made enabling Phrase Triggering
 in Nexmark suites even more urgent. If you have any opinions in this area
 feel free to share them.

 Here's the link to a corresponding JIRA issue:
 https://issues.apache.org/jira/browse/BEAM-6011

 Łukasz

>>>


Re: [PROPOSAL] Prepare Beam 2.9.0 release

2018-12-03 Thread Maximilian Michels
How far are we with the release? If the release branch hasn't been 
frozen, I'd like to cherry-pick 
https://github.com/apache/beam/pull/7171/files


Thanks,
Max

On 30.11.18 04:17, Lukasz Cwik wrote:
I got to thank Steve Niemitz for double checking my work and pointing 
out an error which helped narrow down the BEAM-6102 issue.


On Thu, Nov 29, 2018 at 2:05 PM Chamikara Jayalath wrote:


Blockers were resolved and fixes were cherry-picked to the release
branch. I'll continue the release process.

Thanks,
Cham

On Mon, Nov 26, 2018 at 10:50 AM Lukasz Cwik <lc...@google.com> wrote:

I'm working on BEAM-6102 and after 12 hours on the issue I have
not made much real progress. I initially suspected it's a shading
issue with the Dataflow worker jar but can't reproduce the issue
without running a full Dataflow pipeline. Any help would
be appreciated, context of what I have tried is on the JIRA and
you can reach out to me on Slack.

On Mon, Nov 26, 2018 at 9:50 AM Chamikara Jayalath
<chamik...@google.com> wrote:

Hi All,

Currently there are two blockers for the 2.9.0 release.

* Dataflow cannot deserialize DoFns -
https://issues.apache.org/jira/browse/BEAM-6102
* [SQL] Nexmark 5, 7 time out -
https://issues.apache.org/jira/browse/BEAM-6082

We'll postpone cutting the release candidate till these
issues are resolved.

Thanks,
Cham


On Wed, Nov 21, 2018 at 1:22 PM Kenneth Knowles
<k...@apache.org> wrote:

You could `git checkout -b release-2.9.0
`. But cherry-picking fixes is also easy.

Kenn

On Wed, Nov 21, 2018 at 1:06 PM Chamikara Jayalath
<chamik...@google.com> wrote:

I went through Jenkins test suites and failures
seems to be known issues with JIRAs that are release
blockers. So we'll cherry-pick fixes to these.
In general though I think it might be hard to pick
an exact "green" time for cutting the release just
by eyeballing since different test suites run at
different times.

- Cham



On Wed, Nov 21, 2018 at 12:59 PM Valentyn Tymofieiev
<valen...@google.com>
wrote:

It looks like 2.9.0 branch includes commits from
https://github.com/apache/beam/pull/7029, which
break Python Postcommit Test suite. Rollback is
in flight:
https://github.com/apache/beam/pull/7107, and
will need to be cherry-picked to release branch.

I think we should try to adjust release branch
cutting process so that all relevant test suites
pass on the release branch when we cut it.

On Wed, Nov 21, 2018 at 11:31 AM Chamikara
Jayalath <chamik...@google.com> wrote:

Release branch was cut:
https://github.com/apache/beam/tree/release-2.9.0
Please cherry-pick fixes to 2.9.0 blockers
to this branch.

Thanks,
Cham

On Tue, Nov 20, 2018 at 9:00 PM
Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Hi Cham,

It sounds good to me.

I'm resuming some work on IOs,
but nothing is a blocker.

Regards
JB

On 21/11/2018 03:59, Chamikara Jayalath
wrote:
 > Hi All,
 >
 > Looks like there are three blockers
in the burndown list but they are
 > actively being worked on.
 >
 > If there's no objection I'll create
the release branch tomorrow morning.
 > We can cherry-pick fixes to the
blockers before building the first RC
 > hopefully on Monday.
 >
 > Thanks,
 > Cham
 

Beam Dependency Check Report (2018-12-03)

2018-12-03 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:


Dependency Name | Current Version | Latest Version | Release Date (Current) | Release Date (Latest) | JIRA Issue
future | 0.16.0 | 0.17.1 | 2016-10-27 | 2018-10-31 | BEAM-5968
google-cloud-pubsub | 0.35.4 | 0.39.0 | 2018-06-06 | 2018-11-27 | BEAM-5539
oauth2client | 3.0.0 | 4.1.3 | 2016-07-28 | 2018-09-07 | BEAM-6089
pytz | 2018.4 | 2018.7 | 2018-04-10 | 2018-10-29 | BEAM-5893
High Priority Dependency Updates Of Beam Java SDK:


Dependency Name | Current Version | Latest Version | Release Date (Current) | Release Date (Latest) | JIRA Issue
com.rabbitmq:amqp-client | 4.6.0 | 5.5.1 | 2018-03-26 | 2018-11-29 | BEAM-5895
org.apache.rat:apache-rat-tasks | 0.12 | 0.13 | 2016-06-07 | 2018-10-13 | BEAM-6039
com.google.auto.service:auto-service | 1.0-rc2 | 1.0-rc4 | 2014-10-25 | 2017-12-11 | BEAM-5541
com.gradle:build-scan-plugin | 1.13.1 | 2.0.2 | 2018-04-10 | 2018-11-12 | BEAM-5543
org.conscrypt:conscrypt-openjdk | 1.1.3 | 1.4.1 | 2018-06-04 | 2018-11-01 | BEAM-5748
org.elasticsearch:elasticsearch | 6.4.0 | 7.0.0-alpha1 | 2018-08-18 | 2018-11-13 | BEAM-6090
org.elasticsearch:elasticsearch-hadoop | 5.0.0 | 7.0.0-alpha1 | 2016-10-26 | 2018-11-13 | BEAM-5551
org.elasticsearch.client:elasticsearch-rest-client | 6.4.0 | 7.0.0-alpha1 | 2018-08-18 | 2018-11-13 | BEAM-6091
org.elasticsearch.test:framework | 6.4.0 | 7.0.0-alpha1 | 2018-08-18 | 2018-11-13 | BEAM-6092
io.grpc:grpc-auth | 1.13.1 | 1.16.1 | 2018-06-21 | 2018-10-26 | BEAM-5896
io.grpc:grpc-context | 1.13.1 | 1.16.1 | 2018-06-21 | 2018-10-26 | BEAM-5897
io.grpc:grpc-core | 1.13.1 | 1.16.1 | 2018-06-21 | 2018-10-26 | BEAM-5898
io.grpc:grpc-netty | 1.13.1 | 1.16.1 | 2018-06-21 | 2018-10-26 | BEAM-5899
io.grpc:grpc-protobuf | 1.13.1 | 1.16.1 | 2018-06-21 | 2018-10-26 | BEAM-5900
io.grpc:grpc-stub | 1.13.1 | 1.16.1 | 2018-06-21 | 2018-10-26 | BEAM-5901
io.grpc:grpc-testing | 1.13.1 | 1.16.1 | 2018-06-21 | 2018-10-26 | BEAM-5902
com.google.code.gson:gson | 2.7 | 2.8.5 | 2016-06-14 | 2018-05-22 | BEAM-5558
org.apache.hbase:hbase-common | 1.2.6 | 2.1.1 | 2017-05-29 | 2018-10-27 | BEAM-5560
org.apache.hbase:hbase-hadoop-compat | 1.2.6 | 2.1.1 | 2017-05-29 | 2018-10-27 | BEAM-5561
org.apache.hbase:hbase-hadoop2-compat | 1.2.6 | 2.1.1 | 2017-05-29 | 2018-10-27 | BEAM-5562
org.apache.hbase:hbase-server | 1.2.6 | 2.1.1 | 2017-05-29 | 2018-10-27 | BEAM-5563
org.apache.hbase:hbase-shaded-client | 1.2.6 | 2.1.1 | 2017-05-29 | 2018-10-27 | BEAM-5564
org.apache.hive:hive-cli | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5566
org.apache.hive:hive-common | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5567
org.apache.hive:hive-exec | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5568
org.apache.hive.hcatalog:hive-hcatalog-core | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5569
net.java.dev.javacc:javacc | 4.0 | 7.0.4 | 2006-03-17 | 2018-09-17 | BEAM-5570
javax.servlet:javax.servlet-api | 3.1.0 | 4.0.1 | 2013-04-25 | 2018-04-20 | BEAM-5750
redis.clients:jedis | 2.9.0 | 3.0.0-rc1 | 2016-07-22 | 2018-12-02 | BEAM-6125
org.eclipse.jetty:jetty-server | 9.2.10.v20150310 | 9.4.14.v20181114 | 2015-03-10 | 2018-11-14 | BEAM-5752
org.eclipse.jetty:jetty-servlet | 9.2.10.v20150310 | 9.4.14.v20181114 | 2015-03-10 | 2018-11-14 | BEAM-5753
net.java.dev.jna:jna | 4.1.0 | 5.1.0 | 2014-03-06 | 2018-11-14 | BEAM-5573
junit:junit | 4.12 | 4.13-beta-1 | 2014-12-04 | 2018-11-25 | BEAM-6127
com.esotericsoftware:kryo | 4.0.2 | 5.0.0-RC1 | 2018-03-20 | 2018-06-19 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
org.apache.kudu:kudu-client | 1.4.0 | 1.8.0 | 2017-06-05 | 2018-10-16 | BEAM-5575

Re: contributor in the Beam

2018-12-03 Thread Jean-Baptiste Onofré
Can you please fix the conflict in the PR?

Thanks
Regards
JB

On 03/12/2018 08:52, Chaim Turkel wrote:
> it looks like there was a failure that is not due to the code, how can
> i continue the process?
> https://github.com/apache/beam/pull/7162
> 
> On Thu, Nov 29, 2018 at 9:15 PM Chaim Turkel  wrote:
>>
>> hi,
>>   i added another pr for the case of a self signed certificate ssl on
>> the mongodb server
>>
>> https://github.com/apache/beam/pull/7162
>> On Wed, Nov 28, 2018 at 5:16 PM Jean-Baptiste Onofré  
>> wrote:
>>>
>>> Hi,
>>>
>>> I already upgraded locally. Let me push the PR.
>>>
>>> Regards
>>> JB
>>>
>>> On 28/11/2018 16:02, Chaim Turkel wrote:
 is there any reason that the mongo client version is still on 3.2.2?
 can you upgrade it to 3.9.0?
 chaim
 On Tue, Nov 27, 2018 at 4:48 PM Jean-Baptiste Onofré  
 wrote:
>
> Hi Chaim,
>
> The best is to create a Jira describing the new features you want to
> add. Then, you can create a PR related to this Jira.
>
> As I'm the original MongoDbIO author, I would be more than happy to help
> you and review the PR.
>
> Thanks !
> Regards
> JB
>
> On 27/11/2018 15:37, Chaim Turkel wrote:
>> Hi,
>>   I have added a few features to the MongoDbIO and would like to add
>> them to the project.
>> I have read https://beam.apache.org/contribute/
>> I have added a jira user, what do i need to do next?
>>
>> chaim
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: contributor in the Beam

2018-12-03 Thread Reuven Lax
I just triggered a rerun of the test.

Hint: each check is followed by the trigger phrase that will rerun that
check.

On Sun, Dec 2, 2018 at 11:53 PM Chaim Turkel  wrote:

> it looks like there was a failure that is not due to the code, how can
> i continue the process?
> https://github.com/apache/beam/pull/7162
>
> On Thu, Nov 29, 2018 at 9:15 PM Chaim Turkel  wrote:
> >
> > hi,
> >   i added another pr for the case of a self signed certificate ssl on
> > the mongodb server
> >
> > https://github.com/apache/beam/pull/7162
> > On Wed, Nov 28, 2018 at 5:16 PM Jean-Baptiste Onofré 
> wrote:
> > >
> > > Hi,
> > >
> > > I already upgraded locally. Let me push the PR.
> > >
> > > Regards
> > > JB
> > >
> > > On 28/11/2018 16:02, Chaim Turkel wrote:
> > > > is there any reason that the mongo client version is still on 3.2.2?
> > > > can you upgrade it to 3.9.0?
> > > > chaim
> > > > On Tue, Nov 27, 2018 at 4:48 PM Jean-Baptiste Onofré <
> j...@nanthrax.net> wrote:
> > > >>
> > > >> Hi Chaim,
> > > >>
> > > >> The best is to create a Jira describing the new features you want to
> > > >> add. Then, you can create a PR related to this Jira.
> > > >>
> > > >> As I'm the original MongoDbIO author, I would be more than happy to
> help
> > > >> you and review the PR.
> > > >>
> > > >> Thanks !
> > > >> Regards
> > > >> JB
> > > >>
> > > >> On 27/11/2018 15:37, Chaim Turkel wrote:
> > > >>> Hi,
> > > >>>   I have added a few features to the MongoDbIO and would like to
> add
> > > >>> them to the project.
> > > >>> I have read https://beam.apache.org/contribute/
> > > >>> I have added a jira user, what do i need to do next?
> > > >>>
> > > >>> chaim
> > > >>>
> > > >>
> > > >> --
> > > >> Jean-Baptiste Onofré
> > > >> jbono...@apache.org
> > > >> http://blog.nanthrax.net
> > > >> Talend - http://www.talend.com
> > > >
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
>