Re: Beam SQL Alias issue while using With Clause

2023-03-02 Thread Andrew Pilloud via dev
Hi Talat,

I managed to turn your test case into something that runs against
Calcite. It looks like there is a bug affecting tables that contain one or
more single-element structs and no multi-element structs. I've sent the
details to the Calcite mailing list here:
https://lists.apache.org/thread/tlr9hsmx09by79h91nwp2d4nv8jfwsto
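For reference, a minimal sketch of a schema with the affected shape (a
single-element struct and no multi-element structs), using the same Beam
schema builders that appear in the diff quoted below; the names are
illustrative, not taken from the Calcite report:

// A single-element struct.
Schema innerSchema = Schema.builder().addStringField("name").build();

// Table schema: ROW<name STRING> user_info, INT64 id, STRING value.
Schema tableSchema =
    Schema.builder()
        .addRowField("user_info", innerSchema)
        .addInt64Field("id")
        .addStringField("value")
        .build();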

I'm experimenting with ideas on how to work around this but a fix will
likely require a Calcite upgrade, which is not something I'd have time
to help with. (I'm not on the Google Beam team anymore.)

Andrew

On Wed, Feb 22, 2023 at 12:18 PM Talat Uyarer
 wrote:
>
> Hi @Andrew Pilloud
>
> Sorry for the late response. Yes, your test works fine. I changed the 
> test input structure to match our input structure. Now this test also hits 
> the same exception.
>
> Feb 21, 2023 2:02:28 PM 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> INFO: SQL:
> WITH `tempTable` AS (SELECT `panwRowTestTable`.`user_info`, 
> `panwRowTestTable`.`id`, `panwRowTestTable`.`value`
> FROM `beam`.`panwRowTestTable` AS `panwRowTestTable`
> WHERE `panwRowTestTable`.`user_info`.`name` = 'innerStr') (SELECT 
> `tempTable`.`user_info`, `tempTable`.`id`, `tempTable`.`value`
> FROM `tempTable` AS `tempTable`)
> Feb 21, 2023 2:02:28 PM 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner convertToBeamRel
> INFO: SQLPlan>
> LogicalProject(user_info=[ROW($0)], id=[$1], value=[$2])
>   LogicalFilter(condition=[=($0.name, 'innerStr')])
> LogicalProject(name=[$0.name], id=[$1], value=[$2])
>   BeamIOSourceRel(table=[[beam, panwRowTestTable]])
>
>
> fieldList must not be null, type = VARCHAR
> java.lang.AssertionError: fieldList must not be null, type = VARCHAR
>
> I don't know what is different from yours. I am sharing my version of the test 
> as well.
>
>
> Index: 
> sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> diff --git 
> a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
>  
> b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
> --- 
> a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
>  (revision fd383fae1adc545b6b6a22b274902cda956fec49)
> +++ 
> b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamComplexTypeTest.java
>  (date 1677017032324)
> @@ -54,6 +54,9 @@
>   private static final Schema innerRowSchema =
>       Schema.builder().addStringField("string_field").addInt64Field("long_field").build();
>
> +  private static final Schema innerPanwRowSchema =
> +  Schema.builder().addStringField("name").build();
> +
>private static final Schema innerRowWithArraySchema =
>Schema.builder()
>.addStringField("string_field")
> @@ -127,8 +130,12 @@
>.build()))
>.put(
>"basicRowTestTable",
> -  TestBoundedTable.of(FieldType.row(innerRowSchema), "col")
> -      .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build()))
> +  TestBoundedTable.of(FieldType.row(innerRowSchema), "col", FieldType.INT64, "field")
> +      .addRows(Row.withSchema(innerRowSchema).addValues("innerStr", 1L).build(), 1L))
> +    .put(
> +      "panwRowTestTable",
> +      TestBoundedTable.of(FieldType.row(innerPanwRowSchema), "user_info", FieldType.INT64, "id", FieldType.STRING, "value")
> +          .addRows(Row.withSchema(innerRowSchema).addValues("name", 1L).build(), 1L, "some_value"))
>.put(
>"rowWithArrayTestTable",
>TestBoundedTable.of(FieldType.row(rowWithArraySchema), 
> "col")
> @@ -219,6 +226,21 @@
>  .build());
>  pipeline.run().waitUntilFinish(Duration.standardMinutes(2));
>}
> +
> +  @Test
> +  public void testBasicRowWhereField() {
> +BeamSqlEnv sqlEnv = BeamSqlEnv.inMemory(readOnlyTableProvider);
> +PCollection stream =
> +BeamSqlRelUtils.toPCollection(
> +pipeline, sqlEnv.parseQuery("WITH tempTable AS (SELECT * FROM 
> panwRowTestTable WHERE panwRowTestTable.`user_info`.`name` = 'in

Re: Review ask for Flink Runner Backlog Metric Bug Fix

2023-02-23 Thread Andrew Pilloud via dev
The bot says there are no reviewers for Flink. Perhaps you'll find a
volunteer to review it here?

On Thu, Feb 23, 2023 at 4:47 PM Talat Uyarer via dev 
wrote:

> Hi,
>
> I created a bugfix for Flink Runner backlog metrics. I asked OWNERS and
> tried to run the assign-reviewer command, but I am not sure I pressed the
> right button :)
>
> If you know someone who can review this change:
> https://github.com/apache/beam/pull/25554
>
> Could you assign them to this MR?
>
> Thanks
>


Re: Beam SQL Alias issue while using With Clause

2023-02-10 Thread Andrew Pilloud via dev
voke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
> at
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> at com.sun.proxy.$Proxy5.processTestClass(Unknown Source)
> at
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
> at
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
> at
> org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
> at
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
> at
> org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at
> org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
> at java.base/java.lang.Thread.run(Thread.java:829)
>
>
>
>
>
> On Thu, Feb 2, 2023 at 1:06 PM Andrew Pilloud  wrote:
>
>> It looks like Calcite stopped considering field names in RelNode equality
>> as of Calcite 1.22 (which we use in Beam v2.34.0+). This can result in a
>> planner state where two nodes that only differ by field name are considered
>> equivalent.
>>
>> I have a fix for Beam in https://github.com/apache/beam/pull/25290
>> and I'll send an email to the Calcite dev list with more details.
>>
>> Andrew
>>
>> On Fri, Jan 27, 2023 at 11:33 AM Andrew Pilloud 
>> wrote:
>>
>>> Also this is at very least a Beam bug. You can file a Beam issue if you
>>> want, otherwise I will when I get back.
>>>
>>> Andrew
>>>
>>> On Fri, Jan 27, 2023 at 11:27 AM Andrew Pilloud 
>>> wrote:
>>>
>>>> Hi Talat,
>>>>
>>>> I did get your test case running and added some logging to
>>>> RexProgramBuilder.mergePrograms. There is only one merge that occurs during
>>>> the test and it has an output type of RecordType(JavaType(int) ID,
>>>> JavaType(class java.lang.String) V). This does seem like the correct output
>>>> name but it doesn't match the final output name, so something is still
>>>> different than the Beam test case. I also modified mergePrograms to
>>>> purposely corrupt the output names, that did not cause the test to fail or
>>>> trip the 'assert mergedProg.getOutputRowType() ==
>>>> topProgram.getOutputRowType();' in mergePrograms. I could not find any
>>>> Calcite unit tests for RexProgramBuilder.mergePrograms or
>>>> CoreRules.CALC_MERGE rule so I think it is still probable that the problem
>>>> is in this area.
>>>>
>>>> One minor issue I encountered: it took me a while to get your test case
>>>> running; there don't appear to be any Calcite Gradle rules to run
>>>> CoreQuidemTest, and constructing the classpath manually was tedious. Did I
>>>> miss something?
>>>>
>>>> I'm still working on this but I'm out today and Monday, it will
>>>> probably be Wednesday before I make any more progress.
>

Re: OpenJDK8 / OpenJDK11 container deprecation

2023-02-07 Thread Andrew Pilloud via dev
This sounds reasonable to me as well.

I've made swaps like this in the past; the base image of each is probably a
bigger factor than the JDK. The openjdk images were based on Debian 11. The
default eclipse-temurin images are based on Ubuntu 22.04 with an alpine
option. Ubuntu is a Debian derivative but the versions and package names
aren't exact matches and Ubuntu tends to update a little faster. For most
users I don't think this will matter but users building custom containers
may need to make minor changes. The alpine option will be much smaller
(which could be a significant improvement) but would be a more significant
change to the environment.
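For users building custom containers, the swap itself is usually a one-line
base image change. A sketch, with illustrative tags (check Docker Hub for
the currently supported ones):

# Old: Debian 11 based, no longer receiving updates.
FROM openjdk:11-jre-slim
# New default: Ubuntu 22.04 based.
FROM eclipse-temurin:11-jre
# Or the much smaller Alpine variant, a bigger change to the environment.
FROM eclipse-temurin:11-jre-alpine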

On Tue, Feb 7, 2023 at 5:18 PM Robert Bradshaw via dev 
wrote:

> Seems reasonable to me.
>
> On Tue, Feb 7, 2023 at 4:19 PM Luke Cwik via user 
> wrote:
> >
> > As per [1], the JDK8 and JDK11 containers that Apache Beam uses have
> not been built or supported since July 2022. I have filed [2] to
> track the resolution of this issue.
> >
> > Based upon [1], almost everyone is swapping to the eclipse-temurin
> container[3] as their base based upon the linked issues from the
> deprecation notice[1]. The eclipse-temurin container is released under
> these licenses:
> > Apache License, Version 2.0
> > Eclipse Distribution License 1.0 (BSD)
> > Eclipse Public License 2.0
> > - (Secondary) GNU General Public License, version 2 with OpenJDK
> Assembly Exception
> > - (Secondary) GNU General Public License, version 2 with the GNU
> Classpath Exception
> >
> > I propose that we swap all our containers to the eclipse-temurin
> containers[3].
> >
> > Open to other ideas and also would be great to hear about your
> experience in any other projects that you have had to make a similar
> decision.
> >
> > 1: https://github.com/docker-library/openjdk/issues/505
> > 2: https://github.com/apache/beam/issues/25371
> > 3: https://hub.docker.com/_/eclipse-temurin
>


Re: Beam SQL Alias issue while using With Clause

2023-02-02 Thread Andrew Pilloud via dev
It looks like Calcite stopped considering field names in RelNode equality
as of Calcite 1.22 (which we use in Beam v2.34.0+). This can result in a
planner state where two nodes that only differ by field name are considered
equivalent.

I have a fix for Beam in https://github.com/apache/beam/pull/25290 and I'll
send an email to the Calcite dev list with more details.
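To illustrate what equality without field names means in Calcite terms,
here is a hypothetical snippet (not from the fix) that builds two row types
differing only in a field name; with name comparison disabled they are
considered equal, which is how a planner substitution can silently drop an
alias:

// Uses org.apache.calcite.plan.RelOptUtil plus the rel.type and sql.type packages.
RelDataTypeFactory typeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
RelDataType withAlias =
    typeFactory.builder().add("id", SqlTypeName.INTEGER).add("value", SqlTypeName.VARCHAR).build();
RelDataType withoutAlias =
    typeFactory.builder().add("id", SqlTypeName.INTEGER).add("v", SqlTypeName.VARCHAR).build();

// true: structurally identical, so the planner may substitute one node for the other.
RelOptUtil.areRowTypesEqual(withAlias, withoutAlias, /* compareNames= */ false);
// false: the field names differ.
RelOptUtil.areRowTypesEqual(withAlias, withoutAlias, /* compareNames= */ true);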

Andrew

On Fri, Jan 27, 2023 at 11:33 AM Andrew Pilloud  wrote:

> Also this is at very least a Beam bug. You can file a Beam issue if you
> want, otherwise I will when I get back.
>
> Andrew
>
> On Fri, Jan 27, 2023 at 11:27 AM Andrew Pilloud 
> wrote:
>
>> Hi Talat,
>>
>> I did get your test case running and added some logging to
>> RexProgramBuilder.mergePrograms. There is only one merge that occurs during
>> the test and it has an output type of RecordType(JavaType(int) ID,
>> JavaType(class java.lang.String) V). This does seem like the correct output
>> name but it doesn't match the final output name, so something is still
>> different than the Beam test case. I also modified mergePrograms to
>> purposely corrupt the output names, that did not cause the test to fail or
>> trip the 'assert mergedProg.getOutputRowType() ==
>> topProgram.getOutputRowType();' in mergePrograms. I could not find any
>> Calcite unit tests for RexProgramBuilder.mergePrograms or
>> CoreRules.CALC_MERGE rule so I think it is still probable that the problem
>> is in this area.
>>
>> One minor issue I encountered: it took me a while to get your test case
>> running; there don't appear to be any Calcite Gradle rules to run
>> CoreQuidemTest, and constructing the classpath manually was tedious. Did I
>> miss something?
>>
>> I'm still working on this but I'm out today and Monday, it will probably
>> be Wednesday before I make any more progress.
>>
>> Andrew
>>
>> On Fri, Jan 27, 2023 at 10:40 AM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> Yes, this aligns with my debugging as well. In my reply to Kenn you can see
>>> a SQL test which I wrote in Calcite. Somehow Calcite does not have this
>>> issue in version 1.28.
>>>
>>> !use post
>>> !set outputformat mysql
>>>
>>> #Test aliases with with clause
>>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, 
>>> "hr"."emps"."name" as v from "hr"."emps")
>>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE 
>>> tempTable.v <> '11' ;
>>> +-----+-----------+
>>> | ID  | value     |
>>> +-----+-----------+
>>> | 100 | Bill      |
>>> | 110 | Theodore  |
>>> | 150 | Sebastian |
>>> | 200 | Eric      |
>>> +-----+-----------+
>>> (4 rows)
>>>
>>> !ok
>>>
>>>
>>> On Wed, Jan 25, 2023 at 6:08 PM Andrew Pilloud 
>>> wrote:
>>>
>>>> Yes, that worked.
>>>>
>>>> The issue does not occur if I disable all of the following planner
>>>> rules: CoreRules.FILTER_CALC_MERGE, CoreRules.PROJECT_CALC_MERGE,
>>>> LogicalCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE),
>>>> and BeamCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE).
>>>>
>>>> All the rules share a common call to RexProgramBuilder.mergePrograms,
>>>> so I suspect the problem lies there. I spent some time looking but wasn't
>>>> able to find it by code inspection, it looks like this code path is doing
>>>> the right thing with names. I'll spend some time tomorrow trying to
>>>> reproduce this on pure Calcite.
>>>>
>>>> Andrew
>>>>
>>>>
>>>> On Tue, Jan 24, 2023 at 8:24 PM Talat Uyarer <
>>>> tuya...@paloaltonetworks.com> wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> Thanks for writing a test for this use case. Without the WHERE clause it
>>>>> works as expected on our test cases too. Please add a WHERE clause to the
>>>>> second SELECT. With the query below it does not return column names. I
>>>>> tested on my local machine as well.
>>>>>
>>>>> WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM
>>>>> PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE
>>>>> id > 1
>>>>>
>>>>> Thanks
>>>>>
>>>

Re: Beam SQL Alias issue while using With Clause

2023-01-27 Thread Andrew Pilloud via dev
Also this is at very least a Beam bug. You can file a Beam issue if you
want, otherwise I will when I get back.

Andrew

On Fri, Jan 27, 2023 at 11:27 AM Andrew Pilloud  wrote:

> Hi Talat,
>
> I did get your test case running and added some logging to
> RexProgramBuilder.mergePrograms. There is only one merge that occurs during
> the test and it has an output type of RecordType(JavaType(int) ID,
> JavaType(class java.lang.String) V). This does seem like the correct output
> name but it doesn't match the final output name, so something is still
> different than the Beam test case. I also modified mergePrograms to
> purposely corrupt the output names, that did not cause the test to fail or
> trip the 'assert mergedProg.getOutputRowType() ==
> topProgram.getOutputRowType();' in mergePrograms. I could not find any
> Calcite unit tests for RexProgramBuilder.mergePrograms or
> CoreRules.CALC_MERGE rule so I think it is still probable that the problem
> is in this area.
>
> One minor issue I encountered: it took me a while to get your test case
> running; there don't appear to be any Calcite Gradle rules to run
> CoreQuidemTest, and constructing the classpath manually was tedious. Did I
> miss something?
>
> I'm still working on this but I'm out today and Monday, it will probably
> be Wednesday before I make any more progress.
>
> Andrew
>
> On Fri, Jan 27, 2023 at 10:40 AM Talat Uyarer <
> tuya...@paloaltonetworks.com> wrote:
>
>> Hi Andrew,
>>
>> Yes, this aligns with my debugging as well. In my reply to Kenn you can see a
>> SQL test which I wrote in Calcite. Somehow Calcite does not have this issue
>> in version 1.28.
>>
>> !use post
>> !set outputformat mysql
>>
>> #Test aliases with with clause
>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, 
>> "hr"."emps"."name" as v from "hr"."emps")
>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE 
>> tempTable.v <> '11' ;
>> +-----+-----------+
>> | ID  | value     |
>> +-----+-----------+
>> | 100 | Bill      |
>> | 110 | Theodore  |
>> | 150 | Sebastian |
>> | 200 | Eric      |
>> +-----+-----------+
>> (4 rows)
>>
>> !ok
>>
>>
>> On Wed, Jan 25, 2023 at 6:08 PM Andrew Pilloud 
>> wrote:
>>
>>> Yes, that worked.
>>>
>>> The issue does not occur if I disable all of the following planner
>>> rules: CoreRules.FILTER_CALC_MERGE, CoreRules.PROJECT_CALC_MERGE,
>>> LogicalCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE),
>>> and BeamCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE).
>>>
>>> All the rules share a common call to RexProgramBuilder.mergePrograms, so
>>> I suspect the problem lies there. I spent some time looking but wasn't able
>>> to find it by code inspection, it looks like this code path is doing the
>>> right thing with names. I'll spend some time tomorrow trying to reproduce
>>> this on pure Calcite.
>>>
>>> Andrew
>>>
>>>
>>> On Tue, Jan 24, 2023 at 8:24 PM Talat Uyarer <
>>> tuya...@paloaltonetworks.com> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> Thanks for writing a test for this use case. Without the WHERE clause it
>>>> works as expected on our test cases too. Please add a WHERE clause to the
>>>> second SELECT. With the query below it does not return column names. I
>>>> tested on my local machine as well.
>>>>
>>>> WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM
>>>> PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE
>>>> id > 1
>>>>
>>>> Thanks
>>>>
>>>> On Tue, Jan 24, 2023 at 5:28 PM Andrew Pilloud 
>>>> wrote:
>>>>
>>>>> +dev@beam.apache.org 
>>>>>
>>>>> I tried reproducing this but was not successful, the output schema was
>>>>> as expected. I added the following to BeamSqlMultipleSchemasTest.java at
>>>>> head. (I did discover that  
>>>>> PAssert.that(result).containsInAnyOrder(output)
>>>>> doesn't validate column names however.)
>>>>>
>>>>>   @Test
>>>>>   public void testSelectAs() {
>>>>> PCollection<Row> input = pipeline.apply(create(row(1, "strstr")));
>>>>>
>>>>> PCollection<Row> result =
>>>>> input.apply

Re: Beam SQL Alias issue while using With Clause

2023-01-27 Thread Andrew Pilloud via dev
Hi Talat,

I did get your test case running and added some logging to
RexProgramBuilder.mergePrograms. There is only one merge that occurs during
the test and it has an output type of RecordType(JavaType(int) ID,
JavaType(class java.lang.String) V). This does seem like the correct output
name but it doesn't match the final output name, so something is still
different than the Beam test case. I also modified mergePrograms to
purposely corrupt the output names, that did not cause the test to fail or
trip the 'assert mergedProg.getOutputRowType() ==
topProgram.getOutputRowType();' in mergePrograms. I could not find any
Calcite unit tests for RexProgramBuilder.mergePrograms or
CoreRules.CALC_MERGE rule so I think it is still probable that the problem
is in this area.
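Note that the assert quoted above compares the merged output type by object
identity, so it would not catch a lost field name even if one occurred. A
name-sensitive variant (hypothetical, not existing Calcite code) would look
something like:

assert mergedProg.getOutputRowType().getFieldNames()
        .equals(topProgram.getOutputRowType().getFieldNames())
    : "output field names changed by mergePrograms";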

One minor issue I encountered: it took me a while to get your test case
running; there don't appear to be any Calcite Gradle rules to run
CoreQuidemTest, and constructing the classpath manually was tedious. Did I
miss something?

I'm still working on this but I'm out today and Monday, it will probably be
Wednesday before I make any more progress.

Andrew

On Fri, Jan 27, 2023 at 10:40 AM Talat Uyarer 
wrote:

> Hi Andrew,
>
> Yes, this aligns with my debugging as well. In my reply to Kenn you can see a
> SQL test which I wrote in Calcite. Somehow Calcite does not have this issue
> in version 1.28.
>
> !use post
> !set outputformat mysql
>
> #Test aliases with with clause
> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, 
> "hr"."emps"."name" as v from "hr"."emps")
> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE 
> tempTable.v <> '11' ;
> +-----+-----------+
> | ID  | value     |
> +-----+-----------+
> | 100 | Bill      |
> | 110 | Theodore  |
> | 150 | Sebastian |
> | 200 | Eric      |
> +-----+-----------+
> (4 rows)
>
> !ok
>
>
> On Wed, Jan 25, 2023 at 6:08 PM Andrew Pilloud 
> wrote:
>
>> Yes, that worked.
>>
>> The issue does not occur if I disable all of the following planner rules:
>> CoreRules.FILTER_CALC_MERGE, CoreRules.PROJECT_CALC_MERGE,
>> LogicalCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE),
>> and BeamCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE).
>>
>> All the rules share a common call to RexProgramBuilder.mergePrograms, so
>> I suspect the problem lies there. I spent some time looking but wasn't able
>> to find it by code inspection, it looks like this code path is doing the
>> right thing with names. I'll spend some time tomorrow trying to reproduce
>> this on pure Calcite.
>>
>> Andrew
>>
>>
>> On Tue, Jan 24, 2023 at 8:24 PM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>>> Hi Andrew,
>>>
>>> Thanks for writing a test for this use case. Without the WHERE clause it
>>> works as expected on our test cases too. Please add a WHERE clause to the
>>> second SELECT. With the query below it does not return column names. I
>>> tested on my local machine as well.
>>>
>>> WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM
>>> PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE
>>> id > 1
>>>
>>> Thanks
>>>
>>> On Tue, Jan 24, 2023 at 5:28 PM Andrew Pilloud 
>>> wrote:
>>>
>>>> +dev@beam.apache.org 
>>>>
>>>> I tried reproducing this but was not successful, the output schema was
>>>> as expected. I added the following to BeamSqlMultipleSchemasTest.java at
>>>> head. (I did discover that  PAssert.that(result).containsInAnyOrder(output)
>>>> doesn't validate column names however.)
>>>>
>>>>   @Test
>>>>   public void testSelectAs() {
>>>> PCollection<Row> input = pipeline.apply(create(row(1, "strstr")));
>>>>
>>>> PCollection<Row> result =
>>>> input.apply(SqlTransform.query("WITH tempTable (id, v) AS
>>>> (SELECT f_int as id, f_string as v FROM PCOLLECTION) SELECT id AS fout_int,
>>>> v AS fout_string FROM tempTable"));
>>>>
>>>> Schema output_schema =
>>>>     Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build();
>>>> assertThat(result.getSchema(), equalTo(output_schema));
>>>>
>>>> Row output = Row.withSchema(output_schema).addValues(1,
>>>> "strstr").build();
>>>> PAssert.that(result).containsInAnyOrder(output

Re: Beam SQL Alias issue while using With Clause

2023-01-25 Thread Andrew Pilloud via dev
Yes, that worked.

The issue does not occur if I disable all of the following planner rules:
CoreRules.FILTER_CALC_MERGE, CoreRules.PROJECT_CALC_MERGE,
LogicalCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE),
and BeamCalcMergeRule.INSTANCE (which wraps CoreRules.CALC_MERGE).

All the rules share a common call to RexProgramBuilder.mergePrograms, so I
suspect the problem lies there. I spent some time looking but wasn't able
to find it by code inspection, it looks like this code path is doing the
right thing with names. I'll spend some time tomorrow trying to reproduce
this on pure Calcite.
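For anyone repeating this experiment on plain Calcite: rules can be
withdrawn from a planner before optimization, roughly as below (a
hypothetical sketch against the generic RelOptPlanner API; Beam assembles
its rule lists internally, so there this meant patching the rule set
instead):

// Given the RelOptPlanner in use:
planner.removeRule(CoreRules.FILTER_CALC_MERGE);
planner.removeRule(CoreRules.PROJECT_CALC_MERGE);
planner.removeRule(CoreRules.CALC_MERGE);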

Andrew


On Tue, Jan 24, 2023 at 8:24 PM Talat Uyarer 
wrote:

> Hi Andrew,
>
> Thanks for writing a test for this use case. Without the WHERE clause it works
> as expected on our test cases too. Please add a WHERE clause to the second
> SELECT. With the query below it does not return column names. I tested on
> my local machine as well.
>
> WITH tempTable (id, v) AS (SELECT f_int as id, f_string as v FROM
> PCOLLECTION) SELECT id AS fout_int, v AS fout_string FROM tempTable WHERE
> id > 1
>
> Thanks
>
> On Tue, Jan 24, 2023 at 5:28 PM Andrew Pilloud 
> wrote:
>
>> +dev@beam.apache.org 
>>
>> I tried reproducing this but was not successful, the output schema was as
>> expected. I added the following to BeamSqlMultipleSchemasTest.java at head.
>> (I did discover that  PAssert.that(result).containsInAnyOrder(output)
>> doesn't validate column names however.)
>>
>>   @Test
>>   public void testSelectAs() {
>> PCollection<Row> input = pipeline.apply(create(row(1, "strstr")));
>>
>> PCollection<Row> result =
>> input.apply(SqlTransform.query("WITH tempTable (id, v) AS (SELECT
>> f_int as id, f_string as v FROM PCOLLECTION) SELECT id AS fout_int, v AS
>> fout_string FROM tempTable"));
>>
>> Schema output_schema =
>>     Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build();
>> assertThat(result.getSchema(), equalTo(output_schema));
>>
>> Row output = Row.withSchema(output_schema).addValues(1,
>> "strstr").build();
>> PAssert.that(result).containsInAnyOrder(output);
>> pipeline.run();
>>   }
>>
>> On Tue, Jan 24, 2023 at 8:13 AM Talat Uyarer <
>> tuya...@paloaltonetworks.com> wrote:
>>
>>> Hi Kenn,
>>>
>>> Thank you for replying back to my email.
>>>
>>> I was under the same impression about Calcite, but I wrote a test on
>>> Calcite 1.28 too. It works without the issue that I see on Beam.
>>>
>>> Here is my test case. If you want, you can also run it on Calcite. Please
>>> put it under core/src/test/resources/sql as a text file and run the
>>> CoreQuidemTest class.
>>>
>>> !use post
>>> !set outputformat mysql
>>>
>>> #Test aliases with with clause
>>> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, 
>>> "hr"."emps"."name" as v from "hr"."emps")
>>> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE 
>>> tempTable.v <> '11' ;
>>> +-----+-----------+
>>> | ID  | value     |
>>> +-----+-----------+
>>> | 100 | Bill      |
>>> | 110 | Theodore  |
>>> | 150 | Sebastian |
>>> | 200 | Eric      |
>>> +-----+-----------+
>>> (4 rows)
>>>
>>> !ok
>>>
>>>
>>> On Mon, Jan 23, 2023 at 10:16 AM Kenneth Knowles 
>>> wrote:
>>>
>>>> Looking at the code that turns a logical CalcRel into a BeamCalcRel I
>>>> do not see any obvious cause for this:
>>>> https://github.com/apache/beam/blob/b3aa2e89489898f8c760294ba4dba2310ac53e70/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java#L69
>>>>
>>>> I don't like to guess that upstream libraries have the bug, but in this
>>>> case I wonder if the alias is lost in the Calcite optimizer rule for
>>>> merging the projects and filters into a Calc.
>>>>
>>&

Re: Beam SQL Alias issue while using With Clause

2023-01-24 Thread Andrew Pilloud via dev
+dev@beam.apache.org 

I tried reproducing this but was not successful, the output schema was as
expected. I added the following to BeamSqlMultipleSchemasTest.java at head.
(I did discover that  PAssert.that(result).containsInAnyOrder(output)
doesn't validate column names however.)

  @Test
  public void testSelectAs() {
PCollection<Row> input = pipeline.apply(create(row(1, "strstr")));

PCollection<Row> result =
input.apply(SqlTransform.query("WITH tempTable (id, v) AS (SELECT
f_int as id, f_string as v FROM PCOLLECTION) SELECT id AS fout_int, v AS
fout_string FROM tempTable"));

Schema output_schema =
    Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build();
assertThat(result.getSchema(), equalTo(output_schema));

Row output = Row.withSchema(output_schema).addValues(1,
"strstr").build();
PAssert.that(result).containsInAnyOrder(output);
pipeline.run();
  }

On Tue, Jan 24, 2023 at 8:13 AM Talat Uyarer 
wrote:

> Hi Kenn,
>
> Thank you for replying back to my email.
>
> I was under the same impression about Calcite, but I wrote a test on
> Calcite 1.28 too. It works without the issue that I see on Beam.
>
> Here is my test case. If you want, you can also run it on Calcite. Please put
> it under core/src/test/resources/sql as a text file and run the
> CoreQuidemTest class.
>
> !use post
> !set outputformat mysql
>
> #Test aliases with with clause
> WITH tempTable(id, v) AS (select "hr"."emps"."empid" as id, 
> "hr"."emps"."name" as v from "hr"."emps")
> SELECT tempTable.id as id, tempTable.v as "value" FROM tempTable WHERE 
> tempTable.v <> '11' ;
> +-----+-----------+
> | ID  | value     |
> +-----+-----------+
> | 100 | Bill      |
> | 110 | Theodore  |
> | 150 | Sebastian |
> | 200 | Eric      |
> +-----+-----------+
> (4 rows)
>
> !ok
>
>
> On Mon, Jan 23, 2023 at 10:16 AM Kenneth Knowles  wrote:
>
>> Looking at the code that turns a logical CalcRel into a BeamCalcRel I do
>> not see any obvious cause for this:
>> https://github.com/apache/beam/blob/b3aa2e89489898f8c760294ba4dba2310ac53e70/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamCalcRule.java#L69
>>
>> I don't like to guess that upstream libraries have the bug, but in this
>> case I wonder if the alias is lost in the Calcite optimizer rule for
>> merging the projects and filters into a Calc.
>>
>> Kenn
>>
>> On Mon, Jan 23, 2023 at 10:13 AM Kenneth Knowles  wrote:
>>
>>> I am not sure I understand the question, but I do see an issue.
>>>
>>> Context: "CalcRel" is an optimized relational operation that is somewhat
>>> like ParDo, with a small snippet of a single-assignment DSL embedded in it.
>>> Calcite will choose to merge all the projects and filters into the node,
>>> and then generates Java bytecode to directly execute the DSL.
>>>
>>> Problem: it looks like the CalcRel has output columns with aliases "id"
>>> and "v" where it should have output columns with aliases "id" and "value".
>>>
>>> Kenn
>>>
>>> On Thu, Jan 19, 2023 at 6:01 PM Ahmet Altay  wrote:
>>>
>>>> Adding: @Andrew Pilloud  @Kenneth Knowles
>>>> 
>>>>
>>>> On Thu, Jan 12, 2023 at 12:31 PM Talat Uyarer via user <
>>>> u...@beam.apache.org> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am using Beam 2.43 with Calcite SQL with Java.
>>>>>
>>>>> I have a query with a WITH clause and some aliasing. Looks like Beam
>>>>> Query optimizer after optimizing my query, it drops Select statement's
>>>>> aliases. Can you help me to identify where the problem is ?
>>>>>
>>>>> This is my query
>>>>> INFO: SQL:
>>>>> WITH `tempTable` (`id`, `v`) AS (SELECT
>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedInt` AS `id`,
>>>>> `PCOLLECTION`.`f_nestedRow`.`f_nestedString` AS `v`
>

Re: [Proposal] Change to Default PubsubMessage Coder

2022-12-19 Thread Andrew Pilloud via dev
I think the Dataflow update concern is the biggest one, and as you point
out it can be easily overcome. I generally believe that users who aren't
setting the coder don't actually care as long as it works, so changing the
default across Beam versions seems relatively low risk. Another mitigating
factor is that both concerns require users to actually be using the coder
(likely via Reshuffle.viaRandomKey) rather than consuming the output in a
fused ParDo (which I think is what we would recommend).

As a counter proposal: is there something that stops us from using RowCoder
by default here? I assume all forms of "PubsubMessage" can be encoded with
RowCoder, it provides flexibility for future changes, and PubSub will be
able to take advantage of future work to make RowCoder more efficient. (If
we can't switch to RowCoder, that seems like a useful feature request for
RowCoder.)

Andrew

On Mon, Dec 19, 2022 at 3:37 PM Evan Galpin  wrote:

> Bump 
>
> Any other risks or drawbacks associated with altering the default coder
> for PubsubMessage to be the most inclusive coder with respect to possible
> fields?
>
> Thanks,
> Evan
>
>
> On Mon, Dec 12, 2022 at 10:06 AM Evan Galpin  wrote:
>
>> Hi folks,
>>
>> I'd like to solicit feedback on the notion of using
>> PubsubMessageWithAttributesAndMessageIdAndOrderingKeyCoder[1] as the
>> default coder for Pubsub messages instead of the current default of
>> PubsubMessageWithAttributesCoder.
>>
>> Not long ago, support for reading and writing Pubsub messages in Beam
>> including an OrderingKey was added[2].  Part of this change involved adding
>> a new Coder for PubsubMessage in order to capture and propagate the
>> orderingKey[1].  This change illuminated that in cases where the coder type
>> for PubsubMessage is inferred, it is possible to accidentally and silently
>> nullify fields like MessageId and OrderingKey in a way that is not at all
>> obvious to users[3].
>>
>> So far two potential drawbacks of this proposal have been identified:
>> 1. Update compatibility for pipelines using PubsubIO might require users
>> to explicitly specify the current default coder (
>> PubsubMessageWithAttributesCoder)
>> 2. Messages would require a larger number of bytes to store as compared
>> to the current default (which could again be overcome by users specifying
>> the current default coder)
>>
>> What other potential drawbacks might there be? I look forward to hearing
>> others' input!
>>
>> Thanks,
>> Evan
>>
>> [1]
>> https://github.com/apache/beam/pull/22216/files#diff-28243ab1f9eef144e45a9f6cb2e07fa1cf53c021ceaf733d92351254f38712fd
>> [2] https://github.com/apache/beam/pull/22216
>> [3] https://github.com/apache/beam/issues/23525
>>
>


Re: [Proposal] Adopt a Beam I/O Standard

2022-12-16 Thread Andrew Pilloud via dev
By "Relational" I mean things like: Column Pruning, Filter Pushdown, Table
Statistics, Partition Metadata, Metastore. We have a bunch of one-off
implementations in various IOs (mostly BigQueryIO) and have been waiting
for IO standards to push them out to all IOs. This was section "F5 -
Relational" from https://s.apache.org/beam-io-api-standard-documentation

On Thu, Dec 15, 2022 at 6:50 PM Herman Mak  wrote:

> Hey all,
>
> Firstly apologies for the confusion.
>
> The scope of this effort is to *finalize and have this added to the Beam
> public documentation* to be used as a PR reference once we have resolved
> the comments.
> YES this document is a continuation of the below docs with some additional
> components such as testing!
>
> The idea is to convert this to a MD file and add a page under "Developing
> new I/O connectors" with some small cleanup work around this area in other
> pages.
>
>
>
>
> Docs that this is a continuation of:
> https://s.apache.org/beam-io-api-standard-documentation
> https://s.apache.org/beam-io-api-standard
>
>
> @Andrew Pilloud  Totally not intending to start from
> the beginning here; by relational do you mean having this hosted in the
> Beam Confluence?
>
> Thanks all, and keep the feedback to the docs coming
>
> Herman Mak |  Customer Engineer, Hong Kong, Google Cloud |
> herman...@google.com |  +852-3923-5417 <+852%203923%205417>
>
>
>
>
>
> On Fri, Dec 16, 2022 at 1:36 AM Chamikara Jayalath 
> wrote:
>
>>
>>
>> On Thu, Dec 15, 2022, 8:33 AM Alexey Romanenko 
>> wrote:
>>
>>> Cham, do you remember what was the reason to not finalise that doc?
>>>
>>
>> I think this is a continuation of those docs (so we are trying to
>> finalize) but probably  Herman can explain better.
>>
>>
>>> Personally, I find having such standards very useful (if they are
>>> flexible during a time, of course), especially for new developers and PR
>>> reviewers, and it’d be great to finally have such doc as a part of
>>> contribution guide.
>>>
>>
>> +1
>>
>> Thanks,
>> Cham
>>
>>>
>>> —
>>> Alexey
>>>
>>> On 13 Dec 2022, at 04:32, Chamikara Jayalath via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>> Yeah, I don't think we either finalized or documented (on the website) the
>>> previous iteration. This doc seems to contain details from the documents
>>> shared in the previous iteration.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>>
>>> On Mon, Dec 12, 2022 at 6:49 PM Robert Burke  wrote:
>>>
>>>> I think ultimately: until the docs are clearly available on the Beam site
>>>> itself, it's not documentation. See also, design docs, previous emails, and
>>>> similar.
>>>>
>>>> On Mon, Dec 12, 2022, 6:07 PM Andrew Pilloud via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> I believe the previous iteration was here:
>>>>> https://lists.apache.org/thread/3o8glwkn70kqjrf6wm4dyf8bt27s52hk
>>>>>
>>>>> The associated docs are:
>>>>> https://s.apache.org/beam-io-api-standard-documentation
>>>>> https://s.apache.org/beam-io-api-standard
>>>>>
>>>>> This is missing all the relational stuff that was in those docs, this
>>>>> appears to be another attempt starting from the beginning?
>>>>>
>>>>> Andrew
>>>>>
>>>>>
>>>>> On Mon, Dec 12, 2022 at 9:57 AM Alexey Romanenko <
>>>>> aromanenko@gmail.com> wrote:
>>>>>
>>>>>> Thanks for writing this!
>>>>>>
>>>>>> IIRC, a similar design doc was sent for review here a while ago. Is
>>>>>> this just an updated version or a new one?
>>>>>>
>>>>>> —
>>>>>> Alexey
>>>>>>
>>>>>> On 11 Dec 2022, at 15:16, Herman Mak via dev 
>>>>>> wrote:
>>>>>>
>>>>>> Hello Everyone,
>>>>>>
>>>>>> *TLDR*
>>>>>>
>>>>>> Should we adopt a set of standards that Connector I/Os should adhere
>>>>>> to?
>>>>>> Attached is a first version of a Beam I/O Standards guideline that
>>>>>> includes opinionated best practices across important components

Re: [Proposal] Adopt a Beam I/O Standard

2022-12-12 Thread Andrew Pilloud via dev
I believe the previous iteration was here:
https://lists.apache.org/thread/3o8glwkn70kqjrf6wm4dyf8bt27s52hk

The associated docs are:
https://s.apache.org/beam-io-api-standard-documentation
https://s.apache.org/beam-io-api-standard

This is missing all the relational stuff that was in those docs, this
appears to be another attempt starting from the beginning?

Andrew


On Mon, Dec 12, 2022 at 9:57 AM Alexey Romanenko 
wrote:

> Thanks for writing this!
>
> IIRC, a similar design doc was sent for review here a while ago. Is this
> just an updated version or a new one?
>
> —
> Alexey
>
> On 11 Dec 2022, at 15:16, Herman Mak via dev  wrote:
>
> Hello Everyone,
>
> *TLDR*
>
> Should we adopt a set of standards that Connector I/Os should adhere to?
> Attached is a first version of a Beam I/O Standards guideline that
> includes opinionated best practices across important components of a
> Connector I/O, namely Documentation, Development and Testing.
>
> *The Long Version*
>
> Apache Beam is a unified open-source programming model for both batch and
> streaming. It runs on multiple platform runners and integrates with over 50
> services using individually developed I/O Connectors.
>
> Given that Apache Beam connectors are written by many different developers
> and at varying points in time, they vary in syntax style, documentation
> completeness and testing done. For a new adopter of Apache Beam, that can
> definitely cause some uncertainty.
>
> So should we adopt a set of standards that Connector I/Os should adhere
> to?
> Attached is a first version, in Doc format, of a Beam I/O Standards
> guideline that includes opinionated best practices across important
> components of a Connector I/O, namely Documentation, Development and
> Testing. And the aim is to incorporate this into the documentation and to
> have it referenced as standards for new Connector I/Os (and ideally have
> existing Connectors upgraded over time). If it looks helpful, the immediate
> next step is that we can convert it into a .md as a PR into the Beam repo!
>
> Thanks and looking forward to feedbacks and discussion,
>
>  [PUBLIC] Beam I/O Standards
> 
>
> Herman Mak |  Customer Engineer, Hong Kong, Google Cloud |
> herman...@google.com |  +852-3923-5417 <+852%203923%205417>
>
>
>
>


Re: Performance and Cost benchmarking

2022-09-26 Thread Andrew Pilloud via dev
Hi Pranav,

I left some comments on your design. Your doc discusses a bunch of
details about infrastructure such as testing frameworks, automation,
and performance databases, but doesn't describe how it will fit in
with our existing infrastructure (Load Tests, Nexmark, Jenkins,
InfluxDB, Grafana). I would suspect we actually have most of the
infrastructure already built?

What I didn't see (and expected to see) was details on how the tests
would actually interact with IOs. Will there be a generic Schema IO
test harness or do you plan to write one for each IO? Will you be
comparing different data types (data stored as byte[] vs more complex
structures)? What about different IO specific optimization (data
sharding, pushdown)?

Andrew

On Mon, Sep 26, 2022 at 9:07 AM Pranav Bhandari
 wrote:
>
> Hello,
>
> Hope this email finds you well. I have attached a link to a doc which 
> discusses the design for a performance and cost benchmarking framework to be 
> used by Beam IOs and Google-provided dataflow templates.
>
> Please feel free to comment on the doc with any questions, concerns or ideas 
> you might have.
>
> Thank you,
> Pranav Bhandari
>
>
> https://docs.google.com/document/d/14GatBilwuR4jJGb-ZNpYeuB-KkVmDvEm/edit?usp=sharing&ouid=102139643796739130048&rtpof=true&sd=true


Re: [Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-14 Thread Andrew Pilloud
We do have a dedicated machine for benchmarks. This is a single
machine limited to running one test at a time. Set the
jenkinsExecutorLabel for the job to 'beam-perf' to use it. For
example:
https://github.com/apache/beam/blob/66bbee84ed477d86008905646e68b100591b6f78/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Direct.groovy#L36

Andrew

On Wed, Sep 14, 2022 at 8:28 AM Alexey Romanenko
 wrote:
>
> I think it depends on the goal of running the benchmarks. In the ideal case, we 
> need to run them on the same dedicated machine(s) and with the same 
> configuration all the time, but I’m not sure that can be achieved in the 
> current infrastructure reality.
>
> On the other hand, IIRC, the initial goal of benchmarks like Nexmark was to 
> quickly detect any major regressions, especially between releases, which is not 
> so sensitive to ideal conditions. And here we have a field for improvements.
>
> —
> Alexey
>
> On 13 Sep 2022, at 22:57, Kenneth Knowles  wrote:
>
> Good idea. I'm curious about our current benchmarks. Some of them run on 
> clusters, but I think some of them are running locally and just being noisy. 
> Perhaps this could improve that. (or if they are running on local Spark/Flink 
> then maybe the results are not really meaningful anyhow)
>
> On Tue, Sep 13, 2022 at 2:54 AM Moritz Mack  wrote:
>>
>> Hi team,
>>
>>
>>
>> I’m looking for some help to set up infrastructure to periodically run Java 
>> microbenchmarks (JMH).
>>
>> Results of these runs will be added to our community metrics (InfluxDB) to 
>> help us track performance, see [1].
>>
>>
>>
>> To prevent noisy runs this would require a dedicated Jenkins machine that 
>> runs at most one job (benchmark) at a time. Benchmark runs take quite some 
>> time, but on the other hand they don’t have to run very frequently (once a 
>> week should be fine initially).
>>
>>
>>
>> Thanks so much,
>>
>> Moritz
>>
>>
>>
>> [1] https://github.com/apache/beam/pull/23041
>>
>


[ANNOUNCE] Beam 2.31.0 Released

2021-07-09 Thread Andrew Pilloud
The Apache Beam team is pleased to announce the release of version 2.31.0.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing.
See https://beam.apache.org

You can download the release here:
https://beam.apache.org/get-started/downloads/

This release includes bug fixes, features, and improvements detailed
on the Beam blog: https://beam.apache.org/blog/beam-2.31.0/

Thank you to everyone who contributed to this release, and we hope you
enjoy using Beam 2.31.0

-Andrew, on behalf of the Apache Beam community.


[RESULT][VOTE] Release 2.31.0, release candidate #1

2021-07-08 Thread Andrew Pilloud
I'm happy to announce that we have unanimously approved this release.

There are 5 approving votes, 4 of which are binding:
* Chamikara Jayalath
* Ahmet Altay
* Kenneth Knowles
* Robert Bradshaw

There are no disapproving votes.

Thanks everyone! I'll be pushing the artifacts today.


Re: [VOTE] Release 2.31.0, release candidate #1

2021-07-02 Thread Andrew Pilloud
Thanks for noticing that! The key was updated prior to signature. I need a
PMC member's help to copy the new keys to the release folder:

svn cp https://dist.apache.org/repos/dist/dev/beam/KEYS \
https://dist.apache.org/repos/dist/release/beam/KEYS

On Fri, Jul 2, 2021 at 9:27 AM Robert Bradshaw  wrote:

> The release artifacts are all signed with an expired key. Other than
> that it looks good.
>
> On Wed, Jun 30, 2021 at 6:48 PM Ahmet Altay  wrote:
> >
> > +1 (binding) I ran python quick starts with directrunner.
> >
> > Thank you!
> > Ahmet
> >
> > On Wed, Jun 30, 2021 at 4:48 PM Chamikara Jayalath 
> wrote:
> >>
> >> +1 (binding)
> >>
> >> Ran some Java and multi-language validations.
> >>
> >> Thanks,
> >> Cham
> >>
> >> On Tue, Jun 29, 2021 at 12:32 PM Andrew Pilloud 
> wrote:
> >>>
> >>> This is the Beam release, but Dataflow container images should be
> available now if you want to test with that.
> >>>
> >>> On Mon, Jun 28, 2021 at 7:32 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
> >>>>
> >>>> Looks like Dataflow container images have not been released yet.
> >>>>
> >>>> On Mon, Jun 28, 2021 at 5:09 PM Andrew Pilloud 
> wrote:
> >>>>>
> >>>>> Hi everyone,
> >>>>> Please review and vote on the release candidate #1 for the version
> 2.31.0, as follows:
> >>>>> [ ] +1, Approve the release
> >>>>> [ ] -1, Do not approve the release (please provide specific comments)
> >>>>>
> >>>>>
> >>>>> Reviewers are encouraged to test their own use cases with the
> release candidate, and vote +1 if no issues are found.
> >>>>>
> >>>>> The complete staging area is available for your review, which
> includes:
> >>>>> * JIRA release notes [1],
> >>>>> * the official Apache source release to be deployed to
> dist.apache.org [2], which is signed with the key with fingerprint
> 9F8AE3D4 [3],
> >>>>> * all artifacts to be deployed to the Maven Central Repository [4],
> >>>>> * source code tag "v2.31.0-RC1" [5],
> >>>>> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [6].
> >>>>> * Java artifacts were built with Gradle 6.8.3 and OpenJDK/Oracle JDK
> 1.8.0_232.
> >>>>> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and to PyPI [9].
> >>>>> * Validation sheet with a tab for 2.31.0 release to help with
> validation [10].
> >>>>> * Docker images published to Docker Hub [11].
> >>>>>
> >>>>> The vote will be open for at least 72 hours. It is adopted by
> majority approval, with at least 3 PMC affirmative votes.
> >>>>>
> >>>>> For guidelines on how to try the release in your projects, check out
> our blog post at https://beam.apache.org/blog/validate-beam-release/.
> >>>>>
> >>>>> Thanks,
> >>>>> Release Manager
> >>>>>
> >>>>> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349991
> >>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.31.0/
> >>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >>>>> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1175/
> >>>>> [5] https://github.com/apache/beam/tree/v2.31.0-RC1
> >>>>> [6] https://github.com/apache/beam/pull/15068
> >>>>> [7] https://github.com/apache/beam-site/pull/615
> >>>>> [9] https://pypi.org/project/apache-beam/2.31.0rc1/
> >>>>> [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1518714176
> >>>>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>


[VOTE] Release 2.31.0, release candidate #1

2021-06-28 Thread Andrew Pilloud
Hi everyone,
Please review and vote on the release candidate #1 for the version 2.31.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found.

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint 9F8AE3D4 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.31.0-RC1" [5],
* website pull request listing the release [6], publishing the API
reference manual [7], and the blog post [6].
* Java artifacts were built with Gradle 6.8.3 and OpenJDK/Oracle JDK
1.8.0_232.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2] and to PyPI [9].
* Validation sheet with a tab for 2.31.0 release to help with validation
[10].
* Docker images published to Docker Hub [11].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

For guidelines on how to try the release in your projects, check out our
blog post at https://beam.apache.org/blog/validate-beam-release/.

Thanks,
Release Manager

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12349991
[2] https://dist.apache.org/repos/dist/dev/beam/2.31.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1175/
[5] https://github.com/apache/beam/tree/v2.31.0-RC1
[6] https://github.com/apache/beam/pull/15068
[7] https://github.com/apache/beam-site/pull/615
[9] https://pypi.org/project/apache-beam/2.31.0rc1/
[10]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1518714176
[11] https://hub.docker.com/search?q=apache%2Fbeam&type=image


Re: [PROPOSAL] Preparing for Beam 2.31.0 release

2021-06-23 Thread Andrew Pilloud
RC1 is mostly done; I'm just waiting on permissions to push the Docker
images. I filed a JIRA with INFRA (
https://issues.apache.org/jira/browse/INFRA-22019) if anyone could help
please let me know.

Andrew

On Fri, Jun 18, 2021 at 1:33 PM Andrew Pilloud  wrote:

> We had a few more release blockers show up this week, they are now all
> resolved. I'll make another attempt at RC1 on Monday.
>
> On Mon, Jun 14, 2021 at 5:22 PM Andrew Pilloud 
> wrote:
>
>> All release blocking issues are now resolved. I will start building RC1
>> tomorrow.
>>
>> On Mon, Jun 7, 2021 at 2:11 PM Andrew Pilloud 
>> wrote:
>>
>>> The 2.31.0 release branch was cut based on the last commit on June 2nd
>>> and is now available for cherry-picks[1]. There are currently 3 open
>>> release blocking issues[2].
>>>
>>> Andrew
>>>
>>> [1] https://github.com/apache/beam/commits/release-2.31.0
>>> [2]
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20is%20EMPTY%20AND%20fixVersion%20%3D%202.31.0
>>>
>>> On Wed, May 26, 2021 at 11:12 AM Ahmet Altay  wrote:
>>>
>>>> +1. Thank you Andrew.
>>>>
>>>> On Thu, May 20, 2021 at 1:15 PM Andrew Pilloud 
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Beam 2.31.0 release is scheduled to be cut in two weeks, on June 2
>>>>> according to the release calendar [1].
>>>>>
>>>>> I'd like to volunteer myself to be the release manager for
>>>>> this release. I plan on cutting the release branch on the scheduled date.
>>>>> That week starts with a US holiday and I will be on vacation the entire
>>>>> week so I don't expect to make any progress on the release until June 7th.
>>>>> If there is a desire I could delay the release cut by a few days.
>>>>>
>>>>> Any comments or objections?
>>>>>
>>>>> Andrew
>>>>>
>>>>> [1]
>>>>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles
>>>>>
>>>>


Re: [PROPOSAL] Preparing for Beam 2.31.0 release

2021-06-18 Thread Andrew Pilloud
We had a few more release blockers show up this week, they are now all
resolved. I'll make another attempt at RC1 on Monday.

On Mon, Jun 14, 2021 at 5:22 PM Andrew Pilloud  wrote:

> All release blocking issues are now resolved. I will start building RC1
> tomorrow.
>
> On Mon, Jun 7, 2021 at 2:11 PM Andrew Pilloud  wrote:
>
>> The 2.31.0 release branch was cut based on the last commit on June 2nd
>> and is now available for cherry-picks[1]. There are currently 3 open
>> release blocking issues[2].
>>
>> Andrew
>>
>> [1] https://github.com/apache/beam/commits/release-2.31.0
>> [2]
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20is%20EMPTY%20AND%20fixVersion%20%3D%202.31.0
>>
>> On Wed, May 26, 2021 at 11:12 AM Ahmet Altay  wrote:
>>
>>> +1. Thank you Andrew.
>>>
>>> On Thu, May 20, 2021 at 1:15 PM Andrew Pilloud 
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Beam 2.31.0 release is scheduled to be cut in two weeks, on June 2
>>>> according to the release calendar [1].
>>>>
>>>> I'd like to volunteer myself to be the release manager for
>>>> this release. I plan on cutting the release branch on the scheduled date.
>>>> That week starts with a US holiday and I will be on vacation the entire
>>>> week so I don't expect to make any progress on the release until June 7th.
>>>> If there is a desire I could delay the release cut by a few days.
>>>>
>>>> Any comments or objections?
>>>>
>>>> Andrew
>>>>
>>>> [1]
>>>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles
>>>>
>>>


Re: [VOTE] Release 2.30.0, release candidate #1

2021-06-17 Thread Andrew Pilloud
For the website I filed https://issues.apache.org/jira/browse/BEAM-12507
For the jars I filed https://issues.apache.org/jira/browse/BEAM-12508

Andrew

On Mon, Jun 14, 2021 at 5:16 PM Justin Mclean  wrote:

> Hi there,
>
> I'm not on your PMC but I took a look at your release and noticed
> something that you might want to look into and correct:
>
> There are a couple of items included in the release that are not mentioned in
> your LICENSE file:
> ./beam-2.30.0/website/www/site/static/js/bootstrap.js
> ./beam-2.30.0/website/www/site/static/js/hero/lottie-light.min.js
>
> ./beam-2.30.0/website/www/site/static/fonts/bootstrap/glyphicons-halflings-regular.*
>
> Was the website meant to be included in the release?
>
> The release contains compiled jar files:
>  + ./beam-2.30.0/learning/katas/java/gradle/wrapper/gradle-wrapper.jar
>  + ./beam-2.30.0/learning/katas/kotlin/gradle/wrapper/gradle-wrapper.jar
>
> Kind Regards,
> Justin
>
>


Re: [PROPOSAL] Preparing for Beam 2.31.0 release

2021-06-14 Thread Andrew Pilloud
All release blocking issues are now resolved. I will start building RC1
tomorrow.

On Mon, Jun 7, 2021 at 2:11 PM Andrew Pilloud  wrote:

> The 2.31.0 release branch was cut based on the last commit on June 2nd and
> is now available for cherry-picks[1]. There are currently 3 open release
> blocking issues[2].
>
> Andrew
>
> [1] https://github.com/apache/beam/commits/release-2.31.0
> [2]
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20is%20EMPTY%20AND%20fixVersion%20%3D%202.31.0
>
> On Wed, May 26, 2021 at 11:12 AM Ahmet Altay  wrote:
>
>> +1. Thank you Andrew.
>>
>> On Thu, May 20, 2021 at 1:15 PM Andrew Pilloud 
>> wrote:
>>
>>> Hi All,
>>>
>>> Beam 2.31.0 release is scheduled to be cut in two weeks, on June 2
>>> according to the release calendar [1].
>>>
>>> I'd like to volunteer myself to be the release manager for this release.
>>> I plan on cutting the release branch on the scheduled date. That week
>>> starts with a US holiday and I will be on vacation the entire week so I
>>> don't expect to make any progress on the release until June 7th. If there
>>> is a desire I could delay the release cut by a few days.
>>>
>>> Any comments or objections?
>>>
>>> Andrew
>>>
>>> [1]
>>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles
>>>
>>


Re: [PROPOSAL] Preparing for Beam 2.31.0 release

2021-06-07 Thread Andrew Pilloud
The 2.31.0 release branch was cut based on the last commit on June 2nd and
is now available for cherry-picks[1]. There are currently 3 open release
blocking issues[2].

Andrew

[1] https://github.com/apache/beam/commits/release-2.31.0
[2]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20is%20EMPTY%20AND%20fixVersion%20%3D%202.31.0

On Wed, May 26, 2021 at 11:12 AM Ahmet Altay  wrote:

> +1. Thank you Andrew.
>
> On Thu, May 20, 2021 at 1:15 PM Andrew Pilloud 
> wrote:
>
>> Hi All,
>>
>> Beam 2.31.0 release is scheduled to be cut in two weeks, on June 2
>> according to the release calendar [1].
>>
>> I'd like to volunteer myself to be the release manager for this release.
>> I plan on cutting the release branch on the scheduled date. That week
>> starts with a US holiday and I will be on vacation the entire week so I
>> don't expect to make any progress on the release until June 7th. If there
>> is a desire I could delay the release cut by a few days.
>>
>> Any comments or objections?
>>
>> Andrew
>>
>> [1]
>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles
>>
>


[PROPOSAL] Preparing for Beam 2.31.0 release

2021-05-20 Thread Andrew Pilloud
Hi All,

Beam 2.31.0 release is scheduled to be cut in two weeks, on June 2
according to the release calendar [1].

I'd like to volunteer myself to be the release manager for this release. I
plan on cutting the release branch on the scheduled date. That week starts
with a US holiday and I will be on vacation the entire week so I don't
expect to make any progress on the release until June 7th. If there is a
desire I could delay the release cut by a few days.

Any comments or objections?

Andrew

[1]
https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles


Re: Performance tests dashboard not working

2021-04-21 Thread Andrew Pilloud
Looks like it is working now?

On Wed, Apr 21, 2021 at 7:34 AM Ismaël Mejía  wrote:

> Following the conversation on the performance regression on Flink
> runner I wanted to take a look at the performance dashboards (Nexmark
> + Load Tests) but when I open the dashboards it says there is a
> connectivity error "NetworkError when attempting to fetch resource.".
> Can someone with more knowledge about our CI / dashboards infra please
> take a look.
>
> http://104.154.241.245/d/ahudA_zGz/nexmark?orgId=1
>


Re: [VOTE] Release vendor-calcite-1_26_0 version 0.1, release candidate #1

2021-03-30 Thread Andrew Pilloud
Sorry for the delay; the vote has passed, with 5 approving votes, 4 of them
binding PMC votes:

 - Kai Jiang
 - Pablo Estrada
 - Ahmet Altay
 - Jean-Baptiste Onofre
 - Kenneth Knowles

I will finalize the release today.

Andrew

On Wed, Mar 10, 2021 at 1:35 AM Ismaël Mejía  wrote:

> Thanks Brian I see that there are a lot of reasons to do the upgrade,
> maybe in the future it would be a good idea to explain the main
> motivations of the upgrade for vendored dependencies and refer
> relevant JIRAs
>
> On Tue, Mar 9, 2021 at 6:23 PM Brian Hulette  wrote:
> >
> > There are several jiras blocked by a Calcite upgrade. See
> https://issues.apache.org/jira/browse/BEAM-9379
> >
> > On Tue, Mar 9, 2021 at 5:17 AM Ismaël Mejía  wrote:
> >>
> >> Just out of curiosity is there some feature we are expecting from
> >> Calcite that pushes this upgrade or is this just catching up for the
> >> sake of security improvements + not having old dependencies?
> >>
> >>
> >> On Tue, Mar 9, 2021 at 12:23 AM Ahmet Altay  wrote:
> >> >
> >> > +1 (binding)
> >> >
> >> > On Mon, Mar 8, 2021 at 3:21 PM Pablo Estrada 
> wrote:
> >> >>
> >> >> +1 (binding) verified signature
> >> >>
> >> >> On Mon, Mar 8, 2021 at 12:05 PM Kai Jiang 
> wrote:
> >> >>>
> >> >>> +1 (non-binding)
> >> >>>
> >> >>> On Mon, Mar 8, 2021 at 11:37 AM Andrew Pilloud 
> wrote:
> >> >>>>
> >> >>>> Hi everyone,
> >> >>>> Please review and vote on the release candidate #1 for
> beam-vendor-calcite-1_26_0 version 0.1, as follows:
> >> >>>> [ ] +1, Approve the release
> >> >>>> [ ] -1, Do not approve the release (please provide specific
> comments)
> >> >>>>
> >> >>>>
> >> >>>> Reviewers are encouraged to test their own use cases with the
> release candidate, and vote +1 if no issues are found.
> >> >>>>
> >> >>>> The complete staging area is available for your review, which
> includes:
> >> >>>> * the official Apache source release to be deployed to
> dist.apache.org [1], which is signed with the key with fingerprint
> 9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [2],
> >> >>>> * all artifacts to be deployed to the Maven Central Repository [3],
> >> >>>> * source code commit "0f52187e344dad9bca4c183fe51151b733b24e35"
> [4].
> >> >>>>
> >> >>>> The vote will be open for at least 72 hours. It is adopted by
> majority approval, with at least 3 PMC affirmative votes.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Andrew
> >> >>>>
> >> >>>> [1]
> https://dist.apache.org/repos/dist/dev/beam/vendor/beam-vendor-calcite-1_26_0/0.1/
> >> >>>> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> >>>> [3]
> https://repository.apache.org/content/repositories/orgapachebeam-1163/
> >> >>>> [4]
> https://github.com/apache/beam/tree/0f52187e344dad9bca4c183fe51151b733b24e35
>


Re: [VOTE] New Committer: Tomo Suzuki

2021-03-17 Thread Andrew Pilloud
Wrong list...

On Wed, Mar 17, 2021 at 5:49 PM Robert Bradshaw  wrote:

> +1
>
> On Wed, Mar 17, 2021 at 5:48 PM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> Please vote on the proposal for Tomo Suzuki to become a committer in the
>> Apache Beam project, as follows:
>>
>> [ ] +1, Approve the proposal for the candidate to become a committer
>> [ ] -1, Disapprove the proposal for the candidate to become a committer
>>
>> The vote will be open for at least 72 hours excluding weekends. It is
>> adopted by majority approval, with at least 3 PMC affirmative votes.
>>
>> See discussion thread for more details [1].
>>
>> Kenn
>>
>> [1]
>> https://lists.apache.org/thread.html/r7e1c5ba7209b598bcd1bfce515744276be8d0eddacd7a0aa688f8595%40%3Cprivate.beam.apache.org%3E
>>
>


[VOTE] Release vendor-calcite-1_26_0 version 0.1, release candidate #1

2021-03-08 Thread Andrew Pilloud
Hi everyone,
Please review and vote on the release candidate #1 for
beam-vendor-calcite-1_26_0 version 0.1, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found.

The complete staging area is available for your review, which includes:
* the official Apache source release to be deployed to dist.apache.org [1],
which is signed with the key with fingerprint
9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [2],
* all artifacts to be deployed to the Maven Central Repository [3],
* source code commit "0f52187e344dad9bca4c183fe51151b733b24e35" [4].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Andrew

[1]
https://dist.apache.org/repos/dist/dev/beam/vendor/beam-vendor-calcite-1_26_0/0.1/
[2] https://dist.apache.org/repos/dist/release/beam/KEYS
[3] https://repository.apache.org/content/repositories/orgapachebeam-1163/
[4]
https://github.com/apache/beam/tree/0f52187e344dad9bca4c183fe51151b733b24e35


Re: COVAR_POP aggregate function test for the ZetaSql dialec

2021-02-09 Thread Andrew Pilloud
Looks like the code that converts the parsed ZetaSQL to a Calcite logical
expression doesn't currently support aggregate functions with multiple
columns. See this TODO:
https://github.com/apache/beam/blob/3bb232fb098700de408f574585dfe74bbaff7230/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/AggregateScanConverter.java#L147
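
To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of guard behind that TODO (illustrative only, not Beam's actual converter code); running it reproduces the exact message quoted below:

import java.util.Arrays;
import java.util.List;

public class SingleArgAggregateGuard {

  // Aggregate calls carrying more than one column are rejected before they
  // can be translated into a Calcite AggregateCall.
  static void convertAggCall(String functionName, List<String> columns) {
    if (columns.size() > 1) {
      throw new IllegalArgumentException(functionName + " has more than one argument.");
    }
    // ...single-column aggregates would be translated here...
  }

  public static void main(String[] args) {
    // COVAR_POP(row_id, int64_col) carries two columns, so it trips the guard.
    convertAggCall("covar_pop", Arrays.asList("row_id", "int64_col"));
  }
}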

On Tue, Feb 9, 2021 at 10:22 AM Sonam Ramchand <
sonam.ramch...@venturedive.com> wrote:

> Hi Devs,
> I am trying to test the COVAR_POP aggregate function for the ZetaSql
> dialect. I see
> https://github.com/apache/beam/blob/b74fcf7b30d956fb42830d652a57b265a1546973/sdks/[…]he/beam/sdk/extensions/sql/impl/transform/agg/CovarianceFn.java
> is implemented as CombineFn and it works correctly in
> https://github.com/apache/beam/blob/befcc3d780d561e81f23512742862a65c0ae3b69/sdks/[…]eam/sdk/extensions/sql/BeamSqlDslAggregationCovarianceTest.java
> However, for the ZetaSql dialect, it throws:
>
> covar_pop has more than one argument.
> java.lang.IllegalArgumentException: covar_pop has more than one argument.
>
> Unit test:
>
> public void testZetaSqlCovarPop() {
>   String sql =
>       "SELECT COVAR_POP(row_id,int64_col) FROM table_all_types GROUP BY bool_col";
>
>   ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
>   BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
>   PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);
>
>   final Schema schema = Schema.builder().addDoubleField("field1").build();
>   PAssert.that(stream)
>       .containsInAnyOrder(
>           Row.withSchema(schema).addValue(-1.0).build(),
>           Row.withSchema(schema).addValue(-1.6).build());
>
>   pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
> }
>
> Can anybody help me in understanding the cause of this problem? I do not
> understand how it works correctly in other places and not in ZetaSqlDialect.
> For reference: https://github.com/apache/beam/pull/13915
>
> I would really appreciate any sort of input on this.
> --
>
> Regards,
> *Sonam*
> Software Engineer
> Mobile: +92 3088337296
>
> 
>


Re: Nexmark ratelimiting not working

2021-01-07 Thread Andrew Pilloud
It looks like the 'isRateLimited' flag was used on these tests inside
Google ~5 years ago, but I don't think we've ever used it in Beam. I don't
believe there are any tests so it doesn't really surprise me that it is
broken.

The configs we normally run nexmark on Flink with are here:
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PostCommit_Java_Nexmark_Flink.groovy

On Thu, Jan 7, 2021 at 6:36 AM Teodor Spæren 
wrote:

> Hello and happy new years!
>
> I've been trying to use Nexmark for evaluating some performance
> improvements I've made to Beam. I've discovered a problem where Nexmark
> won't work if you enable the ratelimiting mode. I've filled out a bug
> report for this[1].
>
> To reproduce the problem is easy, simply run the normal nexmark suite
> with ratelimiting mode enable and see that the number of results you get
> is way lower than it should be.
>
> I've reached the limit of what I'm able to debug. Is this reproducible
> for anyone else and does anyone have a clue about why this might happen?
>
> Any help is very much appreciated!
>
> Teodor
>
> [1]: https://issues.apache.org/jira/browse/BEAM-11547
>


Re: Updating to Calcite 1.26 ... help wanted

2020-12-07 Thread Andrew Pilloud
To answer your initial question, the parser factory can be replaced here:
https://github.com/apache/beam/blob/7c43ab6a8df9b23caa7321fddff9a032a71908f6/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcFactory.java#L98

The support for DDL in Beam is a copy of the calcite/server module (as
recommended by Calcite). It looks like this code path has been completely
rewritten in Calcite since 1.20. We copied files from
https://github.com/apache/calcite/tree/branch-1.20/server/src/main/java/org/apache/calcite/sql/ddl
To here:
https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser

Looks like all of that has been replaced with
https://github.com/apache/calcite/blob/master/server/src/main/java/org/apache/calcite/server/ServerDdlExecutor.java
in Calcite and Beam will need to be updated and the parser factory from
that file used.
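
One possible shape for that update, sketched below under the assumption that Calcite 1.26's SqlParserImplFactory and DdlExecutor interfaces are available; the wrapper class and its wiring are illustrative, not tested code:

import java.io.Reader;
import org.apache.calcite.server.DdlExecutor;
import org.apache.calcite.sql.parser.SqlAbstractParserImpl;
import org.apache.calcite.sql.parser.SqlParserImplFactory;

// Delegate parser creation to the generated factory, but override the
// default getDdlExecutor() (which throws UnsupportedOperationException)
// with an executor that knows how to handle Beam's DDL statements.
public class DdlAwareParserFactory implements SqlParserImplFactory {
  private final SqlParserImplFactory generatedFactory;
  private final DdlExecutor ddlExecutor;

  public DdlAwareParserFactory(SqlParserImplFactory generatedFactory, DdlExecutor ddlExecutor) {
    this.generatedFactory = generatedFactory;
    this.ddlExecutor = ddlExecutor;
  }

  @Override
  public SqlAbstractParserImpl getParser(Reader reader) {
    return generatedFactory.getParser(reader);
  }

  @Override
  public DdlExecutor getDdlExecutor() {
    return ddlExecutor;
  }
}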

Also of note:
BeamSqlParserImpl is generated from this source file:
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/codegen/config.fmpp
Which was copied from
https://github.com/apache/calcite/blob/master/server/src/main/codegen/config.fmpp

Thanks for working on this,

Andrew

On Sun, Dec 6, 2020 at 11:58 AM Niels Basjes  wrote:

> Hi,
>
> I'm trying to update the version of Calcite used in Beam to the latest
> version available.
> https://issues.apache.org/jira/browse/BEAM-9379
> https://github.com/apache/beam/pull/12962
>
> I've hit a bit of a roadblock that I'm asking your help for.
>
> What I ran into is that some tests in the current state of this pull
> request fail over a change in Calcite.
> I am specifically talking about
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/BeamSqlLineTest.java#L63
> and many other tests in this file that do a CREATE EXTERNAL TABLE or a DROP
> TABLE (i.e. do DDL stuff).
>
> I tracked the source of this exception back to Calcite where a few months
> ago a DdlExecutor was added and this default method for `DdlExecutor
> getDdlExecutor()` was added in Calcite:
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql/parser/SqlParserImplFactory.java#L58
>
> This default implementation simply returns a DdlExecutor (a new interface)
> that always fails with a UnsupportedOperationException if no valid
> implementation has been provided.
> This is marked as experimental and looks good to me so far.
>
> I found in the generated code of BeamSqlParserImpl this code (I shortened
> it a bit)
>
> public static final SqlParserImplFactory FACTORY = new
> SqlParserImplFactory() {
> public SqlAbstractParserImpl getParser(Reader reader) {
> final BeamSqlParserImpl parser = new BeamSqlParserImpl(reader);
> ...
> return parser;
> }
> };
>
> which is generated from a template in Calcite itself and (as far as I have
> been able to find so far) does not have a way of implementing a non-default
> method for DdlExecutor getDdlExecutor().
>
> How do we fix this?
> or ... can we fix this in Beam or is actually a change in Calcite needed?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>


Re: Unit tests vs. Integration Tests

2020-12-02 Thread Andrew Pilloud
We have a large number of tests that run pipelines on the Direct Runner or
various local runners, but don't require network access, so I don't think
the distinction is clear. I do agree that requiring a remote service falls
on the integration test side.

This seems like something that is easy to get wrong without some automation
to help. Could we run the :test targets on Jenkins using the sandbox
command or docker to block network access?
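
To make the idea concrete, here is a minimal sketch of an in-process variant (not existing Beam tooling): a JUnit rule that installs a SecurityManager rejecting socket connects for the duration of a test, so a non-hermetic unit test fails loudly instead of silently depending on the network.

import java.net.SocketPermission;
import java.security.Permission;
import org.junit.rules.ExternalResource;

public class NoNetworkRule extends ExternalResource {
  private SecurityManager previous;

  @Override
  protected void before() {
    previous = System.getSecurityManager();
    System.setSecurityManager(
        new SecurityManager() {
          @Override
          public void checkPermission(Permission perm) {
            // Deny only socket connects; everything else stays permitted.
            // A real version would likely also allow loopback addresses.
            if (perm instanceof SocketPermission && perm.getActions().contains("connect")) {
              throw new SecurityException("Network access in a unit test: " + perm);
            }
          }
        });
  }

  @Override
  protected void after() {
    System.setSecurityManager(previous);
  }
}

A suspect test class would opt in with `@Rule public final NoNetworkRule noNetwork = new NoNetworkRule();`; blocking the network at the container level on Jenkins would catch the same failures without any code changes.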

On Wed, Dec 2, 2020 at 3:38 PM Brian Hulette  wrote:

> Sorry I should've included the list of tests here. So far we've run into:
> DicomIOTest, FhirIOTest, HL7v2IOTest, org.apache.beam.sdk.extensions.ml.*IT
>
> Note the latter are called IT, but that package's build.gradle has a line
> to scoop ITs into the :test task (addressing in [1]).
>
> All of these tests are actually running pipelines so I think they'd be
> difficult to mock.
>
> [1] https://github.com/apache/beam/pull/13444
>
> On Wed, Dec 2, 2020 at 3:28 PM Kyle Weaver  wrote:
>
>> > Should we (do we) require unit tests to be hermetic?
>>
>> We should. Unit tests are hermetic by definition. That begs the
>> definition of hermetic, but clearly the internet is not.
>>
>> > Personally I think these tests should be classified as integration
>> tests (renamed to *IT, and run with the :integrationTest task)
>>
>> I'm not sure which tests you're talking about, but it may be better to
>> make them hermetic through mocking, depending on the intent of the test.
>>
>> On Wed, Dec 2, 2020 at 1:22 PM Brian Hulette  wrote:
>>
>>> I've been working with Niels Basjes on a standardized developer build
>>> environment that can run `./gradlew check` [1]. We've run into issues
>>> because several Java unit tests (class *Test, run with :test) are not
>>> hermetic. They fail unless the environment they're running in has access to
>>> the internet, and is authenticated to GCP with access to certain resources.
>>> Of course the former isn't typically a blocker, but the latter certainly
>>> can be.
>>>
>>> Personally I think these tests should be classified as integration tests
>>> (renamed to *IT, and run with the :integrationTest task), but I realized I
>>> don't know if we have a formal definition for what should be a unit test vs
>>> an integration test. Should we (do we) require unit tests to be hermetic?
>>>
>>> Brian
>>>
>>> [1] https://github.com/apache/beam/pull/13308
>>>
>>


Re: [BEAM-10587] Support Maps in BigQuery #12389

2020-10-09 Thread Andrew Pilloud
BigQuery has no native support for Map types, but I agree that we should be
consistent with how other tools import maps into BigQuery. Is this
something Dataflow templates do? What other tools are there?

Beam ZetaSQL also lacks support for Map types. I like the idea of adding a
configuration parameter to turn this on and retaining the existing behavior
by default.
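
For readers following along, here is a minimal sketch of the two shapes under discussion, using Beam's Schema builder (the field names are hypothetical; the conversion logic itself is the PR's concern and is not shown):

import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.FieldType;

public class MapAsRepeatedStruct {
  public static void main(String[] args) {
    // Beam-side schema: a MAP<STRING, INT64> field named "attributes".
    Schema beamSchema =
        Schema.builder()
            .addMapField("attributes", FieldType.STRING, FieldType.INT64)
            .build();

    // BigQuery-side approximation: REPEATED STRUCT<key STRING, value INT64>.
    // The read path has to guess that this shape came from a map, which is
    // why an opt-in configuration parameter is attractive.
    Schema entrySchema =
        Schema.builder().addStringField("key").addInt64Field("value").build();
    Schema bigQuerySchema =
        Schema.builder()
            .addArrayField("attributes", FieldType.row(entrySchema))
            .build();

    System.out.println(beamSchema);
    System.out.println(bigQuerySchema);
  }
}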

Thanks for sending this to the list!

Andrew

On Fri, Oct 9, 2020 at 7:20 AM Jeff Klukas  wrote:

> It's definitely desirable to be able to get back Map types from BQ, and
> it's nice that BQ is consistent in representing maps as repeated key/value
> structs. Inferring maps from that specific structure is preferable to
> inventing some new naming convention for the fields, which would hinder
> interoperability with non-Beam applications.
>
> Would it be possible to add a configurable parameter called something like
> withMapsInferred() ? Default behavior would be the status quo, but users
> could opt in to the behavior of inferring maps based on field names. This
> would prevent the PR change from potentially breaking existing
> applications. And it means the least surprising behavior remains the
> default.
>
> On Fri, Oct 9, 2020 at 6:06 AM Worley, Ryan 
> wrote:
>
>> https://github.com/apache/beam/pull/12389
>>
>> Hi everyone, in the above pull request I am attempting to add support for
>> writing Avro records with maps to a BigQuery table (via Beam Schema).  The
>> write portion is fairly straightforward - we convert the map to an array of
>> structs with key and value fields (seemingly the closest possible
>> approximation of a map in BigQuery).  But the read back portion is more
>> controversial because we simply check if a field is an array of structs
>> with exactly two fields - key and value - and assume that should be read
>> into a Schema map field.
>>
>> So the possibility exists that an array of structs with key and value
>> fields, which wasn't originally written from a map, could be unexpectedly
>> read into a map.  In the PR review I suggested a few options for tagging
>> the BigQuery field, so that we could know it was written from a Beam Schema
>> map and should be read back into one, but I'm not very satisfied with any
>> of the options.
>>
>> Andrew Pilloud suggested that I write to this group to get some feedback
>> on the issue.  Should we be concerned that all arrays of structs with
>> exactly 'key' and 'value' fields would be read into a Schema map or could
>> this be considered a feature?  If the former, how would you suggest that we
>> limit reading into a map only those fields that were originally written
>> from a map?
>>
>> Thanks for any feedback to help bump this PR along!
>>
>> NOTICE:
>>
>> This message, and any attachments, contain(s) information that may be
>> confidential or protected by privilege from disclosure and is intended only
>> for the individual or entity named above. No one else may disclose, copy,
>> distribute or use the contents of this message for any purpose. Its
>> unauthorized use, dissemination or duplication is strictly prohibited and
>> may be unlawful. If you receive this message in error or you otherwise are
>> not an authorized recipient, please immediately delete the message and any
>> attachments and notify the sender.
>>
>


Re: Java Jenkins tests not running

2020-08-07 Thread Andrew Pilloud
Now it is going, just slow to start. There must be a backlog built up from
the outage.

On Fri, Aug 7, 2020 at 10:44 AM Andrew Pilloud  wrote:

> It appears Jenkins is not working again. It is not running on the PR to
> fix build breaks introduced while it was down:
> https://github.com/apache/beam/pull/12500
>
> On Fri, Aug 7, 2020 at 10:29 AM Luke Cwik  wrote:
>
>> I can confirm that it is working again.
>>
>> On Fri, Aug 7, 2020 at 4:43 AM Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>>
>>> Hey,
>>>
>>> We knew about the issue. All Jenkins builds were not being triggered
>>> from the pull requests. There's a ticket for that if you're curious about
>>> details: INFRA-20649 <https://issues.apache.org/jira/browse/INFRA-20649>
>>> Seems that we've just fixed it. I've also retriggered the tests from
>>> your PR.
>>>
>>> Regards,
>>> Damian
>>>
>>>
>>> On Fri, Aug 7, 2020 at 9:11 AM Reuven Lax  wrote:
>>>
>>>> Does anyone know why Jenkins is not triggering any Java tests for
>>>> pr/12474? It is only triggering python tasks, which is odd considering that
>>>> this PR doesn't touch any python files.
>>>>
>>>> Reuven
>>>>
>>>


Re: Java Jenkins tests not running

2020-08-07 Thread Andrew Pilloud
It appears Jenkins is not working again. It is not running on the PR to fix
build breaks introduced while it was down:
https://github.com/apache/beam/pull/12500

On Fri, Aug 7, 2020 at 10:29 AM Luke Cwik  wrote:

> I can confirm that it is working again.
>
> On Fri, Aug 7, 2020 at 4:43 AM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> Hey,
>>
>> We knew about the issue. All Jenkins builds were not being triggered from
>> the pull requests. There's a ticket for that if you're curious about
>> details: INFRA-20649 <https://issues.apache.org/jira/browse/INFRA-20649>
>> Seems that we've just fixed it. I've also retriggered the tests from your
>> PR.
>>
>> Regards,
>> Damian
>>
>>
>> On Fri, Aug 7, 2020 at 9:11 AM Reuven Lax  wrote:
>>
>>> Does anyone know why Jenkins is not triggering any Java tests for
>>> pr/12474? It is only triggering python tasks, which is odd considering that
>>> this PR doesn't touch any python files.
>>>
>>> Reuven
>>>
>>


Chronically flaky tests

2020-07-15 Thread Andrew Pilloud
We have two test suites that are responsible for a large percentage of our
flaky tests, and both have had bugs open for about a year without being fixed.
These suites are ParDoLifecycleTest (BEAM-8101) in Java and
BigQueryWriteIntegrationTests in Python (py3 BEAM-9484, py2 BEAM-9232, old
duplicate BEAM-8197).

Are there any volunteers to look into these issues? What can we do to
mitigate the flakiness until someone has time to investigate?

Andrew


Re: Canceling Jenkins builds when the update to PR makes prior build irrelevant

2020-06-23 Thread Andrew Pilloud
I believe we split _Commit and _Phrase to work around a bug with job
filtering. For example, when you make a python change only the python tests
are run based on the commit. We still want to be able to run the java jobs
by trigger phrase if needed. There are also performance tests (Nexmark for
example) that have different jobs to ensure PR runs don't end up published
in the performance dashboard, but I think those have a split of _Phrase and
_Cron.

As for canceling jobs, don't forget that the github status APIs are keyed
on commit hash and job name (not PR). It is possible for a commit to be on
multiple PRs and it is possible for a single PR to have multiple commits.
There are workflows that will be broken if you are keying off of a PR to
automatically cancel jobs.

On Tue, Jun 23, 2020 at 9:59 AM Tyson Hamilton  wrote:

> +1 the ability to cancel in-flight jobs is worth deduplicating _Phrase and
> _Commit. I don't see a benefit for having both.
>
> On Tue, Jun 23, 2020 at 9:02 AM Luke Cwik  wrote:
>
>> I think this is a great improvement to prevent the Jenkins queue from
>> growing too large and has been suggested in the past but we were unable to
>> do so due to difficulty with the version of the ghprb plugin that was used at
>> the time.
>>
>> I know that we created different variants of the tests because we wanted
>> to track metrics based upon whether something was a post commit (_Cron
>> suffix) vs precommits but don't know why we split _Phrase and _Commit.
>>
>> On Tue, Jun 23, 2020 at 3:35 AM Tobiasz Kędzierski <
>> tobiasz.kedzier...@polidea.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I was investigating the possibility of canceling Jenkins builds when the
>>> update to PR makes prior build irrelevant. (related to
>>> https://issues.apache.org/jira/browse/BEAM-3105)
>>> In the `GitHub Pull Request Builder Jenkins plugin [ghprb-plugin] there
>>> is a hidden option `Cancel build on update` that seems to work fine.
>>> e.g.
>>>
>>> 1. I make a PR
>>> 2. ghprb-plugin triggers beam_PreCommit_PythonLint_Commit
>>> 3. I make a new commit to the PR
>>> 4. ghprb-plugin aborts the previous `beam_PreCommit_PythonLint_Commit`
>>> and adds to the queue the new one with updated sha1.
>>>
>>>
>>>
>>> This option seems to significantly improve the experience with build
>>> triggering and we are planning to enable it shortly.
>>>
>>> However, putting a phrase “Run PythonLint PreCommit” in the comment
>>> triggers new `beam_PreCommit_PythonLint_Phrase` build, but does not
>>> touch already queued or running `beam_PreCommit_PythonLint_Commit` builds,
>>> that are technically speaking, different jobs.
>>>
>>> For testing purposes I made a single job which was a “_Commit” job with
>>> an added “Trigger phrase”, and it works well (commit builds are cancelled
>>> after putting a phrase comment in the PR).
>>>
>>> Hence my question: do we need separate “_Phrase” and “_Commit” jobs?
>>>
>>> BR
>>> Tobiasz
>>>
>>


Re: Question on NEXMark

2020-06-10 Thread Andrew Pilloud
I think the author of this test is long gone, but the code originated
inside Google. This query is not part of the original Nexmark suite but was
designed to exercise corner cases caused by out-of-order events, so that is
what you are probably seeing. Here are relevant bits from the original
commit messages:

New query 11 to exercise session windows.

Q11 started as a basic session windows test
with out-of-order and delayed events.
This refines the trigger to limit the number
of events in sessions.
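
For anyone wanting to reproduce the pattern outside Nexmark, here is a minimal sketch of Q11-style windowing (illustrative, not the Nexmark implementation): session windows over out-of-order events, with a trigger that caps how many elements accumulate per session.

import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

public class SessionWindowSketch {
  // Session windows with a 10-minute gap; the trigger fires every 100
  // elements, bounding per-session state the way the commit message describes.
  static Window<KV<Long, String>> q11StyleWindow() {
    return Window.<KV<Long, String>>into(Sessions.withGapDuration(Duration.standardMinutes(10)))
        .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(100)))
        .discardingFiredPanes()
        .withAllowedLateness(Duration.standardMinutes(1));
  }
}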

Andrew

On Wed, Jun 10, 2020 at 10:37 AM Sruthi S Kumar 
wrote:

> Hi,
>
> We are working on a Flink project and enhancing some state backend
> functionality. We are using the NEXMark benchmark to compare the performance
> of Flink's different state backends. While running NEXMark queries using the
> Flink runner of Beam, we have noticed that there are quite a lot of
> non-existent reads from the state backend.
>
> For example, when running query 11 with the RocksDB state backend, we had
> around 368 successful reads while we had around 527 attempts to read
> non-existent entries. We are curious if that is intentional and, if so, what's
> the rationale behind it?
>
>
> --
> Regards,
>
> Sruthi
>


Re: Removing DATETIME in Java Schemas

2020-05-15 Thread Andrew Pilloud
My understanding is that the base type is effectively the wire format for
the type, while the logical type is the in-memory representation for Java.
For org.joda.time.Instant, this is just a wrapper around the underlying
Long. However, for the Date logical type, the LocalDate type has a struct as
the in-memory representation. There is a performance penalty to this
conversion, and I think there are cases where users (SQL for example) would
want a fast path that skips the serialization/deserialization steps if
possible.
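
To illustrate that split in isolation, here is a minimal, self-contained sketch (Beam's real Schema.LogicalType interface has more surface than this): the input type is what Java code sees, the base type is what gets encoded, and every row access pays for the conversion between them.

import org.joda.time.Instant;

public class MillisInstantSketch {
  // Input type -> base type: what a row writer does before encoding.
  static long toBaseType(Instant input) {
    return input.getMillis();
  }

  // Base type -> input type: what a row reader does after decoding. For a
  // wrapper type like Instant this is cheap; for a struct-backed type such
  // as DATE as LocalDate, the equivalent step allocates and copies fields.
  static Instant toInputType(long base) {
    return new Instant(base);
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    System.out.println(toInputType(toBaseType(now)).equals(now)); // prints true
  }
}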

As for more complex logical types, I would urge caution in making
assumptions around the base types. I expect there will be base types of
strings containing JSON or byte arrays containing protobuf or avro. ZetaSQL
has a civil time module that packs a struct into a Long. All of these are
things you can write SQL around, but I think the users might be somewhat
surprised when they come out less than usable by default.

On Fri, May 15, 2020 at 2:12 PM Reuven Lax  wrote:

>
>
> On Fri, Apr 24, 2020 at 11:56 AM Brian Hulette 
> wrote:
>
>> When we created the portable representation of schemas last summer we
>> intentionally did not include DATETIME as a primitive type [1], even though
>> it already existed as a primitive type in Java [2]. There seemed to be
>> consensus around a couple of points: 1) At the very least DATETIME is a
>> poor name and it should be renamed to INSTANT or TIMESTAMP, and 2) a
>> logical type is better suited for this purpose.
>>
>> Since then, it's been my intention to replace Java's DATETIME with a
>> MillisInstant logical type backed by Long milliseconds since the epoch (see
>> BEAM-7554 [3]). I've finally managed to get a PR together that does so (and
>> keeps tests passing): https://github.com/apache/beam/pull/11456
>>
>> There could be a couple of concerns with this PR that I think would be
>> better discussed here rather than on github.
>>
>>
>> ## Breaking changes in low-level APIs
>> The PR includes some breaking changes in public, low-level schema APIs:
>> It removes DATETIME from the TypeName enum [4], and it will also change the
>> type returned by Row#getBaseValue for DATETIME/MillisInstant fields (Long
>> milliseconds since epoch rather than org.joda.time.Instant). Both of these
>> changes have the potential to break user code. That being said I think the
>> risk is justified for a few reasons:
>> - These are lower-level APIs that we don't intend for most users to use.
>> The preferred higher-level APIs (SQL, Schema transforms, and inferred
>> schemas for user types) should seamlessly transition over to the new
>> MillisInstant logical type.
>> - Schemas are an experimental feature.
>> - We can clearly document this breaking change in the release that
>> includes it.
>>
>
> I am a bit worried about this. Schemas have been marked experimental for.
> almost two years, and I've found that many Beam users are using them. If we
> make a breaking change, can we make sure that it is extremely clear to
> users how to fix their code? I'm afraid we are in the situation now where
> the code is theoretically experimental but practically is widely used.
>
>
>
>>
>>
>> ## Mixing joda and java 8 time
>> The NanosInstant logical type that Alex added uses java.time.Instant as
>> it's input type, while my MillisInstant type uses org.joda.time.Instant for
>> compatibility with the rest of Beam and the previous DATETIME primitive
>> type. It feels weird, but it also makes a certain sort of sense to use joda
>> time (which has millisecond precision) for MillisInstant, and java 8 time
>> (which has nanos) for NanosInstant. Also, the choice shouldn't have a
>> significant effect on end users - the schema inference code could generate
>> conversions between java 8 time and joda time (as we already do for
>> converting between various joda time types [5]) so user types can use
>> either one.
>>
>
> +1
>
>
>>
>>
>> ## Arbitrary Logical Types and SQL
>> Previously much of the SQL code was written to operate on the _base type_
>> value for any logical types. So for the new MillisInstant type, SQL would
>> attempt to operate on the underlying Long, rather than on a
>> org.joda.time.Instant instance. Thus when I switched over to a
>> MillisInstant logical type as the default for SQL date and time types any
>> tests that used them began failing with ClassCastExceptions.
>>
>> My solution was just to update SQL code to only ever reference the input
>> type (i.e. org.joda.time.Instant) for logical types. A good example of this
>> type of change is in BeamAggregationRel [6].
>>
>> My change does pass all of the SQL tests (including internal Google
>> tests), but I'm a little concerned that using the base type throughout SQL
>> was a conscious decision that I'm stomping on. In theory, it could ensure
>> that SqlTransform would be able to operate on user types that contain
>> custom user logical types, without requiring SqlTransform to understand
>> those types. This is just speculation though, 

Re: [VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-04-02 Thread Andrew Pilloud
+1, Accept the donation of the Firefly design as Beam Mascot

On Thu, Apr 2, 2020 at 10:19 AM Julian Bruno  wrote:

> Hello Apache Beam Community,
>
> Please vote on the acceptance of the final design of the Firefly as Beam's
> mascot [1]. Please share your input no later than Monday, April 6, at noon
> Pacific Time.
>
> [ ] +1, Accept the donation of the Firefly design as Beam Mascot
>
> [ ] -1, Decline the donation of the Firefly design as Beam Mascot
>
> Vote is adopted by at least 3 PMC +1 approval votes, with no PMC -1
> disapproval
>
> votes. Non-PMC votes are still encouraged.
>
> PMC voters, please help by indicating your vote as "(binding)"
>
> The vote and input phase will be open until Monday, April 6, at 12 pm
> Pacific Time.
>
> Thank you very much for your feedback and ideas,
>
> Julian
>
> [1]
> https://docs.google.com/document/d/1zK8Cm8lwZ3ALVFpD1aY7TLCVNwlyTS3PXxTV2qQCAbk/edit?usp=sharing
>
>
> --
> Julian Bruno // Visual Artist & Graphic Designer
>  (510) 367-0551 / SF Bay Area, CA
> www.instagram.com/julbro.art
>
>


Re: ZetaSQL to Calcite translation layer

2020-03-26 Thread Andrew Pilloud
I think it makes sense for the ZetaSQL to Calcite translation layer to live
in Calcite itself, and did suggest it at one point on their dev list (See:
https://lists.apache.org/thread.html/38942fcb4775ed71f9b2ab8880ab68a4238166ea5e904111ca184a12%40%3Cdev.calcite.apache.org%3E).
I don't think there is a quick way to get there, but we could split up the
interfaces within Beam so they are cleaner.

It seems like a good next step would be to split up packages within Beam.
We could add a set of core SQL interfaces that only depend on Calcite and
then split our ZetaSQL translator into a piece that only depends on those
interfaces, Calcite, and ZetaSQL.

Andrew

On Thu, Mar 26, 2020 at 12:41 PM Steve Niemitz  wrote:

> The ZetaSQL to Calcite translation layer that is bundled with Beam seems
> generally useful in cases other than for Beam. In fact, we're using
> (essentially a fork of) it internally outside of Beam right now (and I've
> fixed a bunch of bugs in it).
>
> Has there ever been any thought about splitting into a separate library
> without any beam dependencies?
>


Re: /zetasql/local_service/liblocal_service_jni.jnilib was not found inside JAR

2020-03-25 Thread Andrew Pilloud
This should be fixed in ZetaSQL 2020.03.2, which will be coming out soon.
We've verified it on multiple machines with OS X 10.14.6. If you have
another version, I could use help testing.

Check out this PR: https://github.com/apache/beam/pull/11223
Run: ./gradlew :sdks:java:extensions:sql:zetasql:check

Thanks!

Andrew

On Tue, Mar 17, 2020 at 2:50 AM Gleb Kanterov  wrote:

> There is a branch that builds ZetaSQL on Mac; it only works with Bazel
> 0.25.3. You would need Xcode to build it locally. After you build it, and
> put jnilib into classpath, it just works. One of my colleagues has updated
> this branch to the latest release [1].
>
> /Gleb
>
> [1]: https://github.com/csenel/zetasql/tree/darwin-build
>
> On Mon, Mar 16, 2020 at 8:26 PM Tomo Suzuki  wrote:
>
>> I see. Thank you. I also found Alex's ticket for zetasql.
>> https://github.com/google/zetasql/issues/25
>>
>> Closing thread.
>>
>> Regards,
>> Tomo
>>
>> On Mon, Mar 16, 2020 at 3:21 PM Andrew Pilloud 
>> wrote:
>>
>>> That error is expected unless you've built your own jar with
>>> liblocal_service_jni.jnilib for OS X. A few have tried but no one has
>>> succeeded (as far as I know), it is on the ZetaSQL team's todo list. You'll
>>> need to run that module on Linux for now.
>>>
>>> See: https://github.com/google/zetasql/pull/3
>>>
>>> Andrew
>>>
>>> On Mon, Mar 16, 2020 at 12:09 PM Tomo Suzuki  wrote:
>>>
>>>> Hi Beam developers,
>>>>
>>>> I started getting test failures when building Beam on my MacBook Pro.
>>>> Module: sdks/java/extensions/sql/zetasql. The NoClassDefFoundError occurs
>>>> because the jnilib file is missing.
>>>>
>>>> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException:
>>>> File /zetasql/local_service/liblocal_service_jni.jnilib was not found
>>>> inside JAR.
>>>> at
>>>> com.google.zetasql.JniChannelProvider.(JniChannelProvider.java:68)
>>>> ... 69 more
>>>> Caused by: java.io.FileNotFoundException: File
>>>> /zetasql/local_service/liblocal_service_jni.jnilib was not found inside 
>>>> JAR.
>>>> at
>>>> com.google.zetasql.cz.adamh.utils.NativeUtils.loadLibraryFromJar(NativeUtils.java:105)
>>>> at
>>>> com.google.zetasql.JniChannelProvider.(JniChannelProvider.java:66)
>>>> ... 69 more
>>>>
>>>> Full log:
>>>> https://gist.github.com/suztomo/f3d8815e8f48aeabd0288de34c1488f0
>>>>
>>>> Has anyone encountered a similar problem?
>>>>
>>>> --
>>>> Regards,
>>>> Tomo
>>>>
>>>
>>
>> --
>> Regards,
>> Tomo
>>
>


Re: /zetasql/local_service/liblocal_service_jni.jnilib was not found inside JAR

2020-03-16 Thread Andrew Pilloud
That error is expected unless you've built your own jar with
liblocal_service_jni.jnilib for OS X. A few have tried but no one has
succeeded (as far as I know), it is on the ZetaSQL team's todo list. You'll
need to run that module on Linux for now.

See: https://github.com/google/zetasql/pull/3

Andrew

On Mon, Mar 16, 2020 at 12:09 PM Tomo Suzuki  wrote:

> Hi Beam developers,
>
> I started getting test failures when building Beam on my MacBook Pro.
> Module: sdks/java/extensions/sql/zetasql. The NoClassDefFoundError occurs
> because the jnilib file is missing.
>
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File
> /zetasql/local_service/liblocal_service_jni.jnilib was not found inside JAR.
> at
> com.google.zetasql.JniChannelProvider.(JniChannelProvider.java:68)
> ... 69 more
> Caused by: java.io.FileNotFoundException: File
> /zetasql/local_service/liblocal_service_jni.jnilib was not found inside JAR.
> at
> com.google.zetasql.cz.adamh.utils.NativeUtils.loadLibraryFromJar(NativeUtils.java:105)
> at
> com.google.zetasql.JniChannelProvider.(JniChannelProvider.java:66)
> ... 69 more
>
> Full log: https://gist.github.com/suztomo/f3d8815e8f48aeabd0288de34c1488f0
>
> Has anyone encountered a similar problem?
>
> --
> Regards,
> Tomo
>


Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Andrew Pilloud
For BigQueryIO, "CREATE EXTERNAL TABLE" does exactly what you describe in
"CREATE TABLE". You could add a table property to set the CreateDisposition
if you wanted to change that behavior.
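
For reference, a minimal sketch of how the existing extension is used; the project, dataset, and column names are hypothetical, and withDdlString is assumed (per the Beam SQL extension docs) to be the SqlTransform hook for passing DDL:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;

public class ExternalTableSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Registers the table with Beam SQL; the data itself stays in BigQuery.
    String ddl =
        "CREATE EXTERNAL TABLE transaction_history (\n"
            + "  card_number BIGINT,\n"
            + "  amount DOUBLE\n"
            + ") TYPE bigquery\n"
            + "LOCATION 'gcp-project:dataset1.table1'";

    PCollection<Row> rows =
        PCollectionTuple.empty(p)
            .apply(
                SqlTransform.query("SELECT card_number, amount FROM transaction_history")
                    .withDdlString(ddl));

    p.run().waitUntilFinish();
  }
}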

Andrew

On Thu, Mar 5, 2020 at 11:10 AM Rui Wang  wrote:

> "CREATE TABLE" can be used to indicate if a table does not exist, BeamSQL
> will help create it in storage systems if allowed, while "CREATE EXTERNAL
> TABLE" can be used only for registering a table, no matter if the table
> exists or not. BeamSQL provides a finer-grained way to distinguish
> different behaviours.
>
> In both cases BeamSQL does not store the table. Another approach is to
> leverage the options/table property to specify the expected behaviour.
>
>
> -Rui
>
> On Thu, Mar 5, 2020 at 10:55 AM Andrew Pilloud 
> wrote:
>
>> I'm not following the "CREATE TABLE" vs "CREATE EXTERNAL TABLE"
>> distinction. We added the "EXTERNAL" to make it clear that Beam wasn't
>> storing the table. Most of our current table providers will create the
>> underlying table as needed.
>>
>> Andrew
>>
>> On Thu, Mar 5, 2020 at 10:47 AM Rui Wang  wrote:
>>
>>> There are two pieces of news from the proposal:
>>> 1. Spanner source in SQL. (Welcome to contribute it)
>>> 2. CREATE TABLE statement than CREATE EXTERNAL TABLE (the difference is
>>> whether assuming the table exists or not)
>>>
>>>
>>> There is a table property in the statement already that you can reuse to
>>> save your options.
>>>
>>>
>>> -Rui
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 5, 2020 at 2:30 AM Taher Koitawala 
>>> wrote:
>>>
>>>> Also auto creation is not there
>>>>
>>>> On Thu, Mar 5, 2020 at 3:59 PM Taher Koitawala 
>>>> wrote:
>>>>
>>>>> Proposal is to add more sources and also to have event-time or
>>>>> processing-time enhancements further on them
>>>>>
>>>>> On Thu, Mar 5, 2020 at 3:50 PM Andrew Pilloud 
>>>>> wrote:
>>>>>
>>>>>> I believe we have this functionality already:
>>>>>> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/
>>>>>>
>>>>>> Existing GCP tables can also be loaded through the GCP datacatalog
>>>>>> metastore. What are you proposing that is new?
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 5, 2020, 12:29 AM Taher Koitawala 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>  We have been using Apache Beam extensively to process huge
>>>>>>> amounts of data, while beam is really powerful and can solve a huge 
>>>>>>> number
>>>>>>> of use cases. A Beam job's development and testing time is significantly
>>>>>>> high.
>>>>>>>
>>>>>>>This gap can be filled with Beam SQL, where a complete SQL based
>>>>>>> interface can reduce development and testing time to matter of minutes, 
>>>>>>> it
>>>>>>> also makes Apache Beam more user friendly where a wide variety of 
>>>>>>> audience
>>>>>>> with different analytical skillsets can interact.
>>>>>>>
>>>>>>> The current Beam SQL is still needs to be used programmatically, and
>>>>>>> so I propose the following additions/improvements.
>>>>>>>
>>>>>>> *Note: Whist the below given examples are more GCP biased, they
>>>>>>> apply to other sources in a generic manner*
>>>>>>>
>>>>>>> For Example: Imagine a user who wants to write a stream processing
>>>>>>> job on Google Cloud Dataflow. The user wants to process credit card
>>>>>>> transaction streams from Google Cloud PubSub (Something like Kafka) and
>>>>>>> enrich each record of the stream with some data that is stored in Google
>>>>>>> Cloud Spanner, after enrichment the user wishes to write the following 
>>>>>>> data
>>>>>>> to Google Cloud BigQuery.
>>>>>>>
>>>>>>> Given Below are the queries which the user should be ab

Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Andrew Pilloud
I'm not following the "CREATE TABLE" vs "CREATE EXTERNAL TABLE"
distinction. We added the "EXTERNAL" to make it clear that Beam wasn't
storing the table. Most of our current table providers will create the
underlying table as needed.

Andrew

On Thu, Mar 5, 2020 at 10:47 AM Rui Wang  wrote:

> There are two pieces of news from the proposal:
> 1. Spanner source in SQL. (Welcome to contribute it)
> 2. CREATE TABLE statement than CREATE EXTERNAL TABLE (the difference is
> whether assuming the table exists or not)
>
>
> There is a table property in the statement already that you can reuse to
> save your options.
>
>
> -Rui
>
>
>
>
>
>
>
> On Thu, Mar 5, 2020 at 2:30 AM Taher Koitawala  wrote:
>
>> Also auto creation is not there
>>
>> On Thu, Mar 5, 2020 at 3:59 PM Taher Koitawala 
>> wrote:
>>
>>> Proposal is to add more sources and also to have event-time or
>>> processing-time enhancements further on them
>>>
>>> On Thu, Mar 5, 2020 at 3:50 PM Andrew Pilloud 
>>> wrote:
>>>
>>>> I believe we have this functionality already:
>>>> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/
>>>>
>>>> Existing GCP tables can also be loaded through the GCP datacatalog
>>>> metastore. What are you proposing that is new?
>>>>
>>>> Andrew
>>>>
>>>>
>>>> On Thu, Mar 5, 2020, 12:29 AM Taher Koitawala 
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>  We have been using Apache Beam extensively to process huge
>>>>> amounts of data, while beam is really powerful and can solve a huge number
>>>>> of use cases. A Beam job's development and testing time is significantly
>>>>> high.
>>>>>
>>>>>This gap can be filled with Beam SQL, where a complete SQL based
>>>>> interface can reduce development and testing time to matter of minutes, it
>>>>> also makes Apache Beam more user friendly where a wide variety of audience
>>>>> with different analytical skillsets can interact.
>>>>>
>>>>> The current Beam SQL is still needs to be used programmatically, and
>>>>> so I propose the following additions/improvements.
>>>>>
>>>>> *Note: Whist the below given examples are more GCP biased, they apply
>>>>> to other sources in a generic manner*
>>>>>
>>>>> For Example: Imagine a user who wants to write a stream processing job
>>>>> on Google Cloud Dataflow. The user wants to process credit card 
>>>>> transaction
>>>>> streams from Google Cloud PubSub (Something like Kafka) and enrich each
>>>>> record of the stream with some data that is stored in Google Cloud 
>>>>> Spanner,
>>>>> after enrichment the user wishes to write the following data to Google
>>>>> Cloud BigQuery.
>>>>>
>>>>> Given Below are the queries which the user should be able to fire on
>>>>> Beam and the rest should be automatically handled by the framework.
>>>>>
>>>>> //Infer schema from Spanner table upon table creation
>>>>>
>>>>> CREATE TABLE SPANNER_CARD_INFO
>>>>>
>>>>> OPTIONS (
>>>>>
>>>>>  ProjectId: “gcp-project”,
>>>>>
>>>>>  InstanceId : “spanner-instance-id”,
>>>>>
>>>>>  Database: “some-database”,
>>>>>
>>>>>  Table: “card_info”,
>>>>>
>>>>>  CloudResource: “SPANNER”,
>>>>>
>>>>> CreateTableIfNotExists: “FALSE”
>>>>>
>>>>>   )
>>>>>  //Apply schema to each record read from pubsub, and then apply SQL.
>>>>>
>>>>> CREATE TABLE TRANSACTIONS_PUBSUB_TOPIC
>>>>>
>>>>> OPTIONS (
>>>>>
>>>>> ProjectId: “gcp-project”,
>>>>>
>>>>> Topic: “card-transactions”,
>>>>>
>>>>> CloudResource : “PUBSUB”
>>>>>
>>>>> SubscriptionId : “subscriptionId-1”,
>>>>>
>>>>> CreateTopicIfNotExists: “FALSE”,
>>>>>
>>>>> CreateSubscriptionIfNotExist: “TRUE”,
>>>>>
>>>>> RecordType: “JSON” //POssible values: Avro, JSON, TVS..etc
>>>>>
>>>>> Js

Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Andrew Pilloud
I believe we have this functionality already:
https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/

Existing GCP tables can also be loaded through the GCP datacatalog
metastore. What are you proposing that is new?

Andrew


On Thu, Mar 5, 2020, 12:29 AM Taher Koitawala  wrote:

> Hi All,
>  We have been using Apache Beam extensively to process huge
> amounts of data; while Beam is really powerful and can solve a huge number
> of use cases, a Beam job's development and testing time is significantly
> high.
>
>    This gap can be filled with Beam SQL, where a complete SQL-based
> interface can reduce development and testing time to a matter of minutes. It
> also makes Apache Beam more user friendly, where a wide variety of audiences
> with different analytical skill sets can interact.
>
> The current Beam SQL still needs to be used programmatically, and so I
> propose the following additions/improvements.
>
> *Note: Whist the below given examples are more GCP biased, they apply to
> other sources in a generic manner*
>
> For Example: Imagine a user who wants to write a stream processing job on
> Google Cloud Dataflow. The user wants to process credit card transaction
> streams from Google Cloud PubSub (Something like Kafka) and enrich each
> record of the stream with some data that is stored in Google Cloud Spanner,
> after enrichment the user wishes to write the following data to Google
> Cloud BigQuery.
>
> Given below are the queries which the user should be able to fire on Beam
> and the rest should be automatically handled by the framework.
>
> //Infer schema from Spanner table upon table creation
> CREATE TABLE SPANNER_CARD_INFO
> OPTIONS (
>   ProjectId: “gcp-project”,
>   InstanceId: “spanner-instance-id”,
>   Database: “some-database”,
>   Table: “card_info”,
>   CloudResource: “SPANNER”,
>   CreateTableIfNotExists: “FALSE”
> )
>
> //Apply schema to each record read from pubsub, and then apply SQL.
> CREATE TABLE TRANSACTIONS_PUBSUB_TOPIC
> OPTIONS (
>   ProjectId: “gcp-project”,
>   Topic: “card-transactions”,
>   CloudResource: “PUBSUB”,
>   SubscriptionId: “subscriptionId-1”,
>   CreateTopicIfNotExists: “FALSE”,
>   CreateSubscriptionIfNotExist: “TRUE”,
>   RecordType: “JSON”, //Possible values: Avro, JSON, TSV, etc.
>   JsonRecordSchema: “{
>     “CardNumber”: “INT”,
>     “Amount”: “DOUBLE”,
>     “eventTimeStamp”: “EVENT_TIME”
>   }”)
>
> //Create table in BigQuery if not exists and insert
> CREATE TABLE TRANSACTION_HISTORY
> OPTIONS (
>   ProjectId: “gcp-project”,
>   CloudResource: “BIGQUERY”,
>   dataset: “dataset1”,
>   table: “table1”,
>   CreateTableIfNotExists: “TRUE”,
>   TableSchema: “{
>     “card_number”: “INT”,
>     “first_name”: “STRING”,
>     “last_name”: “STRING”,
>     “phone”: “INT”,
>     “city”: “STRING”,
>     “amount”: “FLOAT”,
>     “eventtimestamp”: “INT”
>   }”)
>
> //Actual query that should get stretched to a Beam dag
> INSERT INTO TRANSACTION_HISTORY
> SELECT pubsub.card_number, spanner.first_name, spanner.last_name, spanner.phone,
>   spanner.city, pubsub.amount, pubsub.eventTimeStamp
> FROM TRANSACTIONS_PUBSUB_TOPIC pubsub JOIN SPANNER_CARD_INFO spanner
>   ON (pubsub.card_number = spanner.card_number);
>
>
>
> Also consider that if any of the sources or sinks change, we only
> change the SQL and we are done.
>
> Please let me know your thoughts about this.
>
> Regards,
> Taher Koitawala
>
>


Re: Tests not triggering

2020-02-07 Thread Andrew Pilloud
I saw similar things yesterday. I reran the stuck/missing tests about an
hour ago with 'retest this please' and they worked.

Andrew

On Fri, Feb 7, 2020 at 9:25 AM Reuven Lax  wrote:

> Is Jenkins wedged again? I have PRs where the tests have been have been
> pending for over 10 hours.
>
> Reuven
>


Re: PreCommit Java Portability is failing

2020-01-23 Thread Andrew Pilloud
This is still hard failing on all Java PRs. Is there anything you can do to
mitigate this (disable the test) until it is fixed?

Andrew

On Tue, Jan 21, 2020 at 1:47 PM Hannah Jiang  wrote:

> This is caused by PR 10557 , I
> will fix it soon.
>
> On Tue, Jan 21, 2020 at 1:24 PM Kirill Kozlov 
> wrote:
>
>> Hello beam developers!
>>
>> Jenkins job page:
>> https://builds.apache.org/job/beam_PreCommit_JavaPortabilityApi_Cron/
>> The following tasks fails: `
>> *:runners:google-cloud-dataflow-java:buildAndPushDockerContainer*`
>> with error:
>> *'command 'docker'' finished with non-zero exit value 1*
>>
>> Is it possible that this task is failing due to recent INFRA changes?
>>
>> -
>> Kirill
>>
>


Re: Jenkins jobs not running for my PR 10438

2020-01-16 Thread Andrew Pilloud
done.

On Thu, Jan 16, 2020 at 1:07 PM Tomo Suzuki  wrote:

> Hi Beam committers,
>
> I appreciate if somebody can trigger precommit checks for
> https://github.com/apache/beam/pull/10614 with the following additional
> checks:
>
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
> Regards,
> Tomo
>
>


Re: Jenkins jobs not running for my PR 10438

2020-01-15 Thread Andrew Pilloud
Done.

Infra shut our .asf.yaml file off for being too large. Updates are here:
https://issues.apache.org/jira/browse/INFRA-19670

On Wed, Jan 15, 2020 at 2:40 PM Tomo Suzuki  wrote:

> Hi Beam committers,
>
> Can somebody trigger the precommit cheeks for my new PR
> https://github.com/apache/beam/pull/10603 ?
>
> This PR still does not trigger the checks. I confirmed that my account
> is in the .asf.yaml.
>
> On Tue, Jan 14, 2020 at 9:48 PM Ahmet Altay  wrote:
> >
> > Done.
> >
> > +Kenneth Knowles, any updates from INFRA on this?
> >
> > On Tue, Jan 14, 2020 at 6:43 PM Tomo Suzuki  wrote:
> >>
> >> It hit Dataflow quota error again. Can somebody run
> >> Run Dataflow ValidatesRunner
> >> for https://github.com/apache/beam/pull/10554 ?
> >>
> >> On Tue, Jan 14, 2020 at 12:14 PM Tomo Suzuki 
> wrote:
> >> >
> >> > Valentyn, thank you.
> >> >
> >> > On Tue, Jan 14, 2020 at 12:05 PM Valentyn Tymofieiev
> >> >  wrote:
> >> > >
> >> > > Done. If tests still don't trigger, you could try to make a push to
> the branch to reset the test status.
> >> > >
> >> > > On Tue, Jan 14, 2020 at 8:38 AM Tomo Suzuki 
> wrote:
> >> > >>
> >> > >> Hi Beam developers,
> >> > >>
> >> > >> Can somebody run the following to
> https://github.com/apache/beam/pull/10554 ?
> >> > >> Run Dataflow ValidatesRunner
> >> > >> Run Java PreCommit
> >> > >>
> >> > >> On Mon, Jan 13, 2020 at 2:35 PM Tomo Suzuki 
> wrote:
> >> > >> >
> >> > >> > Thank you, Mark and Ismaël.
> >> > >> >
> >> > >> > On Mon, Jan 13, 2020 at 2:34 PM Mark Liu 
> wrote:
> >> > >> > >
> >> > >> > > done
> >> > >> > >
> >> > >> > > On Mon, Jan 13, 2020 at 8:03 AM Tomo Suzuki <
> suzt...@google.com> wrote:
> >> > >> > >>
> >> > >> > >> Thanks Yifan (but Java Precommit is still missing).
> >> > >> > >> Can somebody run "Run Java PreCommit" on
> >> > >> > >> https://github.com/apache/beam/pull/10554?
> >> > >> > >>
> >> > >> > >>
> >> > >> > >> On Mon, Jan 13, 2020 at 2:59 AM Yifan Zou <
> yifan...@google.com> wrote:
> >> > >> > >> >
> >> > >> > >> > done.
> >> > >> > >> >
> >> > >> > >> > On Sun, Jan 12, 2020 at 6:27 PM Tomo Suzuki <
> suzt...@google.com> wrote:
> >> > >> > >> >>
> >> > >> > >> >> Hi Beam committers,
> >> > >> > >> >>
> >> > >> > >> >> Four Jenkins jobs did not report back for this PR
> >> > >> > >> >> https://github.com/apache/beam/pull/10554 .
> >> > >> > >> >> Can somebody trigger them?
> >> > >> > >> >>
> >> > >> > >> >> On Fri, Jan 10, 2020 at 4:51 PM Andrew Pilloud <
> apill...@google.com> wrote:
> >> > >> > >> >> >
> >> > >> > >> >> > Done.
> >> > >> > >> >> >
> >> > >> > >> >> > On Fri, Jan 10, 2020 at 12:59 PM Tomo Suzuki <
> suzt...@google.com> wrote:
> >> > >> > >> >> >>
> >> >> >> >> >> Hi Beam developers,
> >> > >> > >> >> >>
> >> > >> > >> >> >> I appreciate a committer can trigger precommit build for
> >> > >> > >> >> >> https://github.com/apache/beam/pull/10554.
> >> > >> > >> >> >>
> >> > >> > >> >> >> In addition to normal precommit checks, I want the
> followings:
> >> > >> > >> >> >> Run Java PostCommit
> >> > >> > >> >> >> Run Java HadoopFormatIO Performance Test
> >> > >> > >> >> >> Run BigQueryIO Streaming Performance Test Java
> >> > >> > >> >> >> Run Dataflow ValidatesRunner
> >> > >> > >> >> >> Run Spark ValidatesRunner
> >> > >> > >> >> >> Run SQL Postcommit
> >> > >> > >> >> >>
> >> > >> > >> >> >> Regards,
> >> > >> > >> >> >> Tomo
> >> > >> > >> >>
> >> > >> > >> >>
> >> > >> > >> >>
> >> > >> > >> >> --
> >> > >> > >> >> Regards,
> >> > >> > >> >> Tomo
> >> > >> > >>
> >> > >> > >>
> >> > >> > >>
> >> > >> > >> --
> >> > >> > >> Regards,
> >> > >> > >> Tomo
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > --
> >> > >> > Regards,
> >> > >> > Tomo
> >> > >>
> >> > >>
> >> > >>
> >> > >> --
> >> > >> Regards,
> >> > >> Tomo
> >> >
> >> >
> >> >
> >> > --
> >> > Regards,
> >> > Tomo
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Tomo
>
>
>
> --
> Regards,
> Tomo
>


Re: Apache community contact point

2020-01-15 Thread Andrew Pilloud
I'm not sure you have the right contact point. Have you tried filing a JIRA
ticket with the INFRA project and Docker component? JIRA is usually the
best way to get changes made to Apache infrastructure.

Andrew

On Wed, Jan 15, 2020 at 1:03 PM Hannah Jiang  wrote:

> I am trying to contact the Apache community to deploy Beam images to their
> organization at docker hub. I wrote an email to *d...@community.apache.org
> * and it has been almost 48 hours, but haven't
> received any response.
>
> To the people who have experience working with them, is this a correct
> contact point? Are there any advice I can follow?
>
> Thanks,
> Hannah
>


Re: Is Jenkins stuck?

2020-01-14 Thread Andrew Pilloud
I've noticed the same thing. The instances I've investigated have actually
run the tests, but the status never got posted to GitHub.

On Tue, Jan 14, 2020 at 12:36 PM Boyuan Zhang  wrote:

> Same here: https://github.com/apache/beam/pull/10375, which has been
> pending for at least 15 hrs
>
> On Tue, Jan 14, 2020 at 12:31 PM Reuven Lax  wrote:
>
>> For https://github.com/apache/beam/pull/10534 , all the tests have been
>> yellow "pending" for the past 15 hours. The same was true before when I
>> tried forcing them all to run again, but the new set of tests is also
>> stuck. Does anyone know what's happening?
>>
>> Reuven
>>
>


Re: Jenkins jobs not running for my PR 10438

2020-01-10 Thread Andrew Pilloud
Done.

On Fri, Jan 10, 2020 at 12:59 PM Tomo Suzuki  wrote:

> Hi Beam developers,
>
> I would appreciate it if a committer could trigger a precommit build for
> https://github.com/apache/beam/pull/10554.
>
> In addition to the normal precommit checks, I want the following:
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
> Regards,
> Tomo
>


Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Andrew Pilloud
What we need is an annotation checker to ensure every public method is
tagged either @Experimental or @Deprecated. That way there will be no
confusion about what we expect to be stable. If we really want to
offer stable APIs, there exist many tools (such as JAPICC [1]) to ensure
we don't make breaking changes. Without some actual tests we are just
hoping we don't break anything, so everything is effectively
@Experimental.

Andrew

[1] https://lvc.github.io/japi-compliance-checker/
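
As an illustration, a minimal sketch of such a checker using plain JDK
reflection. The Experimental annotation below is only a placeholder for
Beam's real annotation, and the whole check is an assumption about how this
could look rather than existing Beam tooling:

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class StabilityCheck {
  // Placeholder annotation; RUNTIME retention is required so reflection can see it.
  @Retention(RetentionPolicy.RUNTIME)
  public @interface Experimental {}

  // Fails if any public method of clazz is neither @Experimental nor @Deprecated.
  static void checkAllPublicMethodsTagged(Class<?> clazz) {
    for (Method m : clazz.getDeclaredMethods()) {
      if (!Modifier.isPublic(m.getModifiers())) {
        continue; // only the public API surface needs a stability tag
      }
      if (!m.isAnnotationPresent(Experimental.class)
          && !m.isAnnotationPresent(Deprecated.class)) {
        throw new AssertionError(
            clazz.getName() + "#" + m.getName()
                + " is neither @Experimental nor @Deprecated");
      }
    }
  }
}

Running a check like this over the public classes in a jar would turn the
"no confusion" goal above into an enforceable test.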

On Wed, Nov 27, 2019 at 10:12 AM Kenneth Knowles  wrote:
>
> Hi all
>
> I wanted to start a dedicated thread to the discussion of how to manage our 
> @Experimental annotations, API evolution in general, etc.
>
> After some email back-and-forth this will get too big, so I will then try
> to summarize it into a document. But I think a thread to start with makes sense.
>
> Problem statement:
>
> 1. Users need stable APIs so their software can just keep working
> 2. Breaking changes are necessary to achieve correctness / high quality
>
> Neither of these is actually universally true. Many changes don't really hurt 
> users, and some APIs are so obvious they don't require adjustment, or 
> continuing to use an inferior API is OK since at least correctness is 
> possible.
>
> But we have had too many breaking changes in Beam, some quite late, for the
> purposes of fixing major data loss bugs, design errors, changes in underlying 
> services, and usability. [1] So I take for granted that we do need to make 
> these changes.
>
> So the problem becomes:
>
> 1. Users need to know *which* APIs are frozen, clearly and with enough buy-in 
> that changes don't surprise them
> 2. Useful APIs that are not technically frozen but never change will still 
> get usage and should "graduate"
>
> Current status:
>
>  - APIs (classes, methods, etc) can be marked "experimental" with annotations 
> in languages
>  - "experimental" features are shipped in the same jar with non-experimental 
> bits; sometimes it is just a couple methods or classes
>  - "experimental" APIs are supposed to allow breaking changes
>  - there is no particular process for removing "experimental" status
>  - we do go through "deprecation" process even for experimental things
>
> Downsides to this:
>
>  - tons of Beam have become very mature but are still "experimental", so it isn't
> really safe to make breaking changes
>  - users are not really alerted that well to when they are using unstable 
> pieces
>  - we don't have an easy way to determine the impact of any breaking changes
>  - we also don't have a clear policy or guidance around underlying 
> services/client libs making breaking changes (such as services rejecting 
> older clients)
>  - having something both "experimental" and "deprecated" is maybe confusing, 
> but also just deleting experimental stuff is not safe in the current state of 
> things
>
> Some proposals that I can think of people made:
>
>  - making experimental features opt-in only (for example by a separate dep or 
> a flag)
>  - putting a version next to any experimental annotation and force review at 
> that time (lots of objections to this, but noting it for completeness)
>  - reviews for graduating on a case-by-case basis, with dev@ thread and maybe 
> vote
>  - try to improve our ability to know usage of experimental features (really, 
> all features!)
>
> I will start with my own thoughts from here:
>
> *Opt-in*: This is a powerful idea that I think changes everything.
>- for an experimental new IO, a separate artifact; this way we can also 
> see downloads
>- for experimental code fragments, add checkState that the relevant 
> experiment is turned on via flags
>
> *Graduation*: Once things are opt-in, the drive to graduate them will be 
> stronger than it is today. I think vote is appropriate, with rationale 
> including usage and test coverage and stability, since it is a commitment by 
> the community to maintain the code, which constitutes most of the TCO of code.
>
> *Tracking*:
>  - We should know what experiments we have and how old they are.
>  - It means that just tagging methods and classes "@Experimental" doesn't 
> really work. I think that is probably a good thing. It is confusing to have 
> hundreds of tiny experiments. We can target larger-scale experiments.
>  - If we regularly poll on Twitter or user@ about features then it might 
> become a source of OK signal, for things where we cannot look at download 
> stats.
>
> I think with these three approaches, the @Experimental annotation is actually 
> obsolete. We could still use it to drive some kind of annotation processor to 
> ensure "if there is @Experimental then there is a checkState" but I don't 
> have experience doing such things.
>
> Kenn
>
> [1] 
> https://lists.apache.org/thread.html/1bfe7aa55f8d77c4ddfde39595c9473b233edfcc3255ed38b3f85612@%3Cdev.beam.apache.org%3E
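
A rough sketch of the checkState-on-a-flag idea from the message above. The
guard below is hypothetical; Beam's actual flag plumbing lives in
PipelineOptions, and only the shape of the check is being illustrated:

import java.util.List;

public class ExperimentGuard {
  // Throws unless the named experiment was passed, e.g. via --experiments=new_io.
  static void checkExperimentEnabled(List<String> experiments, String name) {
    if (experiments == null || !experiments.contains(name)) {
      throw new IllegalStateException(
          "This feature requires --experiments=" + name + " to be enabled");
    }
  }
}

A call like this at the entry point of an experimental code fragment makes
the opt-in explicit and easy to grep for when deciding what has graduated.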


Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-22 Thread Andrew Pilloud
[ ] Beaver
[ ] Hedgehog
[ ] Lemur
[ ] Owl
[X] Salmon
[X] Trout
[ ] Robot dinosaur
[X] Firefly
[ ] Cuttlefish
[ ] Dumbo Octopus
[ ] Angler fish


Re: [VOTE] Vendored Dependencies Release

2019-09-03 Thread Andrew Pilloud
+1

Inspected the jar it looked reasonable.

Andrew

On Tue, Sep 3, 2019 at 9:06 AM Rui Wang  wrote:

> Friendly ping.
>
>
> -Rui
>
> On Thu, Aug 29, 2019 at 9:50 AM Rui Wang  wrote:
>
>> Thanks Kai and Andrew. Now orgapachebeam-1083 is publicly exposed.
>>
>> I also found a useful link [1] explaining staging repos in Apache Nexus.
>>
>>
>> [1]:
>> https://help.sonatype.com/repomanager2/staging-releases/managing-staging-repositories#ManagingStagingRepositories-ClosinganOpenRepository
>>
>> -Rui
>>
>> On Wed, Aug 28, 2019 at 9:19 PM Andrew Pilloud 
>> wrote:
>>
>>> You need to close the release for it to be published to the staging
>>> server. I can help if you still have questions.
>>>
>>> Andrew
>>>
>>> On Wed, Aug 28, 2019, 8:48 PM Rui Wang  wrote:
>>>
>>>> I can see orgapachebeam-1083 is in open status in the staging repository. I
>>>> am not sure why it is not publicly exposed. I probably need some guidance on
>>>> it.
>>>>
>>>>
>>>> -Rui
>>>>
>>>> On Wed, Aug 28, 2019 at 3:50 PM Kai Jiang  wrote:
>>>>
>>>>> Hi Rui,
>>>>>
>>>>> For accessing artifacts [1] in the Maven Central Repository, is this
>>>>> intended to not be publicly exposed?
>>>>>
>>>>> Best,
>>>>> Kai
>>>>>
>>>>> [1]
>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1083/
>>>>>
>>>>> On Wed, Aug 28, 2019 at 11:57 AM Kai Jiang  wrote:
>>>>>
>>>>>> +1 (non-binding). Thanks Rui!
>>>>>>
>>>>>> On Tue, Aug 27, 2019 at 10:46 PM Rui Wang  wrote:
>>>>>>
>>>>>>> Please review the release of the following artifacts that we vendor:
>>>>>>>
>>>>>>>  * beam-vendor-calcite-1_20_0
>>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Please review and vote on the release candidate #1 for the
>>>>>>> org.apache.beam:beam-vendor-calcite-1_20_0:0.1, as follows:
>>>>>>>
>>>>>>> [ ] +1, Approve the release
>>>>>>>
>>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>>
>>>>>>>
>>>>>>> The complete staging area is available for your review, which
>>>>>>> includes:
>>>>>>>
>>>>>>> * the official Apache source release to be deployed to
>>>>>>> dist.apache.org [1], which is signed with the key with fingerprint
>>>>>>> 0D7BE1A252DBCEE89F6491BBDFA64862B703F5C8 [2],
>>>>>>>
>>>>>>> * all artifacts to be deployed to the Maven Central Repository [3],
>>>>>>>
>>>>>>> * commit hash "664e25019fc1977e7041e4b834e8d9628b912473" [4],
>>>>>>>
>>>>>>> The vote will be open for at least 72 hours. It is adopted by
>>>>>>> majority approval, with at least 3 PMC affirmative votes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Rui
>>>>>>>
>>>>>>> [1]
>>>>>>> https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0
>>>>>>>
>>>>>>> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>>>>>
>>>>>>> [3]
>>>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1083/
>>>>>>>
>>>>>>> [4]
>>>>>>> https://github.com/apache/beam/commit/664e25019fc1977e7041e4b834e8d9628b912473
>>>>>>>
>>>>>>>


Re: [VOTE] Vendored Dependencies Release

2019-08-28 Thread Andrew Pilloud
You need to close the release for it to be published to the staging server.
I can help if you still have questions.

Andrew

On Wed, Aug 28, 2019, 8:48 PM Rui Wang  wrote:

> I can see orgapachebeam-1083 is in open status in the staging repository. I am
> not sure why it is not publicly exposed. I probably need some guidance on it.
>
>
> -Rui
>
> On Wed, Aug 28, 2019 at 3:50 PM Kai Jiang  wrote:
>
>> Hi Rui,
>>
>> For accessing artifacts [1] in the Maven Central Repository, is this
>> intended to not be publicly exposed?
>>
>> Best,
>> Kai
>>
>> [1]
>> https://repository.apache.org/content/repositories/orgapachebeam-1083/
>>
>> On Wed, Aug 28, 2019 at 11:57 AM Kai Jiang  wrote:
>>
>>> +1 (non-binding). Thanks Rui!
>>>
>>> On Tue, Aug 27, 2019 at 10:46 PM Rui Wang  wrote:
>>>
 Please review the release of the following artifacts that we vendor:

  * beam-vendor-calcite-1_20_0

 Hi everyone,

 Please review and vote on the release candidate #1 for the
 org.apache.beam:beam-vendor-calcite-1_20_0:0.1, as follows:

 [ ] +1, Approve the release

 [ ] -1, Do not approve the release (please provide specific comments)


 The complete staging area is available for your review, which includes:

 * the official Apache source release to be deployed to dist.apache.org
 [1], which is signed with the key with fingerprint
 0D7BE1A252DBCEE89F6491BBDFA64862B703F5C8 [2],

 * all artifacts to be deployed to the Maven Central Repository [3],

 * commit hash "664e25019fc1977e7041e4b834e8d9628b912473" [4],

 The vote will be open for at least 72 hours. It is adopted by majority
 approval, with at least 3 PMC affirmative votes.

 Thanks,

 Rui

 [1] https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0

 [2] https://dist.apache.org/repos/dist/release/beam/KEYS

 [3]
 https://repository.apache.org/content/repositories/orgapachebeam-1083/

 [4]
 https://github.com/apache/beam/commit/664e25019fc1977e7041e4b834e8d9628b912473




Re: Change of Behavior - JDBC Set Command

2019-07-05 Thread Andrew Pilloud
We have definitely tried to rework this a few times in the past. The
biggest problem is that some changes to pipeline options require multiple
values to change at once. For example, changing the runner might require
some options to be set and others reset before the options are valid.

I'll try to dig up the past threads when I get back from vacation on
Monday, but I think there were a few PRs trying to do this before.

Andrew
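
For illustration, a minimal sketch of the "re-parse everything together"
approach discussed in this thread: every SET updates a string map, and the
whole map is converted to PipelineOptions in one shot so interdependent
options are validated as a set. The class and method names other than the
PipelineOptionsFactory calls are hypothetical:

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SetCommandState {
  private final Map<String, String> props = new LinkedHashMap<>();

  // e.g. SET runner = 'DirectRunner' lands here as ("runner", "DirectRunner").
  void set(String name, String value) {
    props.put(name, value);
  }

  PipelineOptions toOptions() {
    String[] args =
        props.entrySet().stream()
            .map(e -> "--" + e.getKey() + "=" + e.getValue())
            .toArray(String[]::new);
    // Parsing all values at once lets the factory validate interdependent
    // options (such as runner-specific settings) together.
    return PipelineOptionsFactory.fromArgs(args).create();
  }
}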

On Wed, Jul 3, 2019, 9:54 AM Alireza Samadian  wrote:

> Before this PR, the set command was using a map to store values and then
> it was using PipelineOptionsFactory#fromArgs to parse those values.
> Therefore, by using PipelineOptionsFactory#parseObjects, we keep the same
> value parsing behavior for the SET command as before. Using
> PipelineOptionsFactory for parsing the objects also has two more
> advantages: It will prevent code duplication for parsing objects, and
> PipelineOptionsFactory does some extra checks (For instance checks if the
> runner is a valid type of runner). Thus, I think parsing the values the
> same way as PipelineOptionsFactory#parseObjects will be a better option.
>
>
> On Tue, Jul 2, 2019 at 3:50 PM Lukasz Cwik  wrote:
>
>> I see, in the current PR it seems like we are trying to adopt the parsing
>> logic of PipelineOptions command line value parsing to all SQL usecases
>> since we are exposing the parseOption method to be used in the
>> PipelineOptionsReflectionSetter#setOption.
>>
>> I should have asked in my earlier e-mail whether we wanted string to
>> value parsing to match what we do inside of the PipelineOptionsFactory. If
>> no, then PipelineOptionsReflectionSetter#setOption should take an Object
>> type for value instead of String.
>>
>> On Tue, Jul 2, 2019 at 9:39 AM Anton Kedin  wrote:
>>
>>> The proposed API assumes you already have a property name and a value
>>> parsed somehow, and now want to update a field on a pre-existing options
>>> object with that value, so there is no assumption about parsing being the
>>> same or not. E.g. if you set a property called `runner` to a string value
>>> `DirectRunner`, it should behave the same way whether it came from command
>>> line args, SQL SET command, JDBC connection args, or anywhere else.
>>>
>>> That said, we parse SET command differently from command line args [1].
>>> We also parse the pipeline options from the connection args [2] that has a
>>> different syntax as well. I don't know whether we can easily deal with this
>>> aspect at this point (and whether we should), but if a value can get
>>> parsed, idea is that it should work the same way after that.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/codegen/includes/parserImpls.ftl#L307
>>> [2]
>>> https://github.com/apache/beam/blob/b2fd4e392ede19f03a48997252970b8bba8535f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java#L82
>>>
>>> On Fri, Jun 28, 2019 at 7:57 AM Lukasz Cwik  wrote:
>>>
 Do we want SQL argument parsing to always be 1-1 with how command line
 parsing is being done?
 Note that this is different from the JSON <-> PipelineOptions
 conversion.

 I can see why the wrapper makes sense, just want to make sure that the
 JDBC SET command aligns with what we are trying to expose.

 On Thu, Jun 27, 2019 at 9:17 PM Anton Kedin  wrote:

> I think we thought about this approach but decided to get rid of the
> map representation wherever we can while still supporting setting of the
> options by name.
>
> One of the lesser important downsides of keeping the map around is
> that we will need to do `fromArgs` at least twice.
>
> Another downside is that we will probably have to keep and maintain
> two representations of the pipeline options at the same time and have 
> extra
> validations and probably reconciliation logic.
>
> We need the map representation in the JDBC/command-line use case where
> it's the only way for a user to specify the options. A user runs a special
> SQL command which goes through normal parsing and execution logic. On top
> of that we have a case of mixed Java/SQL pipelines, where we already have
> an instance of PipelineOptions and don't need a user to set the options
> from within a query. Right now this is impossible for other reasons as
> well. But to support both JDBC and Java+SQL use cases we currently pass
> both a map and a PipelineOptions instance around. Which makes things
> confusing. We can probably reduce passing things around but I think we 
> will
> still need to keep both representations.
>
> Ideally, I think, mixed Java+SQL pipelines should be backed by that
> same JDBC logic as much as possible. So potentially we should allow users
> to set the pipeline options from within a complicated query even in
> SqlTransform in a Java pipeline. However setting an 

Re: [VOTE] Release 2.13.0, release candidate #2

2019-06-03 Thread Andrew Pilloud
+1 Reviewed the Nexmark java and SQL perfkit graphs, no obvious regressions
over the previous release.

On Mon, Jun 3, 2019 at 1:15 PM Lukasz Cwik  wrote:

> Thanks for the clarification.
>
> On Mon, Jun 3, 2019 at 11:40 AM Ankur Goenka  wrote:
>
>> Yes, I meant I will close the voting at 5pm and start the release process.
>>
>> On Mon, Jun 3, 2019, 10:59 AM Lukasz Cwik  wrote:
>>
>>> Ankur, did you mean to say you're going to close the vote today at 5pm?
>>> (and then complete the release afterwards)
>>>
>>> On Mon, Jun 3, 2019 at 10:54 AM Ankur Goenka  wrote:
>>>
 Thanks for validating and voting.

 We have 4 binding votes.
 I will complete the release today 5PM. Please raise any concerns before
 that.

 Thanks,
 Ankur

 On Mon, Jun 3, 2019 at 8:36 AM Lukasz Cwik  wrote:

> Since the gearpump issue has been ongoing since 2.10, I can't consider
> it a blocker for this release and am voting +1.
>
> On Mon, Jun 3, 2019 at 7:13 AM Jean-Baptiste Onofré 
> wrote:
>
>> +1 (binding)
>>
>> Quickly tested on beam-samples.
>>
>> Regards
>> JB
>>
>> On 31/05/2019 04:52, Ankur Goenka wrote:
>> > Hi everyone,
>> >
>> > Please review and vote on the release candidate #2 for the version
>> > 2.13.0, as follows:
>> >
>> > [ ] +1, Approve the release
>> > [ ] -1, Do not approve the release (please provide specific
>> comments)
>> >
>> > The complete staging area is available for your review, which
>> includes:
>> > * JIRA release notes [1],
>> > * the official Apache source release to be deployed to
>> dist.apache.org
>> >  [2], which is signed with the key with
>> > fingerprint 6356C1A9F089B0FA3DE8753688934A6699985948 [3],
>> > * all artifacts to be deployed to the Maven Central Repository [4],
>> > * source code tag "v2.13.0-RC2" [5],
>> > * website pull request listing the release [6] and publishing the
>> API
>> > reference manual [7].
>> > * Python artifacts are deployed along with the source release to the
>> > dist.apache.org  [2].
>> > * Validation sheet with a tab for 2.13.0 release to help with
>> validation
>> > [8].
>> >
>> > The vote will be open for at least 72 hours. It is adopted by
>> majority
>> > approval, with at least 3 PMC affirmative votes.
>> >
>> > Thanks,
>> > Ankur
>> >
>> > [1]
>> >
>> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12345166
>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.13.0/
>> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> > [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1070/
>> > [5] https://github.com/apache/beam/tree/v2.13.0-RC2
>> > [6] https://github.com/apache/beam/pull/8645
>> > [7] https://github.com/apache/beam-site/pull/589
>> > [8]
>> >
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1031196952
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


Re: PubSubIT tests topic cleanup?

2019-05-24 Thread Andrew Pilloud
I believe it is https://issues.apache.org/jira/browse/BEAM-6610

On Fri, May 24, 2019 at 11:58 AM Kenneth Knowles  wrote:

> Is there a jira tracking this?
>
> Kenn
>
> On Fri, May 24, 2019, 11:50 Andrew Pilloud  wrote:
>
>> This came up on the list before, in February:
>>
>> https://lists.apache.org/thread.html/38384d193e6f0af89f00a583e56cff93b18cfaebbf84e743eb900bc5@%3Cdev.beam.apache.org%3E
>>
>> We should be cleaning up topics, but it sounds like we aren't.
>>
>> Andrew
>>
>> On Fri, May 24, 2019 at 11:42 AM Pablo Estrada 
>> wrote:
>>
>>> I've found a bunch of topics created by PubSub integration tests - they
>>> don't seem to be getting cleaned up, perhaps?
>>>
>>> 614 name:
>>> projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSelectsPayloadContent-
>>> 614 name:
>>> projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-
>>> 222 name:
>>> projects/apache-beam-testing/topics/integ-test-PubsubReadIT-testReadPublicData-
>>>
>>


Re: PubSubIT tests topic cleanup?

2019-05-24 Thread Andrew Pilloud
This came up on the list before, in February:
https://lists.apache.org/thread.html/38384d193e6f0af89f00a583e56cff93b18cfaebbf84e743eb900bc5@%3Cdev.beam.apache.org%3E

We should be cleaning up topics, but it sounds like we aren't.

Andrew
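
For reference, a hedged sketch of what a cleanup job could look like with
the Cloud Pub/Sub admin client. The project id and topic prefix come from
the list quoted below; the janitor itself is hypothetical rather than
existing Beam infrastructure:

import com.google.cloud.pubsub.v1.TopicAdminClient;
import com.google.pubsub.v1.ProjectName;
import com.google.pubsub.v1.Topic;

public class TopicJanitor {
  public static void main(String[] args) throws Exception {
    String prefix = "projects/apache-beam-testing/topics/integ-test-";
    try (TopicAdminClient client = TopicAdminClient.create()) {
      for (Topic topic :
          client.listTopics(ProjectName.of("apache-beam-testing")).iterateAll()) {
        // Topic names are full resource names, so a prefix match suffices.
        if (topic.getName().startsWith(prefix)) {
          client.deleteTopic(topic.getName());
        }
      }
    }
  }
}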

On Fri, May 24, 2019 at 11:42 AM Pablo Estrada  wrote:

> I've found a bunch of topics created by PubSub integration tests - they
> don't seem to be getting cleaned up, perhaps?
>
> 614 name:
> projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSelectsPayloadContent-
> 614 name:
> projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-
> 222 name:
> projects/apache-beam-testing/topics/integ-test-PubsubReadIT-testReadPublicData-
>


Re: All validates runner tests seems to be broken.

2019-05-14 Thread Andrew Pilloud
The specific issue in your text appears to be a typo introduced in
https://github.com/apache/beam/pull/8194. While that PR ran a bunch of
tests, I didn't see a reference to "Run Seed Job", which means it didn't
actually test the code in the change.

I expect not all the failures are the same, as that typo only appears on one
job, but it seems problematic that we merged a major Jenkins refactor
without any testing.

Andrew

*From: *Ankur Goenka 
*Date: *Tue, May 14, 2019 at 1:43 PM
*To: *dev

Hi,
>
> The following tests seem to be broken because of "Project 'unners' not found
> in root project 'beam'."
> The command being executed on Jenkins is
> gradlew --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g
> -Dorg.gradle.jvmargs=-Xmx4g :unners:samza:validatesRunner
> which is causing the failure.
> The same is happening on the master
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_PR/19/console
>
> Are we making any changes to the Jenkins parsing scripts?
>
> Apache Flink Runner ValidatesRunner Tests — FAILURE
> Apache Gearpump Runner ValidatesRunner Tests — FAILURE
> Apache Samza Runner ValidatesRunner Tests — FAILURE
> Apache Spark Runner ValidatesRunner Tests — FAILURE
> Google Cloud Dataflow Runner PortabilityApi ValidatesRunner Tests — FAILURE
> Google Cloud Dataflow Runner Python ValidatesContainer Tests — FAILURE
> Google Cloud Dataflow Runner Python ValidatesRunner Tests — FAILURE
> Google Cloud Dataflow Runner ValidatesRunner Tests — FAILURE
> Java Flink PortableValidatesRunner Batch Tests — FAILURE
> Java Flink PortableValidatesRunner Streaming Tests — FAILURE
>
> Thanks,
> Ankur
>


Re: [VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-14 Thread Andrew Pilloud
+1 for deleting code.

*From: *Ahmet Altay 
*Date: *Tue, May 14, 2019 at 2:22 PM
*To: *dev

+1
>
> *From: *Lukasz Cwik 
> *Date: *Tue, May 14, 2019 at 2:20 PM
> *To: *dev
>
> +1
>>
>> *From: *Daniel Oliveira 
>> *Date: *Tue, May 14, 2019 at 2:19 PM
>> *To: *dev
>>
>> Hello everyone,
>>>
>>> I'm calling for a vote on removing the deprecated Java Reference Runner
>>> code. The PR for the change has already been tested and reviewed:
>>> https://github.com/apache/beam/pull/8380
>>>
>>> [ ] +1, Approve merging the removal PR in it's current state
>>> [ ] -1, Veto the removal PR (please provide specific comments)
>>>
>>> The vote will be open for at least 72 hours. Since this a vote on
>>> code-modification, it is adopted if there are at least 3 PMC affirmative
>>> votes and no vetoes.
>>>
>>> For those who would like context on why the Java Reference Runner is
>>> being deprecated, the discussions took place in the following email threads:
>>>
>>>1. (8 Feb. 2019) Thoughts on a reference runner to invest in?
>>>
>>> 
>>>  -
>>>Decision to deprecate the Java Reference Runner and use the Python
>>>FnApiRunner for those use cases instead.
>>>2. (14 Mar. 2019) Python PVR Reference post-commit tests failing
>>>
>>> 
>>>- Removal of Reference Runner Post-Commits from Jenkins, and discussion 
>>> on
>>>removal of code.
>>>3. (25 Apr. 2019) Removing Java Reference Runner code
>>>
>>> 
>>>- Discussion thread before this formal vote.
>>>
>>>


Re: All validates runner tests seems to be broken.

2019-05-14 Thread Andrew Pilloud
So it sounds like a number of the failures are related to a single Jenkins
config for all branches. This means you can't test the release branch if
the targets change after it is cut. One possibility: do "Run Seed Job" on
the release branch and kick off all the tests right after that finishes.
Then, once they have all started, do "Run Seed Job" on head again to
restore the config.

Andrew

*From: *Alan Myrvold 
*Date: *Tue, May 14, 2019 at 1:59 PM
*To: * 

That is a typo added in https://github.com/apache/beam/pull/8194
>
> https://github.com/apache/beam/commit/1e7ea0da5073566c3fa26dbc1105105fbe6043ae#diff-9591f0d06e82e711681fd77ed287578b
>
> *From: *Ankur Goenka 
> *Date: *Tue, May 14, 2019 at 1:43 PM
> *To: *dev
>
> Hi,
>>
>> The following tests seem to be broken because of "Project 'unners' not found
>> in root project 'beam'."
>> The command being executed on Jenkins is
>> gradlew --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g
>> -Dorg.gradle.jvmargs=-Xmx4g :unners:samza:validatesRunner
>> which is causing the failure.
>> The same is happening on the master
>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_PR/19/console
>>
>> Are we making any changes to the Jenkins parsing scripts?
>>
>> Apache Flink Runner ValidatesRunner Tests — FAILURE
>> Apache Gearpump Runner ValidatesRunner Tests — FAILURE
>> Apache Samza Runner ValidatesRunner Tests — FAILURE
>> Apache Spark Runner ValidatesRunner Tests — FAILURE
>> Google Cloud Dataflow Runner PortabilityApi ValidatesRunner Tests — FAILURE
>> Google Cloud Dataflow Runner Python ValidatesContainer Tests — FAILURE
>> Google Cloud Dataflow Runner Python ValidatesRunner Tests — FAILURE
>> Google Cloud Dataflow Runner ValidatesRunner Tests — FAILURE
>> Java Flink PortableValidatesRunner Batch Tests — FAILURE
>> Java Flink PortableValidatesRunner Streaming Tests — FAILURE
>>
>> Thanks,
>> Ankur
>>
>


Re: SqlTransform Metadata

2019-05-14 Thread Andrew Pilloud
Hi Reza,

Where will this metadata be coming from? Beam SQL is tightly coupled with
the schema of the PCollection, so adding fields not in the data would be
difficult.

If what you want is the timestamp out of the DoFn.ProcessContext we might
be able to add a SQL function to fetch that.

Andrew
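
In the meantime, a sketch of the reification workaround mentioned above;
the field name and schema handling are illustrative, not a proposed API:

import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.Row;

// Copies each element's timestamp into an extra schema field so a following
// SqlTransform can reference it as an ordinary column.
public class ReifyTimestampFn extends DoFn<Row, Row> {
  private final Schema outputSchema;

  public ReifyTimestampFn(Schema inputSchema) {
    this.outputSchema =
        Schema.builder()
            .addFields(inputSchema.getFields())
            .addDateTimeField("event_timestamp")
            .build();
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    Row in = c.element();
    c.output(
        Row.withSchema(outputSchema)
            .addValues(in.getValues())
            .addValue(c.timestamp().toDateTime()) // element timestamp from the runner
            .build());
  }
}

After setting the output PCollection's schema (setRowSchema(outputSchema)),
SELECT event_timestamp FROM PCOLLECTION works like any other column.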

*From: *Reza Rokni 
*Date: *Tue, May 14, 2019, 1:08 AM
*To: * 

Hi,
>
> What are folks' thoughts about adding something like
> SqlTransform.withMetadata().query(...) to enable users to
> access things like timestamp information from within the query without
> having to reify the information into the element itself?
>
> Cheers
> Reza
>
>


Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-29 Thread Andrew Pilloud
Yes, they were moved and are now at
https://dist.apache.org/repos/dist/release/beam/2.12.0/

On Fri, Apr 26, 2019 at 2:02 AM Robert Bradshaw  wrote:
>
> Thanks for all the hard work!
>
> https://dist.apache.org/repos/dist/dev/beam/2.12.0/ seems empty; were
> the artifacts already moved?
>
> On Fri, Apr 26, 2019 at 10:31 AM Etienne Chauchot  
> wrote:
> >
> > Hi,
> > Thanks for all your work and patience Andrew !
> >
> > PS: as a side note, there were 5 binding votes (I voted +1)
> >
> > Etienne
> >
> > On Thursday, April 25, 2019 at 11:16 -0700, Andrew Pilloud wrote:
> >
> > I reran the Nexmark tests; each runner passed. I compared the numbers
> > on the direct runner to the dashboard and they are where they should
> > be.
> >
> > With that, I'm happy to announce that we have unanimously approved this
> > release.
> >
> > There are 8 approving votes, 4 of which are binding:
> > * Jean-Baptiste Onofré
> > * Lukasz Cwik
> > * Maximilian Michels
> > * Ahmet Altay
> >
> > There are no disapproving votes.
> >
> > Thanks everyone!
> >
> >


[ANNOUNCE] Apache Beam 2.12.0 released!

2019-04-25 Thread Andrew Pilloud
The Apache Beam team is pleased to announce the release of version 2.12.0!

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes bugfixes, features, and improvements detailed on
the Beam blog: https://beam.apache.org/blog/2019/04/25/beam-2.12.0.html

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.12.0.

-- Andrew Pilloud, on behalf of The Apache Beam team


Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-25 Thread Andrew Pilloud
I reran the Nexmark tests; each runner passed. I compared the numbers
on the direct runner to the dashboard and they are where they should
be.

With that, I'm happy to announce that we have unanimously approved this release.

There are 8 approving votes, 4 of which are binding:
* Jean-Baptiste Onofré
* Lukasz Cwik
* Maximilian Michels
* Ahmet Altay

There are no disapproving votes.

Thanks everyone!


Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-23 Thread Andrew Pilloud
It looks like the Java Nexmark tests are on the validation sheet but we've
missed them the last few releases. Thanks for checking them, Etienne! Does the
current release process require everything to be tested before making the
release final?

I fully agree with you on point 2. All of these issues were in RC1 and
could have been fixed for RC2.

Andrew

On Tue, Apr 23, 2019 at 2:58 PM Ahmet Altay  wrote:

> Thank you Andrew. I will suggest two improvements to the release process:
> 1. We can include benchmarks in the validation sheet ("Apache Beam Release
> Acceptance Criteria"). They are used part of the validation process and we
> can ensure that we check those for each release.
> 2. For RC validation, we can continue to exhaustively validate each RC
> even after the first -1 vote. Otherwise we end up not discovering all
> issues in a given RC and finding them in a successive RC, increasing the
> number of iterations required.
>
>
> On Tue, Apr 23, 2019 at 2:11 PM Andrew Pilloud 
> wrote:
>
>> Please consider the vote for RC4 canceled. I'll quickly follow up with a
>> new RC.
>>
>> Thanks for the complete testing everyone!
>>
>> Andrew
>>
>> On Tue, Apr 23, 2019 at 2:06 PM Reuven Lax  wrote:
>>
>>> -1
>>>
>>> we need to cherry pick pr/8325 and pr/8385 to fix the above issue
>>>
>>> On Tue, Apr 23, 2019 at 1:48 PM Andrew Pilloud 
>>> wrote:
>>>
>>>> I believe the breakage of Nexmark on Dataflow is
>>>> https://issues.apache.org/jira/browse/BEAM-7002, which went in before
>>>> the release was cut. It looks like this might be a release blocker based on
>>>> the fix: https://github.com/apache/beam/pull/8325.
>>>>
>>>> The degraded performance is after the release is cut, so we should be
>>>> good there.
>>>>
>>>> Andrew
>>>>
>>>> On Tue, Apr 23, 2019 at 8:44 AM Ismaël Mejía  wrote:
>>>>
>>>>> Etienne, the RC1 vote happened on 04/03 and there have not been any cherry
>>>>> picks on the spark runner afterwards, so if there is a commit that
>>>>> degraded performance around 04/10 it is not part of the release we are
>>>>> voting on; please consider reverting your -1.
>>>>>
>>>>> However, the issue you are reporting looks important. From a quick look
>>>>> I am guessing it could be related to BEAM-5775, which was merged on
>>>>> 12/04; however, the performance regressions started happening around
>>>>> 09/04, so it could be unrelated. Maybe it is due to changes in
>>>>> our infrastructure, or maybe a change in the workers to be tracked, but it is
>>>>> definitely not a release blocker, at least for the Spark runner.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 23, 2019 at 5:12 PM Etienne Chauchot 
>>>>> wrote:
>>>>> >
>>>>> > Hi guys,
>>>>> > I will vote -1 (binding) on this RC (although degradation is before
>>>>> RC4 cut date). I took a look at Nexmark graphs for the 3 major runners :
>>>>> > - there seem to be functional regressions on Dataflow:
>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5647201107705856
>>>>> . 13 queries fail in batch mode starting on 04/17
>>>>> > - there is a perf degradation (+200%) in the Spark runner starting on
>>>>> 04/10 for all the queries:
>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
>>>>> >
>>>>> > Sorry Andrew for the added work
>>>>> >
>>>>> > Etienne
>>>>> >
>>>>> > On Monday, April 22, 2019 at 12:21 -0700, Andrew Pilloud wrote:
>>>>> >
>>>>> > I signed the wheel files and updated the build process to not
>>>>> require giving Travis Apache credentials. (You should probably change your
>>>>> password if you haven't already.)
>>>>> >
>>>>> > Andrew
>>>>> >
>>>>> > On Mon, Apr 22, 2019 at 12:18 PM Ahmet Altay 
>>>>> wrote:
>>>>> >
>>>>> > +1 (binding)
>>>>> >
>>>>> > Verified the python 2 wheel files with quick start examples.
>>>>> >
>>>>> > On Mon, Apr 22, 2019 at 11:26 AM Ahmet Altay 
>>>>> wrote:
>>>>> >

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-23 Thread Andrew Pilloud
Please consider the vote for RC4 canceled. I'll quickly follow up with a
new RC.

Thanks for the complete testing everyone!

Andrew

On Tue, Apr 23, 2019 at 2:06 PM Reuven Lax  wrote:

> -1
>
> we need to cherry pick pr/8325 and pr/8385 to fix the above issue
>
> On Tue, Apr 23, 2019 at 1:48 PM Andrew Pilloud 
> wrote:
>
>> I believe the breakage of Nexmark on Dataflow is
>> https://issues.apache.org/jira/browse/BEAM-7002, which went in before
>> the release was cut. It looks like this might be a release blocker based on
>> the fix: https://github.com/apache/beam/pull/8325.
>>
>> The degraded performance is after the release is cut, so we should be
>> good there.
>>
>> Andrew
>>
>> On Tue, Apr 23, 2019 at 8:44 AM Ismaël Mejía  wrote:
>>
>>> Etienne, the RC1 vote happened on 04/03 and there have not been any cherry
>>> picks on the spark runner afterwards, so if there is a commit that
>>> degraded performance around 04/10 it is not part of the release we are
>>> voting on; please consider reverting your -1.
>>>
>>> However, the issue you are reporting looks important. From a quick look
>>> I am guessing it could be related to BEAM-5775, which was merged on
>>> 12/04; however, the performance regressions started happening around
>>> 09/04, so it could be unrelated. Maybe it is due to changes in
>>> our infrastructure, or maybe a change in the workers to be tracked, but it is
>>> definitely not a release blocker, at least for the Spark runner.
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Apr 23, 2019 at 5:12 PM Etienne Chauchot 
>>> wrote:
>>> >
>>> > Hi guys,
>>> > I will vote -1 (binding) on this RC (although degradation is before
>>> RC4 cut date). I took a look at Nexmark graphs for the 3 major runners :
>>> > - there seem to be functional regressions on Dataflow:
>>> https://apache-beam-testing.appspot.com/explore?dashboard=5647201107705856
>>> . 13 queries fail in batch mode starting on 04/17
>>> > - there is a perf degradation (+200%) in the Spark runner starting on
>>> 04/10 for all the queries:
>>> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
>>> >
>>> > Sorry Andrew for the added work
>>> >
>>> > Etienne
>>> >
>>> > On Monday, April 22, 2019 at 12:21 -0700, Andrew Pilloud wrote:
>>> >
>>> > I signed the wheel files and updated the build process to not require
>>> giving Travis Apache credentials. (You should probably change your password
>>> if you haven't already.)
>>> >
>>> > Andrew
>>> >
>>> > On Mon, Apr 22, 2019 at 12:18 PM Ahmet Altay  wrote:
>>> >
>>> > +1 (binding)
>>> >
>>> > Verified the python 2 wheel files with quick start examples.
>>> >
>>> > On Mon, Apr 22, 2019 at 11:26 AM Ahmet Altay  wrote:
>>> >
>>> > I built the wheel files. They are in the usual place along with other
>>> python artifacts. I will test them a bit and update here. Could someone
>>> else please try the wheel files as well?
>>> >
>>> > Andrew, could you sign and hash the wheel files?
>>> >
>>> > On Mon, Apr 22, 2019 at 10:11 AM Ahmet Altay  wrote:
>>> >
>>> > I verified
>>> > - signatures and hashes.
>>> >  - python streaming quickstart guide
>>> >
>>> > I would like to verify the wheel files before voting. Please let us
>>> know when they are ready. Also, if you need help with building wheel files
>>> I can help/build.
>>> >
>>> > Ahmet
>>> >
>>> > On Mon, Apr 22, 2019 at 3:33 AM Maximilian Michels 
>>> wrote:
>>> >
>>> > +1 (binding)
>>> >
>>> > Found a minor bug while testing, but not a blocker:
>>> > https://jira.apache.org/jira/browse/BEAM-7128
>>> >
>>> > Thanks,
>>> > Max
>>> >
>>> > On 20.04.19 23:02, Pablo Estrada wrote:
>>> > > +1
>>> > > Ran SQL postcommit, and Dataflow Portability Java validatesrunner
>>> tests.
>>> > >
>>> > > -P.
>>> > >
>>> > > On Wed, Apr 17, 2019 at 1:38 AM Jean-Baptiste Onofré <
>>> j...@nanthrax.net
>>> > > <mailto:j...@nanthrax.net>> wrote:
>>> > >
>>> > > +1 (binding)
&g

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-23 Thread Andrew Pilloud
I believe the breakage of Nexmark on Dataflow is
https://issues.apache.org/jira/browse/BEAM-7002, which went in before the
release was cut. It looks like this might be a release blocker based on the
fix: https://github.com/apache/beam/pull/8325.

The degraded performance is after the release is cut, so we should be good
there.

Andrew

On Tue, Apr 23, 2019 at 8:44 AM Ismaël Mejía  wrote:

> Etienne, the RC1 vote happened on 04/03 and there have not been any cherry
> picks on the spark runner afterwards, so if there is a commit that
> degraded performance around 04/10 it is not part of the release we are
> voting on; please consider reverting your -1.
>
> However, the issue you are reporting looks important. From a quick look
> I am guessing it could be related to BEAM-5775, which was merged on
> 12/04; however, the performance regressions started happening around
> 09/04, so it could be unrelated. Maybe it is due to changes in
> our infrastructure, or maybe a change in the workers to be tracked, but it is
> definitely not a release blocker, at least for the Spark runner.
>
>
>
>
>
> On Tue, Apr 23, 2019 at 5:12 PM Etienne Chauchot 
> wrote:
> >
> > Hi guys,
> > I will vote -1 (binding) on this RC (although degradation is before RC4
> cut date). I took a look at Nexmark graphs for the 3 major runners :
> > - there seem to be functional regressions on Dataflow:
> https://apache-beam-testing.appspot.com/explore?dashboard=5647201107705856
> . 13 queries fail in batch mode starting on 04/17
> > - there is a perf degradation (+200%) in the Spark runner starting on 04/10
> for all the queries:
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
> >
> > Sorry Andrew for the added work
> >
> > Etienne
> >
> > On Monday, April 22, 2019 at 12:21 -0700, Andrew Pilloud wrote:
> >
> > I signed the wheel files and updated the build process to not require
> giving Travis Apache credentials. (You should probably change your password
> if you haven't already.)
> >
> > Andrew
> >
> > On Mon, Apr 22, 2019 at 12:18 PM Ahmet Altay  wrote:
> >
> > +1 (binding)
> >
> > Verified the python 2 wheel files with quick start examples.
> >
> > On Mon, Apr 22, 2019 at 11:26 AM Ahmet Altay  wrote:
> >
> > I built the wheel files. They are in the usual place along with other
> python artifacts. I will test them a bit and update here. Could someone
> else please try the wheel files as well?
> >
> > Andrew, could you sign and hash the wheel files?
> >
> > On Mon, Apr 22, 2019 at 10:11 AM Ahmet Altay  wrote:
> >
> > I verified
> > - signatures and hashes.
> >  - python streaming quickstart guide
> >
> > I would like to verify the wheel files before voting. Please let us know
> when they are ready. Also, if you need help with building wheel files I can
> help/build.
> >
> > Ahmet
> >
> > On Mon, Apr 22, 2019 at 3:33 AM Maximilian Michels 
> wrote:
> >
> > +1 (binding)
> >
> > Found a minor bug while testing, but not a blocker:
> > https://jira.apache.org/jira/browse/BEAM-7128
> >
> > Thanks,
> > Max
> >
> > On 20.04.19 23:02, Pablo Estrada wrote:
> > > +1
> > > Ran SQL postcommit, and Dataflow Portability Java validatesrunner
> tests.
> > >
> > > -P.
> > >
> > > On Wed, Apr 17, 2019 at 1:38 AM Jean-Baptiste Onofré  > > <mailto:j...@nanthrax.net>> wrote:
> > >
> > > +1 (binding)
> > >
> > > Quickly checked with beam-samples.
> > >
> > > Regards
> > > JB
> > >
> > > On 16/04/2019 00:50, Andrew Pilloud wrote:
> > >  > Hi everyone,
> > >  >
> > >  > Please review and vote on the release candidate #4 for the
> version
> > >  > 2.12.0, as follows:
> > >  >
> > >  > [ ] +1, Approve the release
> > >  > [ ] -1, Do not approve the release (please provide specific
> comments)
> > >  >
> > >  > The complete staging area is available for your review, which
> > > includes:
> > >  > * JIRA release notes [1],
> > >  > * the official Apache source release to be deployed to
> > > dist.apache.org <http://dist.apache.org>
> > >  > <http://dist.apache.org> [2], which is signed with the key with
> > >  > fingerprint 9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
> > >  > * all artifacts to be deployed to the Maven Central Repository
> [4]

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-22 Thread Andrew Pilloud
I signed the wheel files and updated the build process to not require
giving Travis Apache credentials. (You should probably change your password
if you haven't already.)

Andrew

On Mon, Apr 22, 2019 at 12:18 PM Ahmet Altay  wrote:

> +1 (binding)
>
> Verified the python 2 wheel files with quick start examples.
>
> On Mon, Apr 22, 2019 at 11:26 AM Ahmet Altay  wrote:
>
>> I built the wheel files. They are in the usual place along with other
>> python artifacts. I will test them a bit and update here. Could someone
>> else please try the wheel files as well?
>>
>> Andrew, could you sign and hash the wheel files?
>>
>> On Mon, Apr 22, 2019 at 10:11 AM Ahmet Altay  wrote:
>>
>>> I verified
>>> - signatures and hashes.
>>>  - python streaming quickstart guide
>>>
>>> I would like to verify the wheel files before voting. Please let us know
>>> when they are ready. Also, if you need help with building wheel files I can
>>> help/build.
>>>
>>> Ahmet
>>>
>>> On Mon, Apr 22, 2019 at 3:33 AM Maximilian Michels 
>>> wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> Found a minor bug while testing, but not a blocker:
>>>> https://jira.apache.org/jira/browse/BEAM-7128
>>>>
>>>> Thanks,
>>>> Max
>>>>
>>>> On 20.04.19 23:02, Pablo Estrada wrote:
>>>> > +1
>>>> > Ran SQL postcommit, and Dataflow Portability Java validatesrunner
>>>> tests.
>>>> >
>>>> > -P.
>>>> >
>>>> > On Wed, Apr 17, 2019 at 1:38 AM Jean-Baptiste Onofré >>> > <mailto:j...@nanthrax.net>> wrote:
>>>> >
>>>> > +1 (binding)
>>>> >
>>>> > Quickly checked with beam-samples.
>>>> >
>>>> > Regards
>>>> > JB
>>>> >
>>>> > On 16/04/2019 00:50, Andrew Pilloud wrote:
>>>> >  > Hi everyone,
>>>> >  >
>>>> >  > Please review and vote on the release candidate #4 for the
>>>> version
>>>> >  > 2.12.0, as follows:
>>>> >  >
>>>> >  > [ ] +1, Approve the release
>>>> >  > [ ] -1, Do not approve the release (please provide specific
>>>> comments)
>>>> >  >
>>>> >  > The complete staging area is available for your review, which
>>>> > includes:
>>>> >  > * JIRA release notes [1],
>>>> >  > * the official Apache source release to be deployed to
>>>> > dist.apache.org <http://dist.apache.org>
>>>> >  > <http://dist.apache.org> [2], which is signed with the key
>>>> with
>>>> >  > fingerprint 9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
>>>> >  > * all artifacts to be deployed to the Maven Central Repository
>>>> [4],
>>>> >  > * source code tag "v2.12.0-RC4" [5],
>>>> >  > * website pull request listing the release [6], publishing the
>>>> API
>>>> >  > reference manual [7], and the blog post [8].
>>>> >  > * Java artifacts were built with Gradle/5.2.1 and
>>>> OpenJDK/Oracle JDK
>>>> >  > 1.8.0_181.
>>>> >  > * Python artifacts are deployed along with the source release
>>>> to the
>>>> >  > dist.apache.org <http://dist.apache.org> <
>>>> http://dist.apache.org>
>>>> > [2].
>>>> >  > * Validation sheet with a tab for 2.12.0 release to help with
>>>> > validation
>>>> >  > [9].
>>>> >  >
>>>> >  > The vote will be open for at least 72 hours. It is adopted by
>>>> > majority
>>>> >  > approval, with at least 3 PMC affirmative votes.
>>>> >  >
>>>> >  > Thanks,
>>>> >  > Andrew
>>>> >  >
>>>> >  > 1]
>>>> >
>>>> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344944
>>>> >  > [2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
>>>> >  > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>> >  > [4]
>>>> >
>>>> https://repository.apache.org/content/repositories/orgapachebeam-1068/
>>>> >  > [5] https://github.com/apache/beam/tree/v2.12.0-RC4
>>>> >  > [6] https://github.com/apache/beam/pull/8215
>>>> >  > [7] https://github.com/apache/beam-site/pull/588
>>>> >  > [8] https://github.com/apache/beam/pull/8314
>>>> >  > [9]
>>>> >
>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984
>>>> >
>>>> > --
>>>> > Jean-Baptiste Onofré
>>>> > jbono...@apache.org <mailto:jbono...@apache.org>
>>>> > http://blog.nanthrax.net
>>>> > Talend - http://www.talend.com
>>>> >
>>>>
>>>


[VOTE] Release 2.12.0, release candidate #4

2019-04-15 Thread Andrew Pilloud
Hi everyone,

Please review and vote on the release candidate #4 for the version 2.12.0,
as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.12.0-RC4" [5],
* website pull request listing the release [6], publishing the API
reference manual [7], and the blog post [8].
* Java artifacts were built with Gradle/5.2.1 and OpenJDK/Oracle JDK
1.8.0_181.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.12.0 release to help with validation
[9].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Andrew

[1]
https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344944
[2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1068/
[5] https://github.com/apache/beam/tree/v2.12.0-RC4
[6] https://github.com/apache/beam/pull/8215
[7] https://github.com/apache/beam-site/pull/588
[8] https://github.com/apache/beam/pull/8314
[9]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984


Re: [VOTE] Release 2.12.0, release candidate #3

2019-04-12 Thread Andrew Pilloud
The vote on RC3 is canceled. It seems like I'm moving a little too fast on
this release, so I'll wait until Monday to cut the next RC to give more
time to find and fix issues.

Andrew

On Wed, Apr 10, 2019 at 7:45 AM Ismaël Mejía  wrote:

> -1 due to dependencies leaking in 'sdks/java/core'. For more details
> see BEAM-7042 [1]
>
> Michael promptly found a fix for the issue. (Thanks Michael!)
> Andrew, you can find the cherry-pick PR here [2], and sorry for the
> extra work of another RC.
>
> [1] https://issues.apache.org/jira/browse/BEAM-7042
> [2] https://github.com/apache/beam/pull/8269
>
> On Wed, Apr 10, 2019 at 1:02 AM Andrew Pilloud 
> wrote:
> >
> > Hi everyone,
> >
> > Hopefully this is the last one. Please review and vote on the release
> candidate #3 for the version 2.12.0, as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint
> 9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.12.0-RC3" [5],
> > * website pull request listing the release [6] and publishing the API
> reference manual [7].
> > * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> > * Validation sheet with a tab for 2.12.0 release to help with validation
> [8].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Andrew
> >
> > [1]
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344944
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1067/
> > [5] https://github.com/apache/beam/tree/v2.12.0-RC3
> > [6] https://github.com/apache/beam/pull/8215
> > [7] https://github.com/apache/beam-site/pull/588
> > [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984
> >
>


[VOTE] Release 2.12.0, release candidate #3

2019-04-09 Thread Andrew Pilloud
Hi everyone,

Hopefully this is the last one. Please review and vote on the release
candidate #3 for the version 2.12.0, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.12.0-RC3" [5],
* website pull request listing the release [6] and publishing the API
reference manual [7].
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.12.0 release to help with validation
[8].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Andrew

[1]
https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344944
[2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1067/
[5] https://github.com/apache/beam/tree/v2.12.0-RC3
[6] https://github.com/apache/beam/pull/8215
[7] https://github.com/apache/beam-site/pull/588
[8]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984


Re: PostCommit tests currently failing

2019-04-09 Thread Andrew Pilloud
The Nexmark issues are https://issues.apache.org/jira/browse/BEAM-7002

On Tue, Apr 9, 2019 at 9:50 AM Lukasz Cwik  wrote:

> The SplittableDoFn failures are because of the changes associated with
> BEAM-6978.
>
> On Tue, Apr 9, 2019 at 5:05 AM Michael Luckey  wrote:
>
>> Hi,
>>
>> looks as if
>> - beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow
>> - beam_PostCommit_Java_Nexmark_Dataflow
>> - beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow
>>
>> have been consistently failing for a few days [1]. Those ValidatesRunner
>> tests seem to fail on SplittableDoFnTests [2]. The error might have been introduced by
>> https://issues.apache.org/jira/browse/BEAM-6978 and/or
>> https://issues.apache.org/jira/browse/BEAM-7001 [3]
>>
>> The Nexmark tests seem to throw a NullPointerException [4].
>>
>> Is someone able to have a look?
>>
>>
>>
>> [1]
>> http://104.154.241.245/d/8N6LVeCmk/post-commits-status-dashboard?refresh=30s=1
>> [2]
>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow/1078/#showFailuresLink
>> [3]
>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow/1077/#showFailuresLink
>> [4]
>> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/2343/console
>>
>


Re: [VOTE] Release 2.12.0, release candidate #2

2019-04-09 Thread Andrew Pilloud
Thanks for finding this, Ahmet. Please consider the vote for RC2 canceled.

Andrew

On Tue, Apr 9, 2019 at 9:25 AM Ahmet Altay  wrote:

> -1 unfortunately.
>
> The reason is https://issues.apache.org/jira/browse/BEAM-7038. 2.12 includes
> changes to coders that are not compatible with previous versions, breaking
> some Dataflow use cases.
>
> On Mon, Apr 8, 2019 at 3:54 PM Andrew Pilloud  wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #2 for the version
>> 2.12.0, as follows:
>>
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org [2],
>> which is signed with the key with fingerprint
>> 9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.12.0-RC2" [5],
>> * website pull request listing the release [6] and publishing the API
>> reference manual [7].
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.12.0 release to help with validation
>> [8].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Andrew
>>
>> [1]
>> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344944
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1066/
>> [5] https://github.com/apache/beam/tree/v2.12.0-RC2
>> [6] https://github.com/apache/beam/pull/8215
>> [7] https://github.com/apache/beam-site/pull/588
>> [8]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984
>>
>


[VOTE] Release 2.12.0, release candidate #2

2019-04-08 Thread Andrew Pilloud
Hi everyone,

Please review and vote on the release candidate #2 for the version 2.12.0,
as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.12.0-RC2" [5],
* website pull request listing the release [6] and publishing the API
reference manual [7].
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.12.0 release to help with validation
[8].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Andrew

[1]
https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344944
[2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1066/
[5] https://github.com/apache/beam/tree/v2.12.0-RC2
[6] https://github.com/apache/beam/pull/8215
[7] https://github.com/apache/beam-site/pull/588
[8]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984


Re: [VOTE] Release 2.12.0, release candidate #1

2019-04-04 Thread Andrew Pilloud
Sorry for the confusion; I checked JIRA but not pull requests. Please
consider the vote for RC1 canceled.

I'm going to keep running through all the tests before I cut RC2, so expect
that to happen Monday unless there are new issues or pull requests.

Andrew

On Thu, Apr 4, 2019 at 2:47 AM Ismaël Mejía  wrote:

> > I suggest keeping the bug open until the cherry-pick is complete. That
> makes tracking the burndown easier and is a more accurate treatment of Fix
> Version.
>
> Yes, sorry for that mistake, I just systematically resolve issues when
> merged so I did as well for those issues instead of waiting until the
> cherry pick was merged.
>
> > I normally wouldn't say anything since this is micro process, but
> building an RC can take some time so it is worth putting extra effort into
> pre-RC steps.
>
> My apologies Andrew if the lack of open issues was the reason why you
> tagged the branch, and sorry for the extra work of creating a new RC
> because of this. Can we please cut the next RC?
>
> On Thu, Apr 4, 2019 at 1:11 AM Kenneth Knowles  wrote:
> >
> > I suggest keeping the bug open until the cherry-pick is complete. That
> makes tracking the burndown easier and is a more accurate treatment of Fix
> Version.
> >
> > And from the other direction, a good practice is to check not only the
> Jira burndown [1] but also to search for pull requests targeting the release
> branch [2].
> >
> > I normally wouldn't say anything since this is micro process, but
> building an RC can take some time so it is worth putting extra effort into
> pre-RC steps.
> >
> > Kenn
> >
> > [1] https://issues.apache.org/jira/projects/BEAM/versions/12344944
> > [2] https://github.com/apache/beam/pulls?q=is:open+base:release-2.12.0
> >
> > On Wed, Apr 3, 2019 at 12:39 PM Ismaël Mejía  wrote:
> >>
> >> -1
> >>
> >> The release misses a cherry pick [1] that fixes an important issue in
> >> Cassandra, without this users won't be able to write to Cassandra. I
> >> know at least 3 users who are waiting for this release to have this
> >> fixed.
> >>
> >> [1] https://github.com/apache/beam/pull/8198/files
> >>
> >> On Wed, Apr 3, 2019 at 8:34 PM Andrew Pilloud 
> wrote:
> >> >
> >> > Hi everyone,
> >> >
> >> > Please review and vote on the release candidate #1 for the version
> 2.12.0, as follows:
> >> >
> >> > [ ] +1, Approve the release
> >> > [ ] -1, Do not approve the release (please provide specific comments)
> >> >
> >> > The complete staging area is available for your review, which
> includes:
> >> > * JIRA release notes [1],
> >> > * the official Apache source release to be deployed to
> dist.apache.org [2], which is signed with the key with fingerprint
> 9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
> >> > * all artifacts to be deployed to the Maven Central Repository [4],
> >> > * source code tag "v2.12.0-RC1" [5],
> >> > * website pull request listing the release [6] and publishing the API
> reference manual [7].
> >> > * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> >> > * Validation sheet with a tab for 2.12.0 release to help with
> validation [8].
> >> >
> >> > The vote will be open for at least 72 hours. It is adopted by
> majority approval, with at least 3 PMC affirmative votes.
> >> >
> >> > Thanks,
> >> > Andrew
> >> >
> >> > [1]
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344944
> >> > [2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
> >> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1065/
> >> > [5] https://github.com/apache/beam/tree/v2.12.0-RC1
> >> > [6] https://github.com/apache/beam/pull/8215
> >> > [7] https://github.com/apache/beam-site/pull/588
> >> > [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984
>


[VOTE] Release 2.12.0, release candidate #1

2019-04-03 Thread Andrew Pilloud
Hi everyone,

Please review and vote on the release candidate #1 for the version 2.12.0,
as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
9E7CEC0661EFD610B632C610AE8FE17F9F8AE3D4 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.12.0-RC1" [5],
* website pull request listing the release [6] and publishing the API
reference manual [7].
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.12.0 release to help with validation
[8].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Andrew

[1]
https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344944
[2] https://dist.apache.org/repos/dist/dev/beam/2.12.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1065/
[5] https://github.com/apache/beam/tree/v2.12.0-RC1
[6] https://github.com/apache/beam/pull/8215
[7] https://github.com/apache/beam-site/pull/588
[8]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1007316984


Re: Removing :beam-website:testWebsite from gradle build target

2019-04-01 Thread Andrew Pilloud
+1 on this, particularly removing the dead link checker from default tests.
It is effectively testing that ~20 random websites are up. I wonder if
there is a way to limit it to locally testing links within the beam site?
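
One possible shape for that (a hedged sketch, not how the website tests are actually wired up today): serve the site locally and crawl it, following only links that stay on the local host. The jsoup dependency and the localhost:4000 address are illustrative assumptions.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class InternalLinkChecker {
      // Assumed address of a locally served copy of the site.
      private static final String BASE = "http://localhost:4000";

      public static void main(String[] args) {
        Deque<String> toVisit = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        toVisit.add(BASE + "/");
        while (!toVisit.isEmpty()) {
          String url = toVisit.poll();
          if (!seen.add(url)) {
            continue; // already checked this page
          }
          Document doc;
          try {
            doc = Jsoup.connect(url).get(); // throws on 404 and other HTTP errors
          } catch (Exception e) {
            System.err.println("BROKEN: " + url + " (" + e.getMessage() + ")");
            continue;
          }
          for (Element link : doc.select("a[href]")) {
            String target = link.absUrl("href");
            if (target.startsWith(BASE)) { // skip external sites entirely
              toVisit.add(target.split("#")[0]); // drop fragments before queuing
            }
          }
        }
      }
    }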

On Mon, Apr 1, 2019 at 3:54 AM Michael Luckey  wrote:

> Hi,
>
> after playing around with Gradle build for a while, I would like to
> suggest to remove ':beam-website:testWebsite target from Gradle's check
> task.
>
> Rationale:
> - the task seems to be very flaky. In fact, I always need to add '-x
> :beam-website:testWebsite' to my build [1]
> - task uses docker, which imho adds an (unnecessary) severe constraint on
> the build task. E.g. a part-time user is unable to execute these tests in a
> docker environment
> - these tests access the production environment, so me hitting the
> build several times an hour could be considered a DOS attack.
>
> Of course, these tests add lots of value and should definitely be
> executed, but wouldn't it be sufficient to run this task only on demand,
> i.e. by an explicit call to ':beam-website:testWebsite' or
> ':websitePreCommit'? Any thoughts?
>
> best,
>
> michel
>
> [1] https://issues.apache.org/jira/browse/BEAM-6760
>


Re: [PROPOSAL] Preparing for Beam 2.12.0 release

2019-03-28 Thread Andrew Pilloud
It seems like there was some confusion around when the branch cut was going
to happen. I cut the branch yesterday, but a dozen release blocking fixes
went in immediately after. I recut the branch today, one day late, at
1d9daf1
<https://github.com/apache/beam/commit/1d9daf1aca101fa5a194cbbba969886734e08902>.

There are currently 3 release blocking issues
<https://issues.apache.org/jira/projects/BEAM/versions/12344944>. I plan to
cut the first RC on March 3rd, so get your cherry-picks in by end of day on
March 2nd.

Andrew

On Mon, Mar 18, 2019 at 9:14 AM Etienne Chauchot 
wrote:

> Sounds great, thanks for volunteering to do the release.
>
> Etienne
>
> On Wednesday, March 13, 2019 at 12:08 -0700, Andrew Pilloud wrote:
>
> Hello Beam community!
>
> Beam 2.12 release branch cut date is March 27th according to the release
> calendar [1]. I would like to volunteer myself to do this release. I intend
> to cut the branch as planned on March 27th and cherrypick fixes if needed.
>
> If you have release-blocking issues for 2.12 please mark their "Fix
> Version" as 2.12.0. Kenn created a 2.13.0 release in JIRA in case you
> would like to move any non-blocking issues to that version.
>
> Does this sound reasonable?
>
> Andrew
>
> [1]
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com&ctz=America%2FLos_Angeles
>
>


Re: [PROPOSAL] Preparing for Beam 2.12.0 release

2019-03-28 Thread Andrew Pilloud
Yes, that is what I meant. Sorry about mixing up the month!

On Thu, Mar 28, 2019 at 9:26 AM Robert Burke  wrote:

> I'm going to go out on a limb and assume you mean first RC cut on April
> 3rd, and the Cherry-pick deadline EoD (PST?) April 2nd.
>
> On Thu, 28 Mar 2019 at 09:23, Andrew Pilloud  wrote:
>
>> It seems like there was some confusion around when the branch cut was
>> going to happen. I cut the branch yesterday, but a dozen release blocking
>> fixes went in immediately after. I recut the branch today, one day late, at
>> 1d9daf1
>> <https://github.com/apache/beam/commit/1d9daf1aca101fa5a194cbbba969886734e08902>.
>>
>> There are currently 3 release blocking issues
>> <https://issues.apache.org/jira/projects/BEAM/versions/12344944>. I plan
>> to cut the first RC on March 3rd, so get your cherry-picks in by end of day
>> on March 2nd.
>>
>> Andrew
>>
>> On Mon, Mar 18, 2019 at 9:14 AM Etienne Chauchot 
>> wrote:
>>
>>> Sounds great, thanks for volunteering to do the release.
>>>
>>> Etienne
>>>
>>> On Wednesday, March 13, 2019 at 12:08 -0700, Andrew Pilloud wrote:
>>>
>>> Hello Beam community!
>>>
>>> Beam 2.12 release branch cut date is March 27th according to the
>>> release calendar [1]. I would like to volunteer myself to do this release.
>>> I intend to cut the branch as planned on March 27th and cherrypick fixes if
>>> needed.
>>>
>>> If you have release-blocking issues for 2.12 please mark their "Fix
>>> Version" as 2.12.0. Kenn created a 2.13.0 release in JIRA in case you
>>> would like to move any non-blocking issues to that version.
>>>
>>> Does this sound reasonable?
>>>
>>> Andrew
>>>
>>> [1]
>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com&ctz=America%2FLos_Angeles
>>>
>>>


[PROPOSAL] Preparing for Beam 2.12.0 release

2019-03-13 Thread Andrew Pilloud
Hello Beam community!

Beam 2.12 release branch cut date is March 27th according to the release
calendar [1]. I would like to volunteer myself to do this release. I intend
to cut the branch as planned on March 27th and cherrypick fixes if needed.

If you have release-blocking issues for 2.12 please mark their "Fix
Version" as 2.12.0. Kenn created a 2.13.0 release in JIRA in case you would
like to move any non-blocking issues to that version.

Does this sound reasonable?

Andrew

[1]
https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com&ctz=America%2FLos_Angeles


Re: Dependency management for multiple IOs

2019-02-19 Thread Andrew Pilloud
I've thought about this a bit as well. It would be nice if we can move the
SQL-specific IO wrappers into the IOs themselves. Your first proposal and
long term match my thoughts. This at least keeps SQL from making the
dependency problem worse than it is right now.

On the general Java dependency problem, I don't think there is an easy way
to solve it.
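
As a concrete picture of the decoupling in Kenn's scenario 1 below, here is a hedged sketch of ServiceLoader-based discovery. Beam SQL's real SPI is its TableProvider interface; the SqlTableAdapter and KafkaTableAdapter names here are hypothetical stand-ins, and the auto-service annotation processor is assumed to be on the build path.

    import com.google.auto.service.AutoService;
    import java.util.ServiceLoader;

    public class AdapterDiscovery {

      // Hypothetical SPI owned by the SQL module; Beam's real SPI is TableProvider.
      public interface SqlTableAdapter {
        String tableType();
      }

      // In real code this would live in the IO (or a thin sql-kafka adapter) module.
      // The annotation registers it for runtime discovery, so the SQL module needs
      // no compile-time dependency edge to Kafka at all.
      @AutoService(SqlTableAdapter.class)
      public static class KafkaTableAdapter implements SqlTableAdapter {
        @Override
        public String tableType() {
          return "kafka";
        }
      }

      public static void main(String[] args) {
        // The SQL side only sees the interface; adapters show up when their jar is
        // on the classpath, keeping Kafka/Hive deps out of SQL's dependency tree.
        for (SqlTableAdapter adapter : ServiceLoader.load(SqlTableAdapter.class)) {
          System.out.println("found table adapter: " + adapter.tableType());
        }
      }
    }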

On Mon, Feb 18, 2019 at 7:27 PM Kenneth Knowles  wrote:

> Yea. Beam SQL basically just has a clone of the Beam IO problem. So the
> question is whether to mirror the module structure or to put all the
> SQL-specific adapters in one module (or a few) with lots of optional
> dependencies, and probably a complex build.gradle to support running a
> variety of integration tests with different stuff on the classpath.
>
> Kenn
>
> On Mon, Feb 18, 2019 at 11:33 AM Reuven Lax  wrote:
>
>> I don't think this is a SQL-specific problem. Beam (especially when using
>> a variety of different IOs in a single pipeline) aggregates many
>> dependencies into one binary, which sometimes creates this sort of pain.
>> When users have organizational reasons to pin specific dependencies, then
>> things get even worse. I still don't know if there is a perfect solution to
>> all of this.
>>
>> Reuven
>>
>> On Fri, Feb 15, 2019 at 7:42 PM Kenneth Knowles  wrote:
>>
>>> I'm not totally convinced Beam's dep versions are the issue here. A user
>>> may have an organizational requirement of a particular version of, say,
>>> Kafka and Hive. So when they depend on Beam they probably pin those
>>> versions of Kafka and Hive which they have determined work together, and
>>> they hope that the Beam IOs work together.
>>>
>>> I see this as a choice between two scenarios for users:
>>>
>>> 1. SQL <--- KafkaTable (@AutoService) ---> KafkaIO ---provided---> Kafka
>>> 2. SQL (includes KafkaTable) ---optional---> KafkaIO ---provided---> Kafka
>>>
>>> For users of 1, they depend on Beam Java, Beam SQL, SQL Kafka Table, and
>>> pin a version of Kafka
>>> For users of 2, they depend on Beam Java, Beam SQL, KakfaIO, and pin a
>>> version of Kafka
>>>
>>> To be honest it is really hard to see which is preferable. I think
>>> number 1 has fewer funky dependency edges, more simple "compile + runtime"
>>> dependencies.
>>>
>>> Kenn
>>>
>>>
>>>
>>>
>>> Kenn
>>>
>>> On Fri, Feb 15, 2019 at 6:06 PM Chamikara Jayalath 
>>> wrote:
>>>
 I think the underlying problem is two modules of Beam transitively
 depending on conflicting dependencies (a.k.a. the diamond dependency
 problem) ?

 I think the general solution for this is two fold. (at least the way we
 have formulated in https://beam.apache.org/contribute/dependencies/)

 (1) Keep Beam dependencies up to date as much as possible, hoping that transitive
 dependencies stay compatible (we rely on semantic versioning here to not
 cause problems for differences in minor/patch versions. Might not be the
 case in practice for some dependencies).
 (2) For modules with outdated dependencies that we cannot upgrade due
 to some reason, we'll vendor those modules.

 Not sure if your specific problem need something more.

 Thanks,
 Cham

 On Fri, Feb 15, 2019 at 4:48 PM Anton Kedin  wrote:

> Hi dev@,
>
> I have a problem, I don't know a good way to approach the dependency
> management between Beam SQL and Beam IOs, and want to collect thoughts
> about it.
>
> Beam SQL depends on specific IOs so that users can query them. The IOs
> need their dependencies to work. Sometimes the IOs also leak their
> transitive dependencies (e.g. HCatRecord leaked from HCatalogIO). So if in
> SQL we want to build abstractions on top of these IOs we risk having to
> bundle the whole IOs or the leaked dependencies. Overall we can probably
> avoid it by making the IOs `provided` dependencies, and by refactoring the
> code that leaks. In this case things can be made to build, simple tests
> will run, and we won't need to bundle the IOs within SQL.
>
> But as soon as there's a need to actually work with multiple IOs at
> the same time the conflicts appear. For example, for testing of
> Hive/HCatalog IOs in SQL we need to create an embedded Hive Metastore
> instance. It is a very Hive-specific thing that requires its own
> dependencies that have to be loaded during testing as part of SQL project.
> And some other IOs (e.g. KafkaIO) can bring similar but conflicting
> dependencies which means that we cannot easily work with or test both IOs
> at the same time within SQL. I think it will become insane as number of 
> IOs
> supported in SQL grows.
>
> So the question is how to avoid conflicts between IOs within SQL?
>
> One approach is to create separate packages for each of the
> SQL-specific IO wrappers, e.g. `beam-sdks-java-extensions-sql-hcatalog`, 
> 

Re: [PROPOSAL] Prepare Beam 2.11.0 release

2019-02-07 Thread Andrew Pilloud
+1 let's keep things going. Thanks for volunteering!

Andrew

On Thu, Feb 7, 2019, 4:49 PM Kenneth Knowles  wrote:

> I think this is a good idea. Even though 2.10.0 is still making its way
> out, there's still 6 weeks of changes between the cut dates, lots of value.
> It is good to keep the rhythm and follow the calendar.
>
> Kenn
>
> On Thu, Feb 7, 2019, 15:52 Ahmet Altay  wrote:
>> Hi all,
>>
>> Beam 2.11 release branch cut date is 2/13 according to the release
>> calendar [1]. I would like to volunteer myself to do this release. I intend
>> to cut the branch on the planned 2/13 date.
>>
>> If you have release-blocking issues for 2.11 please mark their "Fix
>> Version" as 2.11.0. (I also created a 2.12.0 release in JIRA in case you
>> would like to move any non-blocking issues to that version.)
>>
>> What do you think?
>>
>> Ahmet
>>
>> [1]
>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com&ctz=America%2FLos_Angeles
>>
>


Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Andrew Pilloud
Oops, I'm mixing up terms here. Topics != Subscriptions. We shouldn't be
leaking topics.

Andrew
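
For anyone doing the manual cleanup discussed in this thread, a minimal sketch using the google-cloud-pubsub client; the project id and the name filter are illustrative assumptions, not Beam's actual test tooling.

    import com.google.cloud.pubsub.v1.TopicAdminClient;
    import com.google.pubsub.v1.ProjectName;
    import com.google.pubsub.v1.Topic;

    public class CleanupLeakedTestTopics {
      public static void main(String[] args) throws Exception {
        try (TopicAdminClient client = TopicAdminClient.create()) {
          // List every topic in the test project and delete the leaked ones.
          for (Topic topic :
              client.listTopics(ProjectName.of("apache-beam-testing")).iterateAll()) {
            // Assumed naming pattern of the leaked integration-test topics.
            if (topic.getName().contains("integ-test-PubsubJsonIT")) {
              client.deleteTopic(topic.getName());
              System.out.println("deleted " + topic.getName());
            }
          }
        }
      }
    }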

On Wed, Feb 6, 2019 at 9:04 AM Kenneth Knowles  wrote:

> To clarify, PubsubIO does not clean up auto-created subscriptions, and SQL
> doesn't compensate for that.
>
> On Wed, Feb 6, 2019 at 8:45 AM Mikhail Gryzykhin 
> wrote:
>
>> Thank you for quick response Andrew.
>>
>> I'll clean these up. I'll keep the bug open and assign it to @Kenneth
>> Knowles  who's working on SQL for follow up: we need a
>> way to automatically clean up topics.
>>
>> Current suggestions:
>> 1. Make SQL clean up created topics
>> 2. Clean up topics created by SQL in tests
>> 3. Increase quota so that topics have enough time to be cleaned up
>> automatically.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Wed, Feb 6, 2019 at 8:05 AM Andrew Pilloud 
>> wrote:
>>
>>> SQL doesn't clean up pubsub subscriptions. Feel free to delete those.
>>>
>>> Andrew
>>>
>>> On Wed, Feb 6, 2019, 8:01 AM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> +Kenneth Knowles  you're working on SQL recently, so
>>>> might provide some info.
>>>>
>>>> I see a lot of topics of format
>>>> projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
>>>> <https://pantheon.corp.google.com/cloudpubsub/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721?project=apache-beam-testing>.
>>>> Seems we do not clean up properly.
>>>>
>>>> Is it safe to cleanup topics with this name?
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback <http://go/migryz-feedback>?
>>>>
>>>>
>>>> On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin 
>>>> wrote:
>>>>
>>>>> Minor UPD:
>>>>> As expected it fails most of our test jobs, since we use Pub/Subs in
>>>>> many tests.
>>>>>
>>>>> --Mikhail
>>>>>
>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>
>>>>>
>>>>> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin 
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Our python pipelines failed with limit exceeded error
>>>>>> <https://builds.apache.org/job/beam_PostCommit_Python_Verify/7313/console>
>>>>>> :
>>>>>>
>>>>>> ResourceExhausted: 429 Your project has exceeded a limit: 
>>>>>> (type="topics-per-project", current=1, maximum=1).
>>>>>>
>>>>>>
>>>>>> Does anyone know if there were new tests that use topics added
>>>>>> recently?
>>>>>>
>>>>>> I tried to see list of topics, but UI fails
>>>>>> <https://pantheon.corp.google.com/cloudpubsub/topicList?project=apache-beam-testing>
>>>>>> to load. Will see if I can use APIs to investigate.
>>>>>>
>>>>>> If anyone has good insight, please, pick up BEAM-6610
>>>>>> <https://issues.apache.org/jira/browse/BEAM-6610>.
>>>>>>
>>>>>> --Mikhail
>>>>>>
>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>
>>>>>


Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Andrew Pilloud
SQL doesn't clean up pubsub subscriptions. Feel free to delete those.

Andrew

On Wed, Feb 6, 2019, 8:01 AM Mikhail Gryzykhin  wrote:

> +Kenneth Knowles  you're working on SQL recently, so
> might provide some info.
>
> I see a lot of topics of format
> projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
> .
> Seems we do not clean up properly.
>
> Is it safe to cleanup topics with this name?
>
> --Mikhail
>
> Have feedback ?
>
>
> On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin 
> wrote:
>
>> Minor UPD:
>> As expected it fails most of our test jobs, since we use Pub/Subs in many
>> tests.
>>
>> --Mikhail
>>
>> Have feedback ?
>>
>>
>> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Our python pipelines failed with limit exceeded error
>>> 
>>> :
>>>
>>> ResourceExhausted: 429 Your project has exceeded a limit: 
>>> (type="topics-per-project", current=1, maximum=1).
>>>
>>>
>>> Does anyone know if there were new tests that use topics added recently?
>>>
>>> I tried to see list of topics, but UI fails
>>> 
>>> to load. Will see if I can use APIs to investigate.
>>>
>>> If anyone has good insight, please, pick up BEAM-6610
>>> .
>>>
>>> --Mikhail
>>>
>>> Have feedback ?
>>>
>>


Re: [ANNOUNCE] New committer announcement: Gleb Kanterov

2019-01-25 Thread Andrew Pilloud
Congratulations Gleb!

On Fri, Jan 25, 2019 at 8:54 AM Ismaël Mejía  wrote:

> Well deserved, congratulations Gleb!
>
> On Fri, Jan 25, 2019 at 10:47 AM Etienne Chauchot 
> wrote:
> >
> > Congrats Gleb and welcome onboard !
> >
> > Etienne
> >
> > On Friday, January 25, 2019 at 10:39 +0100, Alexey Romanenko wrote:
> >
> > Congrats to Gleb and welcome on board!
> >
> > On 25 Jan 2019, at 09:22, Tim Robertson 
> wrote:
> >
> > Welcome Gleb and congratulations!
> >
> > On Fri, Jan 25, 2019 at 8:06 AM Kenneth Knowles  wrote:
> >
> > Hi all,
> >
> > Please join me and the rest of the Beam PMC in welcoming a new
> committer: Gleb Kanterov
> >
> > Gleb started contributing to Beam and quickly dove deep, doing some
> sensitive fixes to schemas, also general build issues, Beam SQL, Avro, and
> more. In consideration of Gleb's technical and community contributions, the
> Beam PMC trusts Gleb with the responsibilities of a Beam committer [1].
> >
> > Thank you, Gleb, for your contributions.
> >
> > Kenn
> >
> > [1]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
> >
> >
>


Re: Load testing on DirectRunner

2019-01-10 Thread Andrew Pilloud
My default advice here is to use the Direct Runner for small smoke tests,
and use the Flink LocalRunner for larger datasets that can still be run
locally. As Reuven points out, the Direct Runner is more of a validation
test itself; it does many things designed to test pipelines for the worst
combinations of conditions they may encounter on other runners.

Andrew
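
Switching a local test from the Direct Runner to Flink is mostly a matter of pipeline options; a minimal sketch, assuming the beam-runners-flink artifact is on the classpath:

    import org.apache.beam.runners.flink.FlinkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class LocalFlinkLoadTest {
      public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        // With no --flinkMaster set, the FlinkRunner spins up an embedded local
        // cluster, which generally copes better with larger local datasets than
        // the hold-everything-in-memory Direct Runner.
        options.setRunner(FlinkRunner.class);
        Pipeline pipeline = Pipeline.create(options);
        // ... build the same transforms exercised on the Direct Runner ...
        pipeline.run().waitUntilFinish();
      }
    }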

On Thu, Jan 10, 2019 at 10:10 AM Reuven Lax  wrote:

> The Direct Runner as currently implemented is purposely inefficient. It
> was designed for testing, and therefore does many things that are meant to
> expose bugs in user pipelines (e.g. randomly sorting PCollections,
> serializing/deserializing every element, etc.). So it's not surprising that
> it doesn't behave well under load tests.
>
> Reuven
>
> On Thu, Jan 10, 2019 at 5:55 AM Katarzyna Kucharczyk <
> ka.kucharc...@gmail.com> wrote:
>
>> Hi Everyone,
>>
>> My name is Kasia and I contribute to Beam's tests. Currently, I am
>> working with Łukasz Gajowy on load tests and we created Jenkins
>> configuration to run Synthetic Sources test on DirectRunner. It was decided
>> to generate 1 000 000 000 records (bytes) for a small suite (details you
>> can find in this proposal [1] ). Running this test on the Beam’s Jenkins is
>> causing runtime exception with the message:
>> "java.lang.OutOfMemoryError: GC overhead limit exceeded".
>>
>> Of course, this is not a surprise since it's a lot of data. That's why I
>> am asking for your advice/opinion:
>> Do you think this test should be smaller? On the other hand, if it's
>> going to be smaller, would it still be worth testing as a load test?
>> Maybe it would be better to wait for the UniversalLocalRunner instead and
>> use it once it's there? What is the status of ULR? Do you know if the ULR
>> will replace DirectRunner?
>>
>> I created an issue [2] with details of this problem where you can find
>> the link to the example of a failing job.
>>
>> Thanks,
>> Kasia
>>
>> [1] https://s.apache.org/load-test-basic-operations
>> 
>> [2] https://issues.apache.org/jira/browse/BEAM-6351
>>
>


Re: [Go SDK] User Defined Coders

2019-01-07 Thread Andrew Pilloud
+1 on this. I think we are somewhat lacking on a written design for Schemas
in Java. This would be really useful in driving adoption and expanding to
other languages.

Andrew

On Mon, Jan 7, 2019 at 3:43 PM Robert Burke  wrote:

> Might I see the design doc (not code) for how they're supposed to look and
> work in Java first? I'd rather not write a document based on a speculative
> understanding of Schemas based on the litany of assumptions I'm making
> about them.
>
>
> On Mon, Jan 7, 2019, 2:35 PM Reuven Lax  wrote:
>
>> I suggest that we write out a design of what schemas in go would look
>> like and how it would interact with coders. We'll then be in a much better
>> position to decide what the right short-term path forward is. Even if we
>> decide it makes more sense to build up the coder support first, I think
>> this will guide us; e.g. we can build up the coder support in a way that
>> can be extended to full schemas later.
>>
>> Writing up an overview design shouldn't take too much time and I think is
>> definitely worth it.
>>
>> Reuven
>>
>> On Mon, Jan 7, 2019 at 2:12 PM Robert Burke  wrote:
>>
>>> Kenn has pointed out to me that Coders are not likely going to vanish in
>>> the next while, in particular over the FnAPI, so having a coder registry
>>> does remain useful, as described by an early adopter in another thread.
>>>
>>> On Fri, Jan 4, 2019, 10:51 AM Robert Burke  wrote:
>>>
 I think you're right Kenn.

 Reuven alluded to the difficulty in inference of what to use between
 AtomicType and the rest, in particular Struct.

 Go has the additional concern of Pointer vs Non Pointer types,
 which neither Python nor Java has, but which has implications on
 pipeline efficiency that need addressing, in particular, being able to use
 them in a useful fashion in the Go SDK.

 I agree that long term, having schemas as a default codec would be
 hugely beneficial for readability, composability, and allows more
 processing to be on the Runner Harness side of a worker. (I'll save the
 rest of my thoughts on Schemas in Go for the other thread, and say no more
 of it here.)

 *Regarding my proposal for User Defined Coders:*

 To avoid users accidentally preventing themselves from using Schemas in
 the future, I need to remove the ability to override the default coder
 (4). Then instead of JSON coding by default (5), the SDK should be doing
 Schema coding. The SDK is already doing the recursive type analysis on
 types at pipeline construction time, so it's not a huge stretch to support
 Schemas using that information in the future, once Runner & FnAPI support
 begins to exist.

 (1) doesn't seem to need changing, as this is the existing
 AtomicType definition Kenn pointed out.

 (2) is the specific AtomicType override.

 (3) is the broader Go-specific override for Go's unique interface
 semantics. This covers most of the cases (4) would have covered anyway, but
 in a targeted way.

 This should still allow Go users to better control their pipeline, and
 associated performance implications (which is my goal in this change),
 while not making an overall incompatible choice for powerful beam features
 for the common case in the future.

 Does that sound right?

 On Fri, 4 Jan 2019 at 10:05 Kenneth Knowles  wrote:

> On Thu, Jan 3, 2019 at 4:33 PM Reuven Lax  wrote:
>
>> If a user wants custom encoding for a primitive type, they can create
>> a byte-array field and wrap that field with a Coder
>>
>
> This is the crux of the issue, right?
>
> Roughly, today, we've got:
>
> Schema ::= [ (fieldname, Type) ]
>
> Type ::= AtomicType | Array<Type> | Map<Type, Type> | Struct<Schema>
>
> AtomicType ::= bytes | int{16, 32, 64} | datetime | string |
> ...
>
> To fully replace custom encodings as they exist, you need:
>
> AtomicType ::= bytes | ...
>
> At this point, an SDK need not surface the concept of "Coder" to a
> user at all outside the bytes field concept and the wire encoding and
> efficient should be identical or nearly to what we do with coders today.
> PCollections in such an SDK have schemas, not coders, so we have
> successfully turned it completely inside-out relative to how the Java SDK
> does it. Is that what you have in mind?
>
> I really like this, but I agree with Robert that this is a major
> change that takes a bunch of work and a lot more collaborative thinking in
> design docs if we hope to get it right/stable.
>
> Kenn
>
>
>> (this is why I said that todays Coders are simply special cases);
>> this should be very rare though, as users rarely should care how Beam
>> encodes a long or a double.
>>
>>>
>>> Offhand, Schemas seem 

Re: Beam snapshots broken

2018-12-27 Thread Andrew Pilloud
https://issues.apache.org/jira/browse/BEAM-6282

Kenn (and everyone else who has context on this change) are out this week,
so I don't think anyone is making progress on it. Is this something that
can wait a week or two? If not we should revert
https://github.com/apache/beam/pull/7324

Andrew

On Thu, Dec 27, 2018 at 9:07 AM Gleb Kanterov  wrote:

> I can reproduce this on my machine, and reverting
> https://github.com/apache/beam/pull/7324 fixed the problem. There is a
> separate thread in dev@ about releasing vendored gRPC v0.2, I'm wondering
> if it will fix this issue.
>
> On Thu, Dec 27, 2018 at 5:20 PM Ismaël Mejía  wrote:
>
>> Looks like snapshots are broken again since December 20; can somebody
>> PTAL?
>> Seems like some part of the vendoring could be related to this failure
>> (maybe it is looking for the unpublished version)?
>>
>> Running some tests in one existing application I found this
>> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time
>> elapsed: 0.447 s <<< FAILURE! - in SerializationTest
>> [ERROR] nonSerilizableTest(SerializationTest)  Time elapsed: 0.028 s  <<<
>> ERROR!
>> java.lang.NoClassDefFoundError:
>>
>> org/apache/beam/vendor/grpc/v1p13p1/com/google/protobuf/ProtocolMessageEnum
>> at SerializationTest.nonSerilizableTest(SerializationTest.java:27)
>> Caused by: java.lang.ClassNotFoundException:
>>
>> org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.ProtocolMessageEnum
>> at SerializationTest.nonSerilizableTest(SerializationTest.java:27)
>>
>> On Thu, Dec 13, 2018 at 9:13 AM Mark Liu  wrote:
>> >
>> > Looks like the recent failure (like this job) is related to
>> ':beam-sdks-python:test' change introduced in this PR. `./gradlew
>> :beam-sdks-python:test` can reproduce the error.
>> >
>> > Testing a fix in PR7273.
>> >
>> > On Wed, Dec 12, 2018 at 8:31 AM Yifan Zou  wrote:
>> >>
>> >> Beam9 is offline right now. But, the job also failed on beam4 and 13
>> with "Could not determine the dependencies of task
>> ':beam-sdks-python:test.".
>> >> Seems like the task dependency did not setup properly.
>> >>
>> >>
>> >>
>> >> On Wed, Dec 12, 2018 at 2:03 AM Ismaël Mejía 
>> wrote:
>> >>>
>> >>> You are right it seems that it was related to beam9 (wondering if it
>> >>> was bad luck that it was always assigned to beam9 or we can improve
>> >>> that poor balancing error).
>> >>> However it failed again today against beam13 maybe this time is just a
>> >>> build issue but seems related to python too.
>> >>>
>> >>> On Tue, Dec 11, 2018 at 7:33 PM Boyuan Zhang 
>> wrote:
>> >>> >
>> >>> > Seems like all failed jobs are not owing to a single task
>> failure. The failed tasks were executed on beam9, which was rebooted
>> yesterday because python tests failed continuously. +Yifan Zou may have
>> more useful content here.
>> >>> >
>> >>> > On Tue, Dec 11, 2018 at 9:10 AM Ismaël Mejía 
>> wrote:
>> >>> >>
>> >>> >> It seems that Beam snapshots are broken since Dec. 2
>> >>> >>
>> https://builds.apache.org/view/A-D/view/Beam/job/beam_Release_Gradle_NightlySnapshot/
>> >>> >>
>> >>> >> It seems "The :beam-website:startDockerContainer task failed."
>> >>> >> Can somebody please take a look.
>>
>
>
> --
> Cheers,
> Gleb
>


Re: Beam 8 Jenkins executor broken

2018-12-26 Thread Andrew Pilloud
I rebooted the VM a few hours ago, it appears to be working again.

On Wed, Dec 26, 2018 at 11:01 AM Andrew Pilloud  wrote:

> It seems the Beam 8 Jenkins executor is broken. It fails all builds with
> out of memory issues.
>
> Anyone know what I should do to fix it? Can I just reboot the VM?
>
> Andrew
>


Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-12 Thread Andrew Pilloud
+1

Turns out we broke DOUBLE on purpose. Updated the demo to use DECIMAL and
it doesn't hard fail. This is a docs bug.
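
For context, Beam's DoubleCoder deliberately declares itself non-deterministic, so grouping on a raw DOUBLE key trips the "keyCoder of a GroupByKey must be deterministic" check. A hedged sketch of the CAST workaround (the schema and query are illustrative, not the actual demo's):

    import org.apache.beam.sdk.extensions.sql.SqlTransform;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    class DecimalKeyWorkaround {
      // `input` is assumed to be a PCollection<Row> whose schema has a DOUBLE
      // column named "price". Grouping on CAST(price AS DECIMAL) passes the
      // determinism check because BigDecimal's coder is deterministic, while
      // grouping directly on the DOUBLE column does not.
      static PCollection<Row> countByPrice(PCollection<Row> input) {
        return input.apply(
            SqlTransform.query(
                "SELECT CAST(price AS DECIMAL) AS price_key, COUNT(*) AS cnt "
                    + "FROM PCOLLECTION "
                    + "GROUP BY CAST(price AS DECIMAL)"));
      }
    }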

On Wed, Dec 12, 2018 at 3:55 PM Scott Wegner  wrote:

> +1
>
> I verified the Java examples succeed on DirectRunner.
>
> On Wed, Dec 12, 2018 at 3:30 PM Chamikara Jayalath 
> wrote:
>
>> Thanks Andrew. Please make this a blocker and -1 the thread if you think
>> we need a new RC.
>>
>> - Cham
>>
>> On Wed, Dec 12, 2018 at 3:27 PM Andrew Pilloud 
>> wrote:
>>
>>> I was just running the Beam SQL demo. I found one query fails with "the
>>> keyCoder of a GroupByKey must be deterministic" and another just hangs. I
>>> opened an issue: https://issues.apache.org/jira/browse/BEAM-6224 Not
>>> sure if this calls for canceling the release or just a release note (SQL is
>>> still experimental). I'm continuing to track down the root cause, but am
>>> tied up with the Beam Meetup in SFO today.
>>>
>>> Andrew
>>>
>>> On Tue, Dec 11, 2018 at 3:32 PM Ruoyun Huang  wrote:
>>>
>>>> +1,  Looking forward to the release!
>>>>
>>>> On Tue, Dec 11, 2018 at 11:09 AM Chamikara Jayalath <
>>>> chamik...@google.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I ran Beam RC verification script [1] and updated the validation
>>>>> spreadsheet [2]. I think the current release candidate looks good.
>>>>>
>>>>> So +1 for the release.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh
>>>>> [2]
>>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>>>>
>>>>> On Fri, Dec 7, 2018 at 7:19 AM Ismaël Mejía  wrote:
>>>>>
>>>>>> Looking at the dates on the Spark runner git log there was a PR
>>>>>> merged to change Spark translation from classes to URNs. I cannot see how
>>>>>> this can impact performance. Looking at the other queries in the
>>>>>> dashboards, there seems to be a great variability in the executions of 
>>>>>> the
>>>>>> Spark runner to the point of feeling we don't have guarantees anymore. I
>>>>>> wonder if this was because of other loads shared in the server(s), or
>>>>>> because our sample is too small for the standard deviation.
>>>>>>
>>>>>> I would proceed with the release; the real question is if we can
>>>>>> somehow constrain the execution of these tests to have a more consistent
>>>>>> output.
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>> Regarding query7 in spark:
>>>>>>> - there doesn't seem to be a functional regression: query passes and
>>>>>>> output size is still the same
>>>>>>>
>>>>>>> - Also the performance degradation seems to be only on spark, the
>>>>>>> other runners do not seem to suffer from it.
>>>>>>>
>>>>>>> - performance degradation seems to be constant from 11/12 so we can
>>>>>>> eliminate temporary load on the jenkins server that would generate 
>>>>>>> delays
>>>>>>> in Max transform.
>>>>>>>
>>>>>>> => query7 uses Max transform, fanout and side inputs; has one of
>>>>>>> these parts recently (11/12/18) changed in Spark?
>>>>>>>
>>>>>>> Etienne
>>>>>>>
>>>>>>> On Thursday, December 6, 2018 at 21:32 -0800, Chamikara Jayalath wrote:
>>>>>>>
>>>>>>> Udi or anybody else who is familiar about Nexmark,  please -1 the
>>>>>>> vote thread if you think this particular performance regression for
>>>>>>> Spark/Direct runners is a blocker. Otherwise I think we can continue the
>>>>>>> vote.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Cham
>>>>>>>
>>>>>>> On Thu, Dec 6, 2018 at 6:19 P

Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-12 Thread Andrew Pilloud
I was just running the Beam SQL demo. I found one query fails with "the
keyCoder of a GroupByKey must be deterministic" and another just hangs. I
opened an issue: https://issues.apache.org/jira/browse/BEAM-6224 Not sure
if this calls for canceling the release or just a release note (SQL is
still experimental). I'm continuing to track down the root cause, but am
tied up with the Beam Meetup in SFO today.

Andrew

On Tue, Dec 11, 2018 at 3:32 PM Ruoyun Huang  wrote:

> +1,  Looking forward to the release!
>
> On Tue, Dec 11, 2018 at 11:09 AM Chamikara Jayalath 
> wrote:
>
>> Hi All,
>>
>> I ran Beam RC verification script [1] and updated the validation
>> spreadsheet [2]. I think the current release candidate looks good.
>>
>> So +1 for the release.
>>
>> Thanks,
>> Cham
>>
>> [1]
>> https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh
>> [2]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>
>> On Fri, Dec 7, 2018 at 7:19 AM Ismaël Mejía  wrote:
>>
>>> Looking at the dates on the Spark runner git log there was a PR merged
>>> to change Spark translation from classes to URNs. I cannot see how this can
>>> impact performance. Looking at the other queries in the dashboards, there
>>> seems to be a great variability in the executions of the Spark runner to
>>> the point of feeling we don't have guarantees anymore. I wonder if this was
>>> because of other loads shared in the server(s), or because our sample is
>>> too small for the standard deviation.
>>>
>>> I would proceed with the release; the real question is if we can somehow
>>> constrain the execution of these tests to have a more consistent output.
>>>
>>>
>>> On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot 
>>> wrote:
>>>
 Hi all,
 Regarding query7 in spark:
 - there doesn't seem to be a functional regression: query passes and
 output size is still the same

 - Also the performance degradation seems to be only on spark, the other
 runners do not seem to suffer from it.

 - performance degradation seems to be constant from 11/12 so we can
 eliminate temporary load on the jenkins server that would generate delays
 in Max transform.

 => query7 uses Max transform, fanout and side inputs; has one of these
 parts recently (11/12/18) changed in Spark?

 Etienne

 On Thursday, December 6, 2018 at 21:32 -0800, Chamikara Jayalath wrote:

 Udi or anybody else who is familiar about Nexmark,  please -1 the vote
 thread if you think this particular performance regression for Spark/Direct
 runners is a blocker. Otherwise I think we can continue the vote.

 Thanks,
 Cham

 On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath 
 wrote:

 Are either of these regressions due to known issues ? If not should
 they be considered release blockers ?

 Thanks,
 Cham

 On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:

 For DirectRunner there are regressions in query 7 sql direct runner
 batch mode (2x) and streaming mode (5x).


 On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:

 I see a regression for query 7 spark runner batch mode on about 2018-11-13.

 On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
 wrote:

 Hi everyone,

 Please review and vote on the release candidate #1 for the version
 2.9.0, as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)


 The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release to be deployed to dist.apache.org
 [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B
  [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.9.0-RC1" [5],
 * website pull request listing the release [6] and publishing the API
 reference manual [7].
 * Python artifacts are deployed along with the source release to the
 dist.apache.org [2].
 * Validation sheet with a tab for 2.9.0 release to help with
 validation [7].

 The vote will be open for at least 72 hours. It is adopted by majority
 approval, with at least 3 PMC affirmative votes.

 Thanks,
 Cham

 [1]
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344258
 [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
 [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 [4]
 

Re: Migrating Beam SQL to Calcite's code generation

2018-11-15 Thread Andrew Pilloud
ing this is
>>> the main use case for BEAM-5204. Is it your use case?
>>>
>>> Kenn
>>>
>>> On Thu, Nov 15, 2018 at 10:08 AM Mingmin Xu  wrote:
>>>
>>>> Raise this thread.
>>>> Seems there are more changes in the backend on how a FUNCTION is
>>>> executed, as noticed in #6996
>>>> <https://github.com/apache/beam/pull/6996>:
>>>> 1. BeamSqlExpression and BeamSqlExpressionExecutor are removed;
>>>> 2. BeamSqlExpressionEnvironment are removed;
>>>>
>>>> Then,
>>>> 1. for Calcite defined FUNCTIONS, it uses Calcite generated code (which
>>>> is great and duplicate work is worthless);
>>>> *2. no way to access Beam context now;*
>>>>
>>>> For *#2*, I think we need to find a way to expose it, at least our
>>>> UDF/UDAF should be able to access it to leverage the advantages of Beam
>>>> module.
>>>>
>>>> Any comments?
>>>>
>>>>
>>>> On Wed, Sep 19, 2018 at 2:55 PM Rui Wang  wrote:
>>>>
>>>>> This is a so exciting change!
>>>>>
>>>>> Since we cannot mix current implementation with Calcite code
>>>>> generation, is there any case that Calcite code generation does not 
>>>>> support
>>>>> but our current implementation supports, so switching to Calcite code
>>>>> generation will have some impact to existing usage?
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Wed, Sep 19, 2018 at 11:53 AM Andrew Pilloud 
>>>>> wrote:
>>>>>
>>>>>> To follow up on this, the PR is now in a reviewable state and I've
>>>>>> added more tests for FLOOR and CEIL. Both work with a more extensive set 
>>>>>> of
>>>>>> arguments after this change. There are now 4 outstanding calcite PRs that
>>>>>> get all the tests passing.
>>>>>>
>>>>>> Unfortunately there is no easy way to mix our current implementation
>>>>>> and using Calcite's code generator.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Mon, Sep 17, 2018 at 3:22 PM Mingmin Xu 
>>>>>> wrote:
>>>>>>
>>>>>>> Awesome work, we should call Calcite operator functions if
>>>>>>> available.
>>>>>>>
>>>>>>> I haven't get time to read the PR yet, for those impacted would keep
>>>>>>> existing implementation. One example is, I notice FLOOR/CEIL only 
>>>>>>> supports
>>>>>>> months/years recently which is quite a surprise to me.
>>>>>>>
>>>>>>> Mingmin
>>>>>>>
>>>>>>> On Mon, Sep 17, 2018 at 3:03 PM Anton Kedin 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> This is pretty amazing! Thank you for doing this!
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Anton
>>>>>>>>
>>>>>>>> On Mon, Sep 17, 2018 at 2:27 PM Andrew Pilloud 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I've adapted Calcite's EnumerableCalc code generation to generate
>>>>>>>>> the BeamCalc DoFn. The primary purpose behind this change is so we 
>>>>>>>>> can take
>>>>>>>>> advantage of Calcite's extensive SQL operator implementation. This 
>>>>>>>>> deletes
>>>>>>>>> ~11000 lines of code from Beam (with ~350 added), significantly 
>>>>>>>>> increases
>>>>>>>>> the set of supported SQL operators, and improves performance and
>>>>>>>>> correctness of currently supported operators. Here is my work in 
>>>>>>>>> progress:
>>>>>>>>> https://github.com/apache/beam/pull/6417
>>>>>>>>>
>>>>>>>>> There are a few bugs in Calcite that this has exposed:
>>>>>>>>>
>>>>>>>>> Fixed in Calcite master:
>>>>>>>>>
>>>>>>>>>- CALCITE-2321
>>>>>>>>><https://issues.apache.org/jira/browse/CALCITE-2321> - The
>>>>>>>>>type of a union of CHAR columns of different lengths should be 
>>>>>>>>> VARCHAR
>>>>>>>>>- CALCITE-2447
>>>>>>>>><https://issues.apache.org/jira/browse/CALCITE-2447> - Some
>>>>>>>>>POWER, ATAN2 functions fail with NoSuchMethodException
>>>>>>>>>
>>>>>>>>> Pending PRs:
>>>>>>>>>
>>>>>>>>>- CALCITE-2529
>>>>>>>>><https://issues.apache.org/jira/browse/CALCITE-2529> - linq4j
>>>>>>>>>should promote integer to floating point when generating function 
>>>>>>>>> calls
>>>>>>>>>- CALCITE-2530
>>>>>>>>><https://issues.apache.org/jira/browse/CALCITE-2530> - TRIM
>>>>>>>>>function does not throw exception when the length of trim 
>>>>>>>>> character is not
>>>>>>>>>1(one)
>>>>>>>>>
>>>>>>>>> More work:
>>>>>>>>>
>>>>>>>>>- CALCITE-2404
>>>>>>>>><https://issues.apache.org/jira/browse/CALCITE-2404> -
>>>>>>>>>Accessing structured-types is not implemented by the runtime
>>>>>>>>>- (none yet) - Support multi character TRIM extension in
>>>>>>>>>Calcite
>>>>>>>>>
>>>>>>>>> I would like to push these changes in with these minor
>>>>>>>>> regressions. Do any of these Calcite bugs block this functionality 
>>>>>>>>> being
>>>>>>>>> adding to Beam?
>>>>>>>>>
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> 
>>>>>>> Mingmin
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> 
>>>> Mingmin
>>>>
>>>
>>
>> --
>> 
>> Mingmin
>>
>


Re: Possible memory leak in Direct Runner unbounded

2018-10-23 Thread Andrew Pilloud
Hi Martin,

I've seen similar things. The Direct Runner is intended for testing with
small datasets, and is expected to retain the entire dataset in memory. It
sounds like you have a pipeline that requires storing data for a GroupByKey
operation. There is no mechanism to page intermediates to disk in the
Direct Runner.

You might want to try the Flink local runner, which should handle this case
better.

Andrew

On Sun, Oct 21, 2018 at 3:43 PM Martin Procházka 
wrote:

> Hello,
> I have got an application, which utilizes Beam pipeline - Direct Runner.
> It contains an unbounded source. I have got a frontend, which manually adds
> some data into the pipeline with the same timestamp in order to be
> processed in the same window.
>
> The pipeline runs well, however it eventually runs out of heap space. I
> have profiled the application and have noticed that there is a hotspot in
> outputWatermark - holds - keyedHolds. It gets swamped mainly by values
> keyed by the anonymous StructuralKey 'empty' classes over time. With every
> request it grows and never gets released.
>
> When I changed the empty structural key to true singleton, it solved a
> part of this issue, but I have noticed that there is a specific test that
> ensures that two empty keys (StructuralKey) are not equal so my change
> would not be valid. When are those empty keys used and when should they be
> removed in the Direct runner? Is there some mechanism to prevent the
> inevitable heap out of memory error after few requests?
>
> Regards,
> Martin Prochazka
>
>
>
>
>


Re: Java > 8 support

2018-10-18 Thread Andrew Pilloud
+1 to adding Jenkins jobs that test with Java 11. Might it also be possible
to add a second java precommit that runs a subset of the tests on Java 11?

Andrew

On Thu, Oct 18, 2018 at 8:52 AM Scott Wegner  wrote:

> If we just want to validate our produced artifacts work with Java 11, I
> believe it would be easy to start running some of our Jenkins jobs on Java
> 11. beam_PostRelease_NightlyCandidate [1] is a good candidate: it runs the
> quickstart examples across all runners from the SNAPSHOT published maven
> archetype [2].
>
> [1] https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/
> [2]
> https://github.com/apache/beam/blob/master/release/src/main/groovy/QuickstartArchetype.groovy
>
> On Thu, Oct 18, 2018 at 1:26 AM Ismaël Mejía  wrote:
>
>> Yes, Kenn I was talking about building artifacts (because this is the
>> strictest way to validate the compatibility of our code base and find issues
>> with some of the classpath/code generation we use or with the
>> dependencies). I agree 100% with you, the real deal is to validate that we
>> can use Java 8 built jars with more recent Java versions without issues and
>> examples could be a good target for this.
>>
>> Supporting all non-LTS Java versions seems like a lot of work for a small
>> return: (1) most users will probably stay in LTS versions, (2) not any
>> single execution system has talked about supporting intermediary versions
>> (some not even LTS) (3) this will augment the combination of tests we
>> should execute to validate a release. Of course if we do care only about
>> direct runner this could make sense.
>>
>> On Thu, Oct 18, 2018 at 9:30 AM Robert Bradshaw 
>> wrote:
>>
>>> On Thu, Oct 18, 2018 at 4:55 AM Kenneth Knowles  wrote:
>>>
 Just to add to what Luke said: The reason we had those Java 8-only
 modules was because some underlying tech (example: Gearpump) required Java
 8. If an engine requires something then it is OK for a user who chooses the
 runner for that engine to also be subject to that requirement. Otherwise I
 would push back a bit on specialized modules.

>>>
>>> I, too, would avoid getting back into the situation where we have
>>> different modules with different build/runtime requirements. (Java 8 was
>>> special both because of Gearpump and because it introduced features such as
>>> lambdas, type inference, and other functional libraries that were
>>> particularly well suited to our programming model).
>>>
>>>
 Ismaël: Do we need to be thinking in terms of building artifacts? I
 think the important thing is if a user has a project and just pulls our
 jars from maven central that they can use JRE 11 to run it. I think what we
 need is a build matrix of user pipelines compiled and run w/ Java 8, 9, 10,
 11 (or some subset). This can be obtained via the examples maven archetype,
 for example, having no relation at all to our own build tooling. I know
 Jenkins has a plugin for this, so we don't run into the "mega-Jenkins-job"
 problem.

>>>
>>> +1 for rephrasing it this way. In general one would expect our jars of
>>> classfiles to work with later JREs. The magic we do with annotations and
>>> bytecode generation may, however, require more support and testing.
>>>
>>>
 On Wed, Oct 17, 2018 at 2:03 PM Lukasz Cwik  wrote:

> When we were supporting Java 7, there were Java 8 based modules in
> Maven. We can follow a similar pattern where Java11 (or any other version)
> have their own projects and as the base Java version moves up, we can 
> merge
> those modules back in to simplify the set of modules being developed.
>
> On Wed, Oct 17, 2018 at 8:14 AM Ismaël Mejía 
> wrote:
>
>> I want to give more context on the previous work towards Java 11
>> support. The issue was not necessarily that gradle was not ok to support
>> Java 11, but this is also a good point Lukasz.
>>
>> Months ago, we built a maven profile that basically set up Java to be
>> strictly compatible with the ‘minimal’ java.base profile and used Java
>> version (9/10) at the time. We found and fixed some missing dependencies
>> (because Java 11 removed some of the Java EE stuff from the base 
>> profile),
>> not sure if these changes were migrated to the gradle build.
>>
>> We also dealt with a good chunk of updates at the time of
>> plugins/bytebuddy and other deps to guarantee that the execution was done
>> strictly with the latest version of Java and its generated bytecode (the
>> origin of the docker build images was to guarantee this isolation). At 
>> the
>> time I run the tests of most IOs and most extensions with the existing
>> (java 8 compiled) direct runner in JRE 10 with success but of course this
>> was not the ultimate goal but a good way to evaluate the state of the 
>> code
>> base (I remember some GCP tests were broken and the ApiSurfaceTests too).

Re: Vendoring vulnerable Guava (CVE-2018-10237)

2018-10-15 Thread Andrew Pilloud
gRPC 1.15 was stuck at Guava 20.0 for Java 6 support, but supports Guava 24.1.1+
<https://github.com/grpc/grpc-java/issues/4176#issuecomment-371305847>.
gRPC 1.16 will be out in about a week with a dependency on Guava 26.0 (
https://github.com/grpc/grpc-java/blob/v1.16.x/build.gradle#L114).

I stuck the change into a PR to see what would break, looks like a lot of
things are unhappy: https://github.com/apache/beam/pull/6695

Andrew

On Mon, Oct 15, 2018 at 2:11 PM Lukasz Cwik  wrote:

> For example, we vendor gRPC and it still depends on 20.0 in its latest
> version (https://mvnrepository.com/artifact/io.grpc/grpc-core/1.15.1).
>
> On Mon, Oct 15, 2018 at 2:10 PM Lukasz Cwik  wrote:
>
>> 20.0 is a common version used by many of our dependencies, using 20.0 is
>> least likely to cause classpath issues. Note that with Guava 22.0+, they
>> have said they won't introduce backwards incompatible changes anymore so
>> getting past 22.0 would mean we could just rely on using the latest at all
>> times.
>>
>> I'm not sure of the cost of upgrading our dependencies to be compatible with
>> 22.0+ though.
>>
>> On Mon, Oct 15, 2018 at 11:11 AM Andrew Pilloud 
>> wrote:
>>
>>> We vendor a known vulnerable version of Guava. The specific
>>> vulnerability is low to no impact on Beam but it does potentially affect
>>> any server that uses Java serialization with Beam on the classpath. Do we
>>> have a reason for still being on Guava 20.0?
>>>
>>> https://github.com/google/guava/wiki/CVE-2018-10237
>>>
>>> Andrew
>>>
>>


Vendoring vulnerable Guava (CVE-2018-10237)

2018-10-15 Thread Andrew Pilloud
We vendor a known vulnerable version of Guava. The specific vulnerability
is low to no impact on Beam but it does potentially affect any server that
uses Java serialization with Beam on the classpath. Do we have a reason for
still being on Guava 20.0?

https://github.com/google/guava/wiki/CVE-2018-10237

Andrew

