[jira] [Work logged] (BEAM-6855) Side inputs are not supported when using the state API

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6855?focusedWorklogId=323189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323189
 ]

ASF GitHub Bot logged work on BEAM-6855:


Author: ASF GitHub Bot
Created on: 04/Oct/19 05:24
Start Date: 04/Oct/19 05:24
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #9612: [BEAM-6855] Side 
inputs are not supported when using the state API
URL: https://github.com/apache/beam/pull/9612#issuecomment-538240567
 
 
   Run Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323189)
Time Spent: 6.5h  (was: 6h 20m)

> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core, runner-dataflow, runner-direct
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8111) SchemaCoder broken on DataflowRunner

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8111?focusedWorklogId=323185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323185
 ]

ASF GitHub Bot logged work on BEAM-8111:


Author: ASF GitHub Bot
Created on: 04/Oct/19 05:03
Start Date: 04/Oct/19 05:03
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9446: 
[BEAM-8111] Enable CloudObjectsTest$DefaultCoders
URL: https://github.com/apache/beam/pull/9446
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 323185)
Time Spent: 4h 50m  (was: 4h 40m)

> SchemaCoder broken on DataflowRunner
> 
>
> Key: BEAM-8111
> URL: https://issues.apache.org/jira/browse/BEAM-8111
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/commit/e65c176a9f34e45d408281e1101a2ae54cef0f6c
>  broke SchemaCoder on Dataflow. When translating a schema that uses logical 
> types from a cloud object, Dataflow encounters a runtime error.
> This means any pipeline that uses SqlTransform or schema transforms will fail 
> on Dataflow in 2.15.0.





[jira] [Commented] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava

2019-10-03 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944177#comment-16944177
 ] 

Luke Cwik commented on BEAM-5559:
-

We should use a consistent version of our dependencies across all packages to 
ensure that users don't have to resolve conflicts when they depend on multiple 
Beam jars. If we allow one Beam jar to use version 20 while another one needs 
22, then the user may not be able to use both at the same time.

Not all dependencies are listed there because we don't expect users to include 
multiple runners on the classpath at the same time (e.g. Spark and Flink at the 
same time). Some dependencies aren't used widely enough to appear on that list 
or are known not to break users. Guava uses major version bumps and is known to 
break users on some fairly popular APIs, which is why it's on that list.
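The policy above boils down to one invariant: no dependency may be pinned to two different versions across Beam modules. A rough Python sketch of such a check follows. The module names and version map are hypothetical, and this is not how Beam's build actually enforces the rule.

```python
from collections import defaultdict


def find_version_conflicts(modules):
    """Map each dependency pinned to more than one version to
    {version: set of modules using that version}."""
    usage = defaultdict(lambda: defaultdict(set))
    for module, deps in modules.items():
        for dep, version in deps.items():
            usage[dep][version].add(module)
    # Keep only dependencies that appear with two or more distinct versions.
    return {dep: {v: set(ms) for v, ms in versions.items()}
            for dep, versions in usage.items() if len(versions) > 1}


# Hypothetical module names and versions, for illustration only.
modules = {
    "beam-sdks-java-core": {"com.google.guava:guava": "20.0"},
    "beam-runners-foo": {"com.google.guava:guava": "22.0"},
    "beam-sdks-java-io-bar": {"com.google.guava:guava": "20.0",
                              "joda-time:joda-time": "2.10"},
}
print(find_version_conflicts(modules))
```

Here Guava is reported because one module wants 22.0 while two others want 20.0, which is exactly the situation in which a user depending on both jars could not satisfy both.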

> Beam Dependency Update Request: com.google.guava:guava
> --
>
> Key: BEAM-5559
> URL: https://issues.apache.org/jira/browse/BEAM-5559
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
> Fix For: 2.15.0
>
>
>  - 2018-10-01 19:30:53.471497 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:18:05.174889 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-15 12:32:27.737694 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-22 12:10:18.539470 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Created] (BEAM-8351) Support passing in arbitrary KV pairs to sdk worker via external environment config

2019-10-03 Thread Wanqi Lyu (Jira)
Wanqi Lyu created BEAM-8351:
---

 Summary: Support passing in arbitrary KV pairs to sdk worker via 
external environment config
 Key: BEAM-8351
 URL: https://issues.apache.org/jira/browse/BEAM-8351
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core, sdk-py-harness
Reporter: Wanqi Lyu


Originally, the environment config for the EXTERNAL environment type only 
supported passing in a URL for the external worker pool. We want to support 
passing arbitrary KV pairs to the SDK worker via the external environment config, 
so that the SDK harness can read the values from `StartWorkerRequest.params` 
when it starts.
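A minimal sketch of how a runner might accept both the old and the proposed form of the config. The JSON layout (a `url` plus an optional `params` object) and the function name are assumptions for illustration, not the format the change actually adopted; the resulting dict stands in for what would be forwarded as `StartWorkerRequest.params`, assumed here to be a string-to-string map.

```python
import json
from typing import Dict, Optional, Tuple


def parse_external_environment_config(
        config: str) -> Tuple[str, Optional[Dict[str, str]]]:
    """Split an EXTERNAL environment config into (url, params).

    Accepts either a bare worker-pool URL (the original behaviour) or a
    JSON object such as {"url": "...", "params": {"k": "v"}}. The JSON
    layout is an assumption made for this sketch.
    """
    try:
        parsed = json.loads(config)
    except ValueError:
        # Not JSON at all: treat the whole string as the worker pool URL.
        return config, None
    if not isinstance(parsed, dict):
        # Valid JSON but not an object (e.g. a bare number): keep old behaviour.
        return config, None
    params = parsed.get("params") or None
    if params is not None:
        # Assumed string-to-string map, so coerce keys and values to str.
        params = {str(k): str(v) for k, v in params.items()}
    return parsed["url"], params


url, params = parse_external_environment_config(
    '{"url": "localhost:50000", "params": {"k1": "v1"}}')
```

Falling back to the bare-URL interpretation keeps existing configs working unchanged, which is the main constraint when extending an option that is already in use.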





[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323074=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323074
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:45
Start Date: 03/Oct/19 23:45
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538169256
 
 
   Here's a breakdown of the changes required to get to pylint 2.4:
   
   - fix a bunch of warnings about deprecated methods, mostly `logger.warn` 
and various unittest methods
   - update the names of a few error codes: `disable=unused-import` and 
`possibly-unused-variable`
   - ignore a bunch of newly introduced error codes that did not seem important
   - run the lint on python-3.7: this ensures that it can run on test files 
that only work on python-3.7 due to syntax features
   - merge the lint tests into one test:
 - `run_pylint_2to3.sh` was a test just for testing the futurization.  
seems fine to do this all the time now that our code is python2 compliant
 - there was a "mini" test just for python3-compatibility.  not needed 
anymore now that everything is running on python3
 - stop running `pycodestyle`: it's run as part of `flake8`
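For reference, the deprecated-method fixes in the first bullet typically look like the following. `Logger.warn` is a deprecated alias of `Logger.warning`, and `assertEquals` is one commonly flagged deprecated alias of `assertEqual`. The PR's exact list of unittest methods is not given here, and the function and test names are hypothetical.

```python
import logging
import unittest

logger = logging.getLogger(__name__)


def report_slow_stage(stage: str) -> None:
    # Before: logger.warn(...), a deprecated alias.
    logger.warning("Stage %s is slow", stage)


class ExampleTest(unittest.TestCase):

    def test_equality(self):
        # Before: self.assertEquals(1 + 1, 2), a deprecated alias.
        self.assertEqual(1 + 1, 2)
```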
   
 



Issue Time Tracking
---

Worklog Id: (was: 323074)
Time Spent: 40m  (was: 0.5h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.





[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323071
 ]

ASF GitHub Bot logged work on BEAM-6995:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:43
Start Date: 03/Oct/19 23:43
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9703: [BEAM-6995] 
Beam basic aggregation rule only when not windowed
URL: https://github.com/apache/beam/pull/9703#discussion_r331296410
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java
 ##
 @@ -701,7 +700,6 @@ public void testSupportsAggregationWithoutProjection() 
throws Exception {
   }
 
   @Test
-  @Ignore("https://issues.apache.org/jira/browse/BEAM-8317")
   public void testSupportsAggregationWithFilterWithoutProjection() throws 
Exception {
 
 Review comment:
   Found a useful reference link with examples: 
https://github.com/Pragmatists/JUnitParams/blob/master/src/test/java/junitparams/usage/SamplesOfUsageTest.java
 



Issue Time Tracking
---

Worklog Id: (was: 323071)
Time Spent: 2h 20m  (was: 2h 10m)

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with a CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
>     .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input",
>     Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
>     SqlTransform.query(
>         "SELECT id, SUM(val) "
>             + "FROM PCOLLECTION "
>             + "WHERE val > 0 "
>             + "GROUP BY id"));{code}
> If the WHERE clause is removed, the code runs successfully.
> This may be similar to BEAM-5384, since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
>     .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={inf}
> 
> rel#102:LogicalCalc.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={inf}
> rel#104:Subset#1.BEAM_LOGICAL, best=rel#103, importance=0.405
> 
> 

[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323057=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323057
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:37
Start Date: 03/Oct/19 23:37
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538169256
 
 
   Here's a breakdown of the changes required to get to pylint 2.4:
   
   - fix a bunch of warnings about deprecated methods, mostly `logger.warn` 
and various unittest methods
   - update the names of a few error codes: `disable=unused-import` and 
`possibly-unused-variable`
   - ignore a bunch of newly introduced error codes that did not seem important
   - merge the lint tests into one test:
 - `run_pylint_2to3.sh` was a test just for testing the futurization.  
seems fine to do this all the time now that our code is python2 compliant
 - there was a "mini" test just for python3-compatibility.  not needed 
anymore now that everything is running on python3
 - stop running `pycodestyle`: it's run as part of `flake8`
   
 



Issue Time Tracking
---

Worklog Id: (was: 323057)
Time Spent: 0.5h  (was: 20m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323049
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:28
Start Date: 03/Oct/19 23:28
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-538167314
 
 
   R: @robertwb 
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 323049)
Time Spent: 20m  (was: 10m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=323047=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323047
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:27
Start Date: 03/Oct/19 23:27
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9725: [BEAM-8350] 
Upgrade to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725
 
 
   pylint 2.4 provides a number of new features and fixes, but the most 
important/pressing one for me is that 2.4 adds support for understanding python 
type annotations, which fixes a bunch of spurious unused import errors in the 
PR I'm working on for BEAM-7746.
   
   As of 2.0, pylint dropped support for running tests in python2, so to make 
the upgrade we have to move our lint jobs to python3. Doing so will put pylint 
into "python3-mode" and there is not an option to run in python2-compatible 
mode. That said, the beam code is intended to be python3 compatible, so in 
practice, performing a python3 lint on the Beam code-base is perfectly safe. 
The primary risk of doing this is that someone introduces a python-3 only 
change that breaks python2, but these would largely be syntax errors that would 
be immediately caught by the unit and integration tests.
   
   
   

[jira] [Created] (BEAM-8350) Upgrade to pylint 2.4

2019-10-03 Thread Chad Dombrova (Jira)
Chad Dombrova created BEAM-8350:
---

 Summary: Upgrade to pylint 2.4
 Key: BEAM-8350
 URL: https://issues.apache.org/jira/browse/BEAM-8350
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Chad Dombrova
Assignee: Chad Dombrova


pylint 2.4 provides a number of new features and fixes, but the most 
important/pressing one for me is that 2.4 adds support for understanding python 
type annotations, which fixes a bunch of spurious unused import errors in the 
PR I'm working on for BEAM-7746.

As of 2.0, pylint dropped support for running tests in python2, so to make the 
upgrade we have to move our lint jobs to python3.  Doing so will put pylint 
into "python3-mode" and there is not an option to run in python2-compatible 
mode.  That said, the beam code is intended to be python3 compatible, so in 
practice, performing a python3 lint on the Beam code-base is perfectly safe.  
The primary risk of doing this is that someone introduces a python-3 only 
change that breaks python2, but these would largely be syntax errors that would 
be immediately caught by the unit and integration tests.






[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=323042=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323042
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:19
Start Date: 03/Oct/19 23:19
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9724: [BEAM-8348] make 
job_name a standard option in Python SDK
URL: https://github.com/apache/beam/pull/9724#issuecomment-538165497
 
 
   *This is actually a breaking change* because things such as:
   
   ```py
   pipeline_options = PipelineOptions()
   pipeline_options.view_as(GoogleCloudOptions).job_name = "foo"
   ```
   
   will now fail, so I doubt it will pass tests.
   
   Is there a better way to do this?
 



Issue Time Tracking
---

Worklog Id: (was: 323042)
Time Spent: 0.5h  (was: 20m)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [1] https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294
> [2] https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438
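A minimal sketch of the approach the issue proposes: a separate, optional flag (called `portable_job_name` here, following the issue's tentative name) that falls back to the currently hard-coded `"job"`. It uses plain `argparse` rather than Beam's `PipelineOptions` machinery, so the parser and function names are illustrative only.

```python
import argparse
from typing import List

# The value the issue says is currently hard-coded by the portable runner.
DEFAULT_JOB_NAME = "job"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag, using the tentative name suggested in the issue.
    parser.add_argument(
        "--portable_job_name",
        default=None,
        help="Name of the job. Runners may or may not honor this option.")
    return parser


def resolve_job_name(argv: List[str]) -> str:
    # parse_known_args ignores flags that belong to other option groups.
    args, _unknown = build_parser().parse_known_args(argv)
    return args.portable_job_name or DEFAULT_JOB_NAME
```

Using a new name avoids colliding with the existing Google Cloud `job_name` option, at the cost of having two job-name flags.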





[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=323041=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323041
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:16
Start Date: 03/Oct/19 23:16
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9724: [BEAM-8348] 
make job_name a standard option in Python SDK
URL: https://github.com/apache/beam/pull/9724#discussion_r331289834
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -363,6 +363,9 @@ def _add_argparse_args(cls, parser):
                         default=False,
                         action='store_true',
                         help='Whether to enable streaming mode.')
 +    parser.add_argument('--job_name',
 +                        default=None,
 +                        help='Name of the job.')
 
 Review comment:
   It would be good to mention that it may or may not be honored by the 
runner.
 



Issue Time Tracking
---

Worklog Id: (was: 323041)
Time Spent: 20m  (was: 10m)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8346) Failure: beam_PostRelease_Python_Candidate timeout

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8346?focusedWorklogId=323037=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323037
 ]

ASF GitHub Bot logged work on BEAM-8346:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:12
Start Date: 03/Oct/19 23:12
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9723: [BEAM-8346] 
Increase timeout of Python Release Validation job
URL: https://github.com/apache/beam/pull/9723#issuecomment-538163903
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 323037)
Time Spent: 50m  (was: 40m)

> Failure: beam_PostRelease_Python_Candidate timeout
> --
>
> Key: BEAM-8346
> URL: https://issues.apache.org/jira/browse/BEAM-8346
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The job requires more time to run the full set of validations (quickstart + mobile 
> game batch) on DataflowRunner/DirectRunner, with and without wheels, across 
> multiple Python versions (2.7, 3.5, 3.6, 3.7).
> Running all validations in the Py2.7 environment took about 1h10m, so we 
> probably want to extend the timeout to 5-6 hours.
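Back-of-the-envelope arithmetic behind the proposed timeout. It assumes that each Python version takes about as long as the measured Py2.7 run and that the versions run one after another. The issue does not state either assumption.

```python
# Assumptions (not stated in the issue): each Python version takes about as
# long as the measured Py2.7 run, and the versions run one after another.
PY27_RUN_MINUTES = 70  # "about 1h10m"
PYTHON_VERSIONS = ["2.7", "3.5", "3.6", "3.7"]


def estimated_total_minutes(per_version_minutes: int, versions: list) -> int:
    return per_version_minutes * len(versions)


total = estimated_total_minutes(PY27_RUN_MINUTES, PYTHON_VERSIONS)
hours, minutes = divmod(total, 60)
print(f"estimated sequential run time: {hours}h{minutes:02d}m")  # 4h40m
```

Under those assumptions, a 5-6 hour timeout leaves roughly 20 to 80 minutes of headroom.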





[jira] [Work logged] (BEAM-8346) Failure: beam_PostRelease_Python_Candidate timeout

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8346?focusedWorklogId=323036=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323036
 ]

ASF GitHub Bot logged work on BEAM-8346:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:12
Start Date: 03/Oct/19 23:12
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9723: [BEAM-8346] 
Increase timeout of Python Release Validation job
URL: https://github.com/apache/beam/pull/9723#issuecomment-538163903
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 323036)
Time Spent: 40m  (was: 0.5h)

> Failure: beam_PostRelease_Python_Candidate timeout
> --
>
> Key: BEAM-8346
> URL: https://issues.apache.org/jira/browse/BEAM-8346
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The job needs more time to run the full set of validations (quickstart + mobile 
> game batch) on DataflowRunner/DirectRunner, with and without wheels, in 
> multiple Python versions (2.7, 3.5, 3.6, 3.7).
> Running all validations in the Py2.7 environment took about 1h 10min, so we 
> probably want to extend the timeout to 5-6 hours.





[jira] [Created] (BEAM-8349) Cleanup Bigquery dataset after Python Mobile Game validation

2019-10-03 Thread Mark Liu (Jira)
Mark Liu created BEAM-8349:
--

 Summary: Cleanup Bigquery dataset after Python Mobile Game 
validation
 Key: BEAM-8349
 URL: https://issues.apache.org/jira/browse/BEAM-8349
 Project: Beam
  Issue Type: Sub-task
  Components: testing
Reporter: Mark Liu
Assignee: Mark Liu


run_rc_validation.sh validates Python GameStats and Leaderboard in streaming mode. 
Before each pipeline starts, a unique BigQuery dataset is created, but it is never 
cleaned up after the pipeline finishes or the script is interrupted.
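One common way to guarantee this kind of cleanup is to tie the dataset's lifetime to a try/finally block and to register a shutdown hook for interruptions; in the bash script itself, a `trap` on EXIT would play the same role. The sketch below is in Java, and `FakeDatasetClient`, its methods, and the `mobile_game_validation` prefix are hypothetical stand-ins, not BigQuery APIs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Consumer;

/** Hypothetical stand-in for a BigQuery dataset client; it only records what it was asked to do. */
class FakeDatasetClient {
  final List<String> created = new ArrayList<>();
  final List<String> deleted = new ArrayList<>();

  synchronized String createUniqueDataset(String prefix) {
    // Dataset names may contain letters, digits and underscores, so replace the UUID dashes.
    String name = prefix + "_" + UUID.randomUUID().toString().replace('-', '_');
    created.add(name);
    return name;
  }

  synchronized void deleteDatasetIfPresent(String name) {
    if (created.contains(name) && !deleted.contains(name)) {
      deleted.add(name);
    }
  }
}

class ValidationRun {
  /** Runs the validation body and deletes the per-run dataset however the body exits. */
  static void runWithTemporaryDataset(FakeDatasetClient client, Consumer<String> body) {
    String dataset = client.createUniqueDataset("mobile_game_validation");
    // Covers interruption (e.g. SIGINT/SIGTERM), where the finally block below may never run.
    Thread hook = new Thread(() -> client.deleteDatasetIfPresent(dataset));
    Runtime.getRuntime().addShutdownHook(hook);
    try {
      body.accept(dataset);
    } finally {
      // Covers normal completion and exceptions thrown while running the pipeline.
      client.deleteDatasetIfPresent(dataset);
      Runtime.getRuntime().removeShutdownHook(hook);
    }
  }
}
```

Deletion is idempotent here so that the finally block and the shutdown hook can both run without conflicting.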





[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=323033=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323033
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 03/Oct/19 23:08
Start Date: 03/Oct/19 23:08
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9724: [BEAM-8348] make 
job_name a standard option in Python SDK
URL: https://github.com/apache/beam/pull/9724
 
 
   

[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323016=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323016
 ]

ASF GitHub Bot logged work on BEAM-6995:


Author: ASF GitHub Bot
Created on: 03/Oct/19 22:55
Start Date: 03/Oct/19 22:55
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9703: [BEAM-6995] 
Beam basic aggregation rule only when not windowed
URL: https://github.com/apache/beam/pull/9703#discussion_r331286370
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslAggregationTest.java
 ##
 @@ -701,7 +700,6 @@ public void testSupportsAggregationWithoutProjection() throws Exception {
   }
  
   @Test
-  @Ignore("https://issues.apache.org/jira/browse/BEAM-8317")
   public void testSupportsAggregationWithFilterWithoutProjection() throws Exception {
 
 Review comment:
   @11moon11 @apilloud 
   
   What I really want to propose is that, when we add new test cases with SQL 
queries, we run them for both dialects unless there is a query syntax mismatch.
   
   Which planner is used is controlled by 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlPipelineOptions.java#L28.
   
   I am looking for a way to use `@RunWith(Parameterized.class)` so that it's easy to 
run tests for both dialects via an annotation.
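The idea of running every SQL test under both dialects can be sketched without JUnit as a loop over the planners; with JUnit 4, the dialect list would instead be returned from a `@Parameters` method of a `@RunWith(Parameterized.class)` test class. The `SqlDialect` enum and the query-runner callback below are hypothetical placeholders, not Beam APIs:

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

/** Hypothetical dialect identifiers; in Beam the planner is selected through pipeline options. */
enum SqlDialect { CALCITE, ZETASQL }

class DialectMatrix {
  /**
   * Runs one test query under every dialect except those excluded because of a known
   * query-syntax mismatch, and returns the dialects on which the query failed.
   */
  static List<SqlDialect> runForAllDialects(
      String query, Set<SqlDialect> syntaxMismatch, BiPredicate<SqlDialect, String> runQuery) {
    List<SqlDialect> failures = new ArrayList<>();
    for (SqlDialect dialect : EnumSet.allOf(SqlDialect.class)) {
      if (syntaxMismatch.contains(dialect)) {
        continue; // skipped on purpose, not counted as a failure
      }
      if (!runQuery.test(dialect, query)) {
        failures.add(dialect);
      }
    }
    return failures;
  }
}
```

Making the exclusion explicit keeps "skipped because of syntax" distinguishable from "failed under this planner".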
 



Issue Time Tracking
---

Worklog Id: (was: 323016)
Time Spent: 2h 10m  (was: 2h)

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with a CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
>     .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input",
>     Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
>     SqlTransform.query(
>         "SELECT id, SUM(val) "
>             + "FROM PCOLLECTION "
>             + "WHERE val > 0 "
>             + "GROUP BY id"));{code}
> If the WHERE clause is removed, the code runs successfully.
> This may be similar to BEAM-5384, since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
>     .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={inf}
> 

[jira] [Updated] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in

2019-10-03 Thread Daniel Robert (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Robert updated BEAM-8347:

Description: 
I stumbled upon this and then saw a similar StackOverflow post: 
[https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]

When `advance()` is called and there are no messages, no state changes, including 
the CheckpointMark and the watermark. If new messages arrive at a relatively 
constant rate, this is not a problem. If data is bursty and there are periods 
with no new messages, the watermark does not advance until new data arrives.

Contrast this with some of the logic in PubsubIO, which makes provisions for 
periods of inactivity to advance the watermark (although it, too, is imperfect: 
https://issues.apache.org/jira/browse/BEAM-7322 )
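The kind of provision described above can be sketched as an idle-aware watermark: while messages arrive, the watermark follows their event time; once the source has been idle for longer than a threshold, it advances with processing time. This is a simplified illustration under our own assumptions (the class, the idle threshold, and the policy are hypothetical, and messages are assumed to arrive roughly in event-time order), not the actual PubsubIO or RabbitMqIO logic:

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical idle-aware watermark for an unbounded reader; not Beam's implementation. */
class IdleAwareWatermark {
  private final Duration idleThreshold;
  private Instant watermark = Instant.EPOCH; // only ever moves forward
  private Instant lastArrival;              // processing time of the most recent message, or null

  IdleAwareWatermark(Duration idleThreshold) {
    this.idleThreshold = idleThreshold;
  }

  /** advance() returned a message: follow its event time. */
  void onMessage(Instant eventTime, Instant now) {
    lastArrival = now;
    moveTo(eventTime);
  }

  /**
   * advance() found nothing. Without this method the watermark would stay where the last
   * message left it, which is the behaviour reported in this issue.
   */
  void onEmptyPoll(Instant now) {
    boolean idle =
        lastArrival == null || Duration.between(lastArrival, now).compareTo(idleThreshold) >= 0;
    if (idle) {
      // Assumption: data older than (now - idleThreshold) is no longer expected.
      moveTo(now.minus(idleThreshold));
    }
  }

  Instant current() {
    return watermark;
  }

  private void moveTo(Instant candidate) {
    if (candidate.isAfter(watermark)) {
      watermark = candidate;
    }
  }
}
```

The trade-off is that a message delayed by more than the idle threshold becomes late data, so the threshold has to be chosen with the source's delivery delays in mind.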

The example given in the StackOverflow post is something like this:

 
{code:java}
pipeline
    .apply(RabbitMqIO.read()
        .withUri("amqp://guest:guest@localhost:5672")
        .withQueue("test"))
    .apply("Windowing",
        Window.<RabbitMqMessage>into(
                FixedWindows.of(Duration.standardSeconds(10)))
            .triggering(AfterWatermark.pastEndOfWindow())
            .withAllowedLateness(Duration.ZERO)
            .accumulatingFiredPanes());{code}
If I push 2 messages into my RabbitMQ queue, I see 2 unacked messages and a 
window whose on-time pane never fires.
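To make the failure mode concrete: with 10-second fixed windows, `AfterWatermark.pastEndOfWindow()` can only fire once the watermark passes the end of the window that contains the messages. The stdlib-only sketch below mirrors that arithmetic for epoch-aligned windows with no offset; the helper names and timestamps are ours, and it is not Beam's trigger implementation:

```java
/** Simplified arithmetic for fixed windows and an on-time (end-of-window) trigger. */
class FixedWindowFiring {
  /** Start of the fixed window containing the timestamp (windows aligned to epoch 0). */
  static long windowStart(long timestampMillis, long sizeMillis) {
    return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
  }

  /** Last instant that belongs to the window: its end minus 1 ms. */
  static long maxTimestamp(long timestampMillis, long sizeMillis) {
    return windowStart(timestampMillis, sizeMillis) + sizeMillis - 1;
  }

  /** In this sketch the on-time pane is due once the watermark is strictly past maxTimestamp. */
  static boolean onTimePaneDue(long timestampMillis, long sizeMillis, long watermarkMillis) {
    return watermarkMillis > maxTimestamp(timestampMillis, sizeMillis);
  }

  public static void main(String[] args) {
    long size = 10_000;          // 10-second fixed windows
    long message = 3_000;        // a message stamped 3 s after the epoch, window [0 s, 10 s)
    long stuckWatermark = 3_000; // no further messages, so the watermark stays at the last message
    System.out.println(onTimePaneDue(message, size, stuckWatermark)); // prints false
    System.out.println(onTimePaneDue(message, size, 10_000));         // prints true
  }
}
```

As long as the reader keeps reporting a watermark inside the window, the check never succeeds, which matches the window that never fires in the report.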

 

  was:
I stumbled upon this and then saw a similar StackOverflow post: 
[https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]

When calling `advance()` if there are no messages, no state changes, including 
no changes to the CheckpointMark or Watermark.  If there is a relatively 
constant rate of new messages coming in, this is not a problem. If data is 
bursty, and there are periods of new new messages coming in, the watermark will 
never advance.

Contrast this with some of the logic in PubsubIO which will make provisions for 
periods of inactivity to advance the watermark (although it, too, is imperfect: 
https://issues.apache.org/jira/browse/BEAM-7322 )

The example given in the StackOverflow post is something like this:

 
{code:java}
pipeline
  .apply(RabbitMqIO.read()
  .withUri("amqp://guest:guest@localhost:5672")
  .withQueue("test")
  .apply("Windowing", 
Window.into(
  FixedWindows.of(Duration.standardSeconds(10)))
.triggering(AfterWatermark.pastEndOfWindow())
.withAllowedLateness(Duration.ZERO)
.accumulatingFiredPanes()){code}
If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
window that never performs an on time trigger.

 


> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also 
> have DataflowRunner available
>Reporter: Daniel Robert
>Priority: Major
>
> I stumbled upon this and then saw a similar StackOverflow post: 
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When `advance()` is called and there are no messages, no state changes, 
> including the CheckpointMark and the watermark. If new messages arrive at a 
> relatively constant rate, this is not a problem. If data is bursty and there 
> are periods with no new messages, the watermark does not advance until new 
> data arrives.
> Contrast this with some of the logic in PubsubIO, which makes provisions 
> for periods of inactivity to advance the watermark (although it, too, is 
> imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>  
> {code:java}
> pipeline
>     .apply(RabbitMqIO.read()
>         .withUri("amqp://guest:guest@localhost:5672")
>         .withQueue("test"))
>     .apply("Windowing",
>         Window.<RabbitMqMessage>into(
>                 FixedWindows.of(Duration.standardSeconds(10)))
>             .triggering(AfterWatermark.pastEndOfWindow())
>             .withAllowedLateness(Duration.ZERO)
>             .accumulatingFiredPanes());{code}
> If I push 2 messages into my RabbitMQ queue, I see 2 unacked messages and a 
> window whose on-time pane never fires.
>  





[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=323007=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323007
 ]

ASF GitHub Bot logged work on BEAM-6995:


Author: ASF GitHub Bot
Created on: 03/Oct/19 22:34
Start Date: 03/Oct/19 22:34
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9703: [BEAM-6995] Beam 
basic aggregation rule only when not windowed
URL: https://github.com/apache/beam/pull/9703#issuecomment-538154919
 
 
   cc: @amaliujia 
 



Issue Time Tracking
---

Worklog Id: (was: 323007)
Time Spent: 2h  (was: 1h 50m)

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with a CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
>     .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input",
>     Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
>     SqlTransform.query(
>         "SELECT id, SUM(val) "
>             + "FROM PCOLLECTION "
>             + "WHERE val > 0 "
>             + "GROUP BY id"));{code}
> If the WHERE clause is removed, the code runs successfully.
> This may be similar to BEAM-5384, since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
>     .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={inf}
> 
> rel#102:LogicalCalc.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={inf}
> rel#104:Subset#1.BEAM_LOGICAL, best=rel#103, importance=0.405
> 
> rel#103:BeamCalcRel.BEAM_LOGICAL(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={150.0 
> rows, 801.0 cpu, 0.0 io}
> rel#106:Subset#1.ENUMERABLE, best=rel#105, importance=0.405
> 
> rel#105:BeamEnumerableConverter.ENUMERABLE(input=rel#104:Subset#1.BEAM_LOGICAL),
>  rowcount=50.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#2, type: RecordType(INTEGER id, INTEGER EXPR$1)
> rel#99:Subset#2.NONE, best=null, importance=0.9
> 
> 

[jira] [Commented] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-03 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944085#comment-16944085
 ] 

Kyle Weaver commented on BEAM-8348:
---

I found we need to use `job_name` itself, because that is the corresponding 
option in Java [1]. So maybe we should consider making job_name a standard 
option, as many runners use it on the Java side.

[1] 
[https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java#L275]
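A minimal sketch of the direction discussed here: read a user-supplied `job_name` from the options and fall back to a generated default only when none is given, instead of always submitting the hard-coded "job". The map-based options, the `JobNameResolver` class, and the prefix-plus-timestamp fallback are hypothetical; this is neither the Python SDK nor the Java `PipelineOptions` code:

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical job-name resolution for a portable job submission. */
class JobNameResolver {
  /**
   * Prefers an explicit, non-blank "job_name" option; otherwise builds a default from a
   * caller-supplied prefix and a timestamp so that concurrent jobs remain distinguishable.
   */
  static String resolve(Map<String, String> options, String defaultPrefix, long nowMillis) {
    return Optional.ofNullable(options.get("job_name"))
        .map(String::trim)
        .filter(name -> !name.isEmpty())
        .orElseGet(() -> defaultPrefix + "-" + nowMillis);
  }
}
```

Treating a blank value like a missing one avoids submitting an empty job name.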

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [1] 
> [https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]





[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323006=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323006
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 03/Oct/19 22:32
Start Date: 03/Oct/19 22:32
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #9718: [BEAM-8343] Added 
methods to BeamSqlTable to enable support for predicate/project push-down
URL: https://github.com/apache/beam/pull/9718#issuecomment-538154586
 
 
   cc: @amaliujia 
 



Issue Time Tracking
---

Worklog Id: (was: 323006)
Time Spent: 0.5h  (was: 20m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
> A proposed way to achieve that is to introduce an interface responsible for 
> identifying which portion(s) of a Calc can be moved down to the IO layer, and to 
> add the following methods to the BeamSqlTable interface to pass the necessary 
> parameters to IO APIs:
> - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
> - Boolean supportsProjects()
> - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
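To illustrate what passing `fieldNames` to the reader buys, here is a stdlib-only sketch of project push-down: a hypothetical in-memory table that emits only the requested columns instead of full rows that a later Calc would have to trim. Rows are modeled as maps, and none of these types are Beam's:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory table used only to illustrate project push-down. */
class InMemoryTable {
  private final List<Map<String, Object>> rows;

  InMemoryTable(List<Map<String, Object>> rows) {
    this.rows = rows;
  }

  /** Mirrors the proposed supportsProjects(): this table can read a subset of columns. */
  boolean supportsProjects() {
    return true;
  }

  /** Full read, analogous to buildIOReader(begin). */
  List<Map<String, Object>> read() {
    return rows;
  }

  /** Projected read, analogous to buildIOReader(begin, filters, fieldNames) with no filter. */
  List<Map<String, Object>> read(List<String> fieldNames) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      Map<String, Object> projected = new LinkedHashMap<>();
      for (String field : fieldNames) {
        if (!row.containsKey(field)) {
          throw new IllegalArgumentException("Unknown field: " + field);
        }
        projected.put(field, row.get(field));
      }
      out.add(projected);
    }
    return out;
  }
}
```

For a real IO the saving comes from never reading or decoding the dropped columns at the source, which an in-memory table can only imitate.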





[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=323000=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323000
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 03/Oct/19 22:30
Start Date: 03/Oct/19 22:30
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9718: [BEAM-8343] 
Added methods to BeamSqlTable to enable support for predicate/project push-down
URL: https://github.com/apache/beam/pull/9718#discussion_r331279152
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/DefaultTableFilter.java
 ##
 @@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta;
+
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+
+/**
+ * This is the default implementation of the {@link BeamSqlTableFilter} interface. It assumes that
+ * predicate push-down is not supported.
+ */
+public class DefaultTableFilter implements BeamSqlTableFilter {
+  private final RexProgram program;
+  private final RexNode filter;
+
+  public DefaultTableFilter(RexProgram program, RexNode filter) {
+    this.program = program;
+    this.filter = filter;
+  }
+
+  /**
+   * Since predicate push-down is assumed not to be supported by default, return the unchanged
+   * filter so that it is preserved.
+   *
+   * @return Predicate {@code RexNode} which is not supported
+   */
+  @Override
+  public RexNode getNotSupported() {
 
 Review comment:
   After learning about `RexNode`, we probably need something else. You 
suggested a List, where the list entries are ANDs?
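One way to read the "list whose entries are ANDs" suggestion: split the filter into its top-level conjuncts and partition them into those the IO can evaluate and those that must stay in the Calc. Because the conjuncts are ANDed, pushing a supported subset down and keeping the rest above the reader is still equivalent to the original predicate; a disjunction, by contrast, would have to be kept whole. The tiny expression model below is hypothetical and only stands in for Calcite's `RexNode`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a RexNode: either an AND of operands or a leaf comparison. */
abstract class Expr {}

final class And extends Expr {
  final List<Expr> operands;

  And(Expr... operands) {
    this.operands = Arrays.asList(operands);
  }
}

final class Comparison extends Expr {
  final String field;
  final String op;
  final Object literal;

  Comparison(String field, String op, Object literal) {
    this.field = field;
    this.op = op;
    this.literal = literal;
  }

  @Override
  public String toString() {
    return field + " " + op + " " + literal;
  }
}

/** Result of splitting a filter into conjuncts the IO can evaluate and conjuncts it cannot. */
final class FilterSplit {
  final List<Expr> supported = new ArrayList<>();   // pushed down into the IO reader
  final List<Expr> unsupported = new ArrayList<>(); // kept in the Calc above the reader

  static FilterSplit split(Expr filter, Predicate<Expr> ioCanEvaluate) {
    List<Expr> conjuncts = new ArrayList<>();
    flatten(filter, conjuncts);
    FilterSplit result = new FilterSplit();
    for (Expr conjunct : conjuncts) {
      if (ioCanEvaluate.test(conjunct)) {
        result.supported.add(conjunct);
      } else {
        result.unsupported.add(conjunct);
      }
    }
    return result;
  }

  /** Flattens nested ANDs; any other node is kept whole as a single conjunct. */
  private static void flatten(Expr expr, List<Expr> out) {
    if (expr instanceof And) {
      for (Expr operand : ((And) expr).operands) {
        flatten(operand, out);
      }
    } else {
      out.add(expr);
    }
  }
}
```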
 



Issue Time Tracking
---

Worklog Id: (was: 323000)
Time Spent: 20m  (was: 10m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
> A proposed way to achieve that is to introduce an interface responsible for 
> identifying which portion(s) of a Calc can be moved down to the IO layer, and to 
> add the following methods to the BeamSqlTable interface to pass the necessary 
> parameters to IO APIs:
> - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
> - Boolean supportsProjects()
> - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].





[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=322997=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322997
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 03/Oct/19 22:30
Start Date: 03/Oct/19 22:30
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9718: [BEAM-8343] 
Added methods to BeamSqlTable to enable support for predicate/project push-down
URL: https://github.com/apache/beam/pull/9718#discussion_r331277730
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BaseBeamTable.java
 ##
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta;
+
+import java.util.List;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+
+/** Basic implementation of {@link BeamSqlTable} methods used by predicate and filter push-down. */
+public abstract class BaseBeamTable implements BeamSqlTable {
+
+  @Override
+  public PCollection<Row> buildIOReader(
+      PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
+    return buildIOReader(begin);
 
 Review comment:
   This discards `filters` and `fieldNames`. I think this case should throw an 
exception.
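The reviewer's suggestion can be sketched as a default that refuses push-down arguments rather than silently dropping them. `PushDownTable` below is a hypothetical, stdlib-only analogue of `BeamSqlTable` (string rows, an untyped nullable filter); in particular, letting the "nothing to push down" case fall back to the plain reader is our assumption, not something the review states:

```java
import java.util.List;

/** Hypothetical, simplified analogue of a SQL table with optional push-down support. */
interface PushDownTable {
  /** Plain full read. */
  List<String> buildIOReader();

  /**
   * Read with predicate and/or project push-down. The default falls back to a full read only
   * when there is nothing to push down; otherwise it fails loudly instead of discarding the
   * filter and field names.
   */
  default List<String> buildIOReader(Object filter, List<String> fieldNames) {
    boolean nothingToPushDown = filter == null && (fieldNames == null || fieldNames.isEmpty());
    if (nothingToPushDown) {
      return buildIOReader();
    }
    throw new UnsupportedOperationException(
        getClass().getSimpleName() + " does not support predicate or project push-down");
  }
}
```

Failing fast surfaces a planner/table mismatch at pipeline construction time instead of producing rows that silently ignore the pushed-down filter.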
 



Issue Time Tracking
---

Worklog Id: (was: 322997)
Remaining Estimate: 0h
Time Spent: 10m

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
> A proposed way to achieve that is to introduce an interface responsible for 
> identifying which portion(s) of a Calc can be moved down to the IO layer, and to 
> add the following methods to the BeamSqlTable interface to pass the necessary 
> parameters to IO APIs:
> - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
> - Boolean supportsProjects()
> - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].





[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=322999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322999
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 03/Oct/19 22:30
Start Date: 03/Oct/19 22:30
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9718: [BEAM-8343] 
Added methods to BeamSqlTable to enable support for predicate/project push-down
URL: https://github.com/apache/beam/pull/9718#discussion_r331279649
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java
 ##
 @@ -15,24 +15,36 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.beam.sdk.extensions.sql;
+package org.apache.beam.sdk.extensions.sql.meta;
 
+import java.util.List;
 import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.values.PBegin;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.POutput;
 import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
 
 /** This interface defines a Beam Sql Table. */
 public interface BeamSqlTable {
   /** create a {@code PCollection} from source. */
   PCollection<Row> buildIOReader(PBegin begin);
 
+  /** create a {@code PCollection} from source with predicate and/or project pushed-down. */
+  PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);
+
   /** create a {@code IO.write()} instance to write to target. */
   POutput buildIOWriter(PCollection<Row> input);
 
+  /** Generate an IO implementation of {@code BeamSqlTableFilter} for predicate push-down. */
+  BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter);
 
 Review comment:
   Might this also take a `List` for the filter?
 



Issue Time Tracking
---

Worklog Id: (was: 322999)
Time Spent: 20m  (was: 10m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
> A proposed way to achieve that is to introduce an interface responsible for 
> identifying what portion(s) of a Calc can be moved down to the IO layer, and 
> to add the following methods to the BeamSqlTable interface to pass the 
> necessary parameters to IO APIs:
> - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
> - Boolean supportsProjects()
> - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].





[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=322998=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322998
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 03/Oct/19 22:30
Start Date: 03/Oct/19 22:30
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9718: [BEAM-8343] 
Added methods to BeamSqlTable to enable support for predicate/project push-down
URL: https://github.com/apache/beam/pull/9718#discussion_r331280602
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/SchemaBaseBeamTable.java
 ##
 @@ -18,14 +18,14 @@
 package org.apache.beam.sdk.extensions.sql.impl.schema;
 
 Review comment:
   Should we also move this to `org.apache.beam.sdk.extensions.sql.meta`?
 



Issue Time Tracking
---

Worklog Id: (was: 322998)
Time Spent: 20m  (was: 10m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
> A proposed way to achieve that is to introduce an interface responsible for 
> identifying what portion(s) of a Calc can be moved down to the IO layer, and 
> to add the following methods to the BeamSqlTable interface to pass the 
> necessary parameters to IO APIs:
> - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
> - Boolean supportsProjects()
> - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].





[jira] [Commented] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-10-03 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944063#comment-16944063
 ] 

Rui Wang commented on BEAM-7049:


Ah, OK. I found a tricky problem: with the merged cost-based optimization and 
the UnionMerge rule enabled, the Calcite planner falls into an infinite loop 
without choosing a plan.

This is a very tricky problem. I will need to spend many hours understanding the 
Calcite planner, BeamSQL's CBO implementation, and other components to find the 
root cause.


To answer your last question, what I was thinking was to have two rules, one for 
UNION ALL and one for UNION, and each rule should override [1]. That way the 
UNION ALL rule fires only for UNION ALL queries, and likewise for UNION, so you 
can keep the implementations of the underlying PTransforms separate.


[1]: 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L511
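The two-rules idea can be modelled in plain Java. In Calcite the guard is presumably `RelOptRule#matches(RelOptRuleCall)`, which the link in [1] appears to point at; the sketch below does not use Calcite, and `UnionNode`, `UnionRule`, `RulePlanner`, and the transform labels are hypothetical. Because each rule tests the node's ALL flag in `matches`, exactly one of them fires for any given UNION.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a logical UNION node; not Calcite's LogicalUnion. */
final class UnionNode {
  final boolean all;         // true for UNION ALL, false for UNION (distinct)
  final List<String> inputs; // names of the input relations

  UnionNode(boolean all, List<String> inputs) {
    this.all = all;
    this.inputs = inputs;
  }
}

/** Hypothetical rule type with a cheap guard, mirroring a matches/onMatch split. */
abstract class UnionRule {
  /** Decides whether this rule applies at all (the role RelOptRule#matches plays). */
  abstract boolean matches(UnionNode union);

  /** Produces a made-up physical plan label for the node. */
  abstract String onMatch(UnionNode union);
}

final class UnionAllRule extends UnionRule {
  @Override
  boolean matches(UnionNode union) {
    return union.all;
  }

  @Override
  String onMatch(UnionNode union) {
    return "UnionAllTransform" + union.inputs;
  }
}

final class UnionDistinctRule extends UnionRule {
  @Override
  boolean matches(UnionNode union) {
    return !union.all;
  }

  @Override
  String onMatch(UnionNode union) {
    return "UnionDistinctTransform" + union.inputs;
  }
}

final class RulePlanner {
  /** Fires every rule whose guard accepts the node and collects what it produced. */
  static List<String> plan(UnionNode union, List<UnionRule> rules) {
    List<String> produced = new ArrayList<>();
    for (UnionRule rule : rules) {
      if (rule.matches(union)) {
        produced.add(rule.onMatch(union));
      }
    }
    return produced;
  }

  public static void main(String[] args) {
    List<UnionRule> rules = Arrays.<UnionRule>asList(new UnionAllRule(), new UnionDistinctRule());
    System.out.println(plan(new UnionNode(true, Arrays.asList("a", "b")), rules));
    System.out.println(plan(new UnionNode(false, Arrays.asList("a", "b")), rules));
  }
}
```

Keeping the check in a cheap guard rather than inside the conversion logic means each rule, and the PTransform it produces, only ever sees the UNION flavor it was written for.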
 

> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> BeamUnionRel assumes exactly two inputs and rejects more. So `a UNION b UNION c` 
> has to be planned as UNION(a, UNION(b, c)), which needs two shuffles. If 
> BeamUnionRel could handle multiple inputs, there would be only one shuffle.
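Merging nested unions into one node can be sketched as a bottom-up tree rewrite, taking the issue's premise that each union node costs one shuffle. `Kind`, `Rel`, and `UnionMerge` are hypothetical plain-Java types, not Beam or Calcite classes. The sketch conservatively inlines a child only when it is the same kind of union as its parent (UNION ALL into UNION ALL, UNION into UNION), since merging, for example, a distinct UNION into a UNION ALL would change which duplicates survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical node kinds; not Beam or Calcite classes. */
enum Kind { INPUT, UNION_ALL, UNION_DISTINCT }

/** Hypothetical relational tree node: either a named input or a UNION over inputs. */
final class Rel {
  final Kind kind;
  final String name;      // set only for INPUT nodes
  final List<Rel> inputs; // empty for INPUT nodes

  Rel(Kind kind, String name, List<Rel> inputs) {
    this.kind = kind;
    this.name = name;
    this.inputs = inputs;
  }

  static Rel input(String name) {
    return new Rel(Kind.INPUT, name, Collections.<Rel>emptyList());
  }

  static Rel union(boolean all, Rel... inputs) {
    return new Rel(all ? Kind.UNION_ALL : Kind.UNION_DISTINCT, null, Arrays.asList(inputs));
  }
}

final class UnionMerge {
  /** Splices same-kind child UNIONs into their parent, bottom-up. */
  static Rel flatten(Rel rel) {
    if (rel.kind == Kind.INPUT) {
      return rel;
    }
    List<Rel> merged = new ArrayList<>();
    for (Rel child : rel.inputs) {
      Rel flat = flatten(child);
      if (flat.kind == rel.kind) {
        merged.addAll(flat.inputs); // same semantics: inline the child's inputs
      } else {
        merged.add(flat); // an input, or a UNION of the other kind: keep as a child
      }
    }
    return new Rel(rel.kind, null, merged);
  }

  /** Counts UNION nodes, assuming each one costs a shuffle as the issue describes. */
  static int shuffles(Rel rel) {
    int count = rel.kind == Kind.INPUT ? 0 : 1;
    for (Rel child : rel.inputs) {
      count += shuffles(child);
    }
    return count;
  }

  public static void main(String[] args) {
    // a UNION b UNION c, built as UNION(a, UNION(b, c)) as in the issue.
    Rel nested =
        Rel.union(false, Rel.input("a"), Rel.union(false, Rel.input("b"), Rel.input("c")));
    Rel flat = flatten(nested);
    System.out.println(shuffles(nested) + " shuffles before, " + shuffles(flat) + " after");
  }
}
```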





[jira] [Work logged] (BEAM-6995) SQL aggregation with where clause fails to plan

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6995?focusedWorklogId=322987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322987
 ]

ASF GitHub Bot logged work on BEAM-6995:


Author: ASF GitHub Bot
Created on: 03/Oct/19 22:15
Start Date: 03/Oct/19 22:15
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #9703: [BEAM-6995] Beam 
basic aggregation rule only when not windowed
URL: https://github.com/apache/beam/pull/9703#issuecomment-538150076
 
 
   Can you do a `git fetch origin && git rebase origin/master` on this?
 



Issue Time Tracking
---

Worklog Id: (was: 322987)
Time Spent: 1h 50m  (was: 1h 40m)

> SQL aggregation with where clause fails to plan
> ---
>
> Key: BEAM-6995
> URL: https://issues.apache.org/jira/browse/BEAM-6995
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.11.0
>Reporter: David McIntosh
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I'm finding that this code fails with the CannotPlanException listed below.
> {code:java}
> Schema schema = Schema.builder()
>     .addInt32Field("id")
>     .addInt32Field("val")
>     .build();
> Row row = Row.withSchema(schema).addValues(1, 2).build();
> PCollection<Row> inputData = p.apply("row input",
>     Create.of(row).withRowSchema(schema));
> inputData.apply("sql",
>     SqlTransform.query(
>         "SELECT id, SUM(val) "
>             + "FROM PCOLLECTION "
>             + "WHERE val > 0 "
>             + "GROUP BY id"));{code}
> If the WHERE clause is removed, the code runs successfully.
> This may be similar to BEAM-5384, since I was able to work around this by 
> adding an extra column to the input that isn't referenced in the SQL.
> {code:java}
> Schema schema = Schema.builder()
>     .addInt32Field("id")
>     .addInt32Field("val")
>     .addInt32Field("extra")
>     .build();{code}
>  
> {code:java}
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#100:Subset#2.BEAM_LOGICAL] could not be implemented; planner state:
> Root: rel#100:Subset#2.BEAM_LOGICAL
> Original rel:
> LogicalAggregate(subset=[rel#100:Subset#2.BEAM_LOGICAL], group=[{0}], 
> EXPR$1=[SUM($1)]): rowcount = 5.0, cumulative cost = {5.687500238418579 rows, 
> 0.0 cpu, 0.0 io}, id = 98
>   LogicalFilter(subset=[rel#97:Subset#1.NONE], condition=[>($1, 0)]): 
> rowcount = 50.0, cumulative cost = {50.0 rows, 100.0 cpu, 0.0 io}, id = 96
> BeamIOSourceRel(subset=[rel#95:Subset#0.BEAM_LOGICAL], table=[[beam, 
> PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 
> 0.0 io}, id = 92
> Sets:
> Set#0, type: RecordType(INTEGER id, INTEGER val)
> rel#95:Subset#0.BEAM_LOGICAL, best=rel#92, 
> importance=0.7291
> rel#92:BeamIOSourceRel.BEAM_LOGICAL(table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> rel#110:Subset#0.ENUMERABLE, best=rel#109, 
> importance=0.36455
> 
> rel#109:BeamEnumerableConverter.ENUMERABLE(input=rel#95:Subset#0.BEAM_LOGICAL),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#1, type: RecordType(INTEGER id, INTEGER val)
> rel#97:Subset#1.NONE, best=null, importance=0.81
> 
> rel#96:LogicalFilter.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,condition=>($1, 
> 0)), rowcount=50.0, cumulative cost={inf}
> 
> rel#102:LogicalCalc.NONE(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={inf}
> rel#104:Subset#1.BEAM_LOGICAL, best=rel#103, importance=0.405
> 
> rel#103:BeamCalcRel.BEAM_LOGICAL(input=rel#95:Subset#0.BEAM_LOGICAL,expr#0..1={inputs},expr#2=0,expr#3=>($t1,
>  $t2),id=$t0,val=$t1,$condition=$t3), rowcount=50.0, cumulative cost={150.0 
> rows, 801.0 cpu, 0.0 io}
> rel#106:Subset#1.ENUMERABLE, best=rel#105, importance=0.405
> 
> rel#105:BeamEnumerableConverter.ENUMERABLE(input=rel#104:Subset#1.BEAM_LOGICAL),
>  rowcount=50.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Set#2, type: RecordType(INTEGER id, INTEGER EXPR$1)
> rel#99:Subset#2.NONE, 

[jira] [Created] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-03 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8348:
-

 Summary: Portable Python job name hard-coded to "job"
 Key: BEAM-8348
 URL: https://issues.apache.org/jira/browse/BEAM-8348
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Kyle Weaver
Assignee: Kyle Weaver


See [1]. `job_name` is already taken by Google Cloud options [2], so I guess we 
should create a new option (maybe `portable_job_name` to avoid disruption).

[1] 
https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294

[2] 
https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438





[jira] [Updated] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in

2019-10-03 Thread Daniel Robert (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Robert updated BEAM-8347:

Description: 
I stumbled upon this and then saw a similar StackOverflow post: 
[https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]

When `advance()` is called and there are no messages, no state changes, including 
no changes to the CheckpointMark or Watermark. If there is a relatively 
constant rate of new messages coming in, this is not a problem. If data is 
bursty, and there are periods with no new messages coming in, the watermark will 
never advance.

Contrast this with some of the logic in PubsubIO, which makes provisions for 
advancing the watermark during periods of inactivity (although it, too, is 
imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )

The example given in the StackOverflow post is something like this:

 
{code:java}
pipeline
  .apply(RabbitMqIO.read()
      .withUri("amqp://guest:guest@localhost:5672")
      .withQueue("test"))
  .apply("Windowing",
      Window.<RabbitMqMessage>into(
              FixedWindows.of(Duration.standardSeconds(10)))
          .triggering(AfterWatermark.pastEndOfWindow())
          .withAllowedLateness(Duration.ZERO)
          .accumulatingFiredPanes());{code}
If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
window that never performs an on time trigger.

 

  was:
I stumbled upon this and then saw a similar StackOverflow post: 
[https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]

When `advance()` is called and there are no messages, no state changes, including 
no changes to the CheckpointMark or Watermark. If there is a relatively 
constant rate of new messages coming in, this is not a problem. If data is 
bursty, and there are periods with no new messages coming in, the watermark will 
never advance.

Contrast this with some of the logic in PubsubIO, which makes provisions for 
advancing the watermark during periods of inactivity (although it, too, is 
imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )

The example given in the StackOverflow post is something like this:

 
{code:java}
pipeline
  .apply(RabbitMqIO.read()
      .withUri("amqp://guest:guest@localhost:5672")
      .withQueue("test"))
  .apply("Windowing",
      Window.<RabbitMqMessage>into(
              FixedWindows.of(Duration.standardSeconds(10)))
          .triggering(AfterWatermark.pastEndOfWindow())
          .withAllowedLateness(Duration.ZERO)
          .accumulatingFiredPanes());{code}
If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
window that never performs and on time trigger. 

 


> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also 
> have DataflowRunner available
>Reporter: Daniel Robert
>Priority: Major
>
> I stumbled upon this and then saw a similar StackOverflow post: 
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When `advance()` is called and there are no messages, no state changes, 
> including no changes to the CheckpointMark or Watermark. If there is a 
> relatively constant rate of new messages coming in, this is not a problem. If 
> data is bursty, and there are periods with no new messages coming in, the 
> watermark will never advance.
> Contrast this with some of the logic in PubsubIO, which makes provisions 
> for advancing the watermark during periods of inactivity (although it, too, 
> is imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>  
> {code:java}
> pipeline
>   .apply(RabbitMqIO.read()
>       .withUri("amqp://guest:guest@localhost:5672")
>       .withQueue("test"))
>   .apply("Windowing",
>       Window.<RabbitMqMessage>into(
>               FixedWindows.of(Duration.standardSeconds(10)))
>           .triggering(AfterWatermark.pastEndOfWindow())
>           .withAllowedLateness(Duration.ZERO)
>           .accumulatingFiredPanes());{code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
> window that never performs an on time trigger.
>  





[jira] [Created] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in

2019-10-03 Thread Daniel Robert (Jira)
Daniel Robert created BEAM-8347:
---

 Summary: UnboundedRabbitMqReader can fail to advance watermark if 
no new data comes in
 Key: BEAM-8347
 URL: https://issues.apache.org/jira/browse/BEAM-8347
 Project: Beam
  Issue Type: Bug
  Components: io-java-rabbitmq
Affects Versions: 2.15.0
 Environment: testing has been done using the DirectRunner. I also have 
DataflowRunner available
Reporter: Daniel Robert


I stumbled upon this and then saw a similar StackOverflow post: 
[https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]

When `advance()` is called and there are no messages, no state changes, including 
no changes to the CheckpointMark or Watermark. If there is a relatively 
constant rate of new messages coming in, this is not a problem. If data is 
bursty, and there are periods with no new messages coming in, the watermark will 
never advance.

Contrast this with some of the logic in PubsubIO, which makes provisions for 
advancing the watermark during periods of inactivity (although it, too, is 
imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )

The example given in the StackOverflow post is something like this:

 
{code:java}
pipeline
  .apply(RabbitMqIO.read()
      .withUri("amqp://guest:guest@localhost:5672")
      .withQueue("test"))
  .apply("Windowing",
      Window.<RabbitMqMessage>into(
              FixedWindows.of(Duration.standardSeconds(10)))
          .triggering(AfterWatermark.pastEndOfWindow())
          .withAllowedLateness(Duration.ZERO)
          .accumulatingFiredPanes());{code}
If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
window that never performs an on time trigger.
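One possible mitigation, in the spirit of the PubsubIO provisions for inactivity mentioned above, is to let the reader advance the watermark after an idle period. The plain-Java sketch below is not RabbitMqIO code: `IdleAwareWatermark`, its methods, and the idle timeout are hypothetical. It assumes message timestamps are assigned close to ingestion time, so that once the queue has been quiet for `idleTimeout`, no message stamped earlier than `now - idleTimeout` is still expected.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical idle-aware watermark tracker for an unbounded reader; NOT RabbitMqIO code.
 * Assumes message timestamps are close to ingestion time, so after a long enough quiet
 * period no message stamped before (now - idleTimeout) is still expected.
 */
final class IdleAwareWatermark {
  private final Duration idleTimeout;
  private Instant watermark = Instant.EPOCH;
  private Instant lastArrival;

  IdleAwareWatermark(Duration idleTimeout, Instant start) {
    this.idleTimeout = idleTimeout;
    this.lastArrival = start;
  }

  /** Called when advance() returned a message. */
  void onMessage(Instant eventTime, Instant now) {
    lastArrival = now;
    if (eventTime.isAfter(watermark)) {
      watermark = eventTime;
    }
  }

  /** Called when advance() found no message; may still move the watermark forward. */
  void onIdle(Instant now) {
    Duration idle = Duration.between(lastArrival, now);
    if (idle.compareTo(idleTimeout) >= 0) {
      Instant candidate = now.minus(idleTimeout);
      if (candidate.isAfter(watermark)) {
        watermark = candidate; // only ever moves forward
      }
    }
  }

  Instant getWatermark() {
    return watermark;
  }

  public static void main(String[] args) {
    Instant t0 = Instant.parse("2019-10-03T00:00:00Z");
    IdleAwareWatermark wm = new IdleAwareWatermark(Duration.ofSeconds(30), t0);
    wm.onMessage(t0.plusSeconds(1), t0.plusSeconds(1)); // a short burst of data...
    wm.onIdle(t0.plusSeconds(61)); // ...followed by a minute of silence
    System.out.println(wm.getWatermark());
  }
}
```

With the 10-second fixed windows from the example, a watermark that keeps moving to `now - idleTimeout` would eventually pass the end of the window holding the two pushed messages, letting the on-time pane fire. Subtracting the timeout rather than jumping straight to `now` is a deliberately conservative choice.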

 





[jira] [Work logged] (BEAM-8346) Failure: beam_PostRelease_Python_Candidate timeout

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8346?focusedWorklogId=322936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322936
 ]

ASF GitHub Bot logged work on BEAM-8346:


Author: ASF GitHub Bot
Created on: 03/Oct/19 20:26
Start Date: 03/Oct/19 20:26
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9723: 
[BEAM-8346] Increase timeout of Python Release Validation job
URL: https://github.com/apache/beam/pull/9723
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 322936)
Time Spent: 0.5h  (was: 20m)

> Failure: beam_PostRelease_Python_Candidate timeout
> --
>
> Key: BEAM-8346
> URL: https://issues.apache.org/jira/browse/BEAM-8346
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The job needs more time to run the full set of validations (quickstart + mobile 
> game batch) on DataflowRunner/DirectRunner, with and without wheels, across 
> multiple Python versions (2.7, 3.5, 3.6, 3.7).
> Running all validations in the Py2.7 environment took about 1h10m, so we 
> probably want to extend the timeout to 5-6 hours.
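Rough arithmetic behind the 5-6 hour figure, assuming each of the four Python versions takes about as long as the measured Py2.7 run (per-version times for 3.5, 3.6, and 3.7 are not given here):

$$
4 \times 70\,\text{min} = 280\,\text{min} \approx 4\,\text{h}\,40\,\text{min}
$$

A 5-6 hour timeout therefore leaves roughly 20 to 80 minutes of headroom over that estimate.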





[jira] [Work logged] (BEAM-8346) Failure: beam_PostRelease_Python_Candidate timeout

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8346?focusedWorklogId=322896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322896
 ]

ASF GitHub Bot logged work on BEAM-8346:


Author: ASF GitHub Bot
Created on: 03/Oct/19 20:00
Start Date: 03/Oct/19 20:00
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9723: [BEAM-8346] 
Increase timeout of Python Release Validation job
URL: https://github.com/apache/beam/pull/9723#issuecomment-538104459
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 322896)
Time Spent: 20m  (was: 10m)

> Failure: beam_PostRelease_Python_Candidate timeout
> --
>
> Key: BEAM-8346
> URL: https://issues.apache.org/jira/browse/BEAM-8346
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The job needs more time to run the full set of validations (quickstart + mobile 
> game batch) on DataflowRunner/DirectRunner, with and without wheels, across 
> multiple Python versions (2.7, 3.5, 3.6, 3.7).
> Running all validations in the Py2.7 environment took about 1h10m, so we 
> probably want to extend the timeout to 5-6 hours.





[jira] [Commented] (BEAM-8202) Support ParquetTable Writer

2019-10-03 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943852#comment-16943852
 ] 

Rui Wang commented on BEAM-8202:


You can check the project setup webpage: 
https://beam.apache.org/contribute/#development-setup. Let me know if you need 
any help.

> Support ParquetTable Writer
> ---
>
> Key: BEAM-8202
> URL: https://issues.apache.org/jira/browse/BEAM-8202
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Carlos Antonio Oceguera Hernández
>Priority: Major
>
> https://github.com/apache/beam/pull/9054 added a reader for Parquet tables 
> in BeamSQL. We can support a writer as well.





[jira] [Commented] (BEAM-2009) support JdbcIO as source/sink

2019-10-03 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943851#comment-16943851
 ] 

Rui Wang commented on BEAM-2009:


You can check the project setup webpage: 
https://beam.apache.org/contribute/#development-setup. Let me know if you need 
any help.

> support JdbcIO as source/sink
> -
>
> Key: BEAM-2009
> URL: https://issues.apache.org/jira/browse/BEAM-2009
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Mingmin Xu
>Assignee: Mujuzi Moses
>Priority: Major
>
> Support JdbcIO as both a source and a sink:
> 1. As a source, JdbcIO reads data from databases that support JDBC, such as 
> Oracle/MySQL/Cassandra/...; this leads to a bounded pipeline.
> 2. As a sink, JdbcIO can persist data from both unbounded and bounded 
> pipelines.





[jira] [Commented] (BEAM-2009) support JdbcIO as source/sink

2019-10-03 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943850#comment-16943850
 ] 

Rui Wang commented on BEAM-2009:


Hello Mujuzi, do you want to take this JIRA as your starter task? Here is a 
reference for how to add a new table in SQL: 
https://jira.apache.org/jira/browse/BEAM-8203

> support JdbcIO as source/sink
> -
>
> Key: BEAM-2009
> URL: https://issues.apache.org/jira/browse/BEAM-2009
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Mingmin Xu
>Assignee: Mujuzi Moses
>Priority: Major
>
> Support JdbcIO as both a source and a sink:
> 1. As a source, JdbcIO reads data from databases that support JDBC, such as 
> Oracle/MySQL/Cassandra/...; this leads to a bounded pipeline.
> 2. As a sink, JdbcIO can persist data from both unbounded and bounded 
> pipelines.





[jira] [Comment Edited] (BEAM-8202) Support ParquetTable Writer

2019-10-03 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943848#comment-16943848
 ] 

Rui Wang edited comment on BEAM-8202 at 10/3/19 7:34 PM:
-

Hello, Carlos. Do you want to pick up this JIRA as the starter task?


Here is a reference implementation that you can use to implement the writer: 
https://github.com/apache/beam/pull/9597. 


was (Author: amaliujia):
Hello, Carlos. Do you want to pick up this JIRA as the starter task?


Here is an reference implemention that you can use: 
https://github.com/apache/beam/pull/9597. 

> Support ParquetTable Writer
> ---
>
> Key: BEAM-8202
> URL: https://issues.apache.org/jira/browse/BEAM-8202
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Carlos Antonio Oceguera Hernández
>Priority: Major
>
> https://github.com/apache/beam/pull/9054 added a reader for Parquet tables 
> in BeamSQL. We can support a writer as well.





[jira] [Assigned] (BEAM-2009) support JdbcIO as source/sink

2019-10-03 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reassigned BEAM-2009:
--

Assignee: Mujuzi Moses

> support JdbcIO as source/sink
> -
>
> Key: BEAM-2009
> URL: https://issues.apache.org/jira/browse/BEAM-2009
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Mingmin Xu
>Assignee: Mujuzi Moses
>Priority: Major
>
> Support JdbcIO as both a source and a sink:
> 1. As a source, JdbcIO reads data from databases that support JDBC, such as 
> Oracle/MySQL/Cassandra/...; this leads to a bounded pipeline.
> 2. As a sink, JdbcIO can persist data from both unbounded and bounded 
> pipelines.





[jira] [Commented] (BEAM-8202) Support ParquetTable Writer

2019-10-03 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943848#comment-16943848
 ] 

Rui Wang commented on BEAM-8202:


Hello, Carlos. Do you want to pick up this JIRA as the starter task?


Here is a reference implementation that you can use: 
https://github.com/apache/beam/pull/9597. 

> Support ParquetTable Writer
> ---
>
> Key: BEAM-8202
> URL: https://issues.apache.org/jira/browse/BEAM-8202
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Carlos Antonio Oceguera Hernández
>Priority: Major
>
> https://github.com/apache/beam/pull/9054 added a reader for Parquet tables 
> in BeamSQL. We can support a writer as well.





[jira] [Assigned] (BEAM-8202) Support ParquetTable Writer

2019-10-03 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reassigned BEAM-8202:
--

Assignee: Carlos Antonio Oceguera Hernández

> Support ParquetTable Writer
> ---
>
> Key: BEAM-8202
> URL: https://issues.apache.org/jira/browse/BEAM-8202
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Carlos Antonio Oceguera Hernández
>Priority: Major
>
> https://github.com/apache/beam/pull/9054 added a reader for Parquet tables 
> in BeamSQL. We can support a writer as well.





[jira] [Work logged] (BEAM-8346) Failure: beam_PostRelease_Python_Candidate timeout

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8346?focusedWorklogId=322853=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322853
 ]

ASF GitHub Bot logged work on BEAM-8346:


Author: ASF GitHub Bot
Created on: 03/Oct/19 18:44
Start Date: 03/Oct/19 18:44
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9723: 
[BEAM-8346] Increase timeout of Python Release Validation job
URL: https://github.com/apache/beam/pull/9723
 
 
   Fix timeout failure like 
https://builds.apache.org/job/beam_PostRelease_Python_Candidate/167/. See 
details in https://issues.apache.org/jira/browse/BEAM-8346.
   
   +R: @yifanzou
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Created] (BEAM-8346) Failure: beam_PostRelease_Python_Candidate timeout

2019-10-03 Thread Mark Liu (Jira)
Mark Liu created BEAM-8346:
--

 Summary: Failure: beam_PostRelease_Python_Candidate timeout
 Key: BEAM-8346
 URL: https://issues.apache.org/jira/browse/BEAM-8346
 Project: Beam
  Issue Type: Sub-task
  Components: test-failures
Reporter: Mark Liu
Assignee: Mark Liu


The job requires more time to run the full set of validations (quickstart + mobile 
game batch) on DataflowRunner/DirectRunner, with and without wheels, across 
multiple Python versions (2.7, 3.5, 3.6, 3.7).

Running all validations in the Py2.7 environment took about 1h10m. Assuming the 
other versions take a similar amount of time, all four versions need roughly 
4 x 1h10m = 4h40m, so we probably want to extend the timeout to 5-6 hours.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8345) Add missing validations to run_rc_validation.sh

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8345?focusedWorklogId=322829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322829
 ]

ASF GitHub Bot logged work on BEAM-8345:


Author: ASF GitHub Bot
Created on: 03/Oct/19 18:26
Start Date: 03/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9722: 
[BEAM-8345] Add Java mobile game on DirectRunner to release script
URL: https://github.com/apache/beam/pull/9722
 
 
   `run_rc_validation.sh` only has Java Mobile Game on DataflowRunner. We need 
the same validation for DirectRunner as well.
   
   - Added a Java Mobile Game on DirectRunner block to the script.
   - Reused bq and pubsub resources.
   - Updated `script.config` to have separate flags to control 
`java_mobile_game_direct` and `java_mobile_game_dataflow`.
   
   +R: @yifanzou 
   
   
   

[jira] [Created] (BEAM-8345) Add missing validations to run_rc_validation.sh

2019-10-03 Thread Mark Liu (Jira)
Mark Liu created BEAM-8345:
--

 Summary: Add missing validations to run_rc_validation.sh
 Key: BEAM-8345
 URL: https://issues.apache.org/jira/browse/BEAM-8345
 Project: Beam
  Issue Type: Sub-task
  Components: testing
Reporter: Mark Liu
Assignee: Mark Liu








[jira] [Updated] (BEAM-8337) Consider publishing portable job server container images

2019-10-03 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8337:
--
Summary: Consider publishing portable job server container images  (was: 
Publish portable job server container images)

> Consider publishing portable job server container images
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Priority: Major
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.





[jira] [Commented] (BEAM-8183) Optionally bundle multiple pipelines into a single Flink jar

2019-10-03 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943801#comment-16943801
 ] 

Kenneth Knowles commented on BEAM-8183:
---

OK, even after browsing the thread (I don't have time to pick it all up) I 
don't quite understand the context. Ankur clarified what I meant to describe. 
All of the above forms of flexibility are in use and important. I would guess 
multi-pipeline is the least important since that is a pretty trivial 
convenience.

> Optionally bundle multiple pipelines into a single Flink jar
> 
>
> Key: BEAM-8183
> URL: https://issues.apache.org/jira/browse/BEAM-8183
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> [https://github.com/apache/beam/pull/9331#issuecomment-526734851]
> "With Flink you can bundle multiple entry points into the same jar file and 
> specify which one to use with optional flags. It may be desirable to allow 
> inclusion of multiple pipelines for this tool also, although that would 
> require a different workflow. Absent this option, it becomes quite convoluted 
> for users that need the flexibility to choose which pipeline to launch at 
> submission time."





[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=322750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322750
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 03/Oct/19 17:09
Start Date: 03/Oct/19 17:09
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9692: [BEAM-7389] 
Update docs with matching code files
URL: https://github.com/apache/beam/pull/9692#issuecomment-538037024
 
 
   * Moved all docs sources from `element-wise` to `elementwise`.
   * Updated for the changes in #9664 to use `elementwise` instead of 
`element_wise`.
   * TODO: regenerate notebooks
   
   If #9669 lands first, I could update the include buttons from here as well.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 322750)
Time Spent: 62h 50m  (was: 62h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 62h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=322746=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322746
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 03/Oct/19 16:58
Start Date: 03/Oct/19 16:58
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9669: [BEAM-7389] 
Update include buttons to support multiple languages
URL: https://github.com/apache/beam/pull/9669#issuecomment-538032732
 
 
   @aaltay This is ready for review
 



Issue Time Tracking
---

Worklog Id: (was: 322746)
Time Spent: 62h 40m  (was: 62.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 62h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8339) Fix bugs in automation scripts

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8339?focusedWorklogId=322742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322742
 ]

ASF GitHub Bot logged work on BEAM-8339:


Author: ASF GitHub Bot
Created on: 03/Oct/19 16:45
Start Date: 03/Oct/19 16:45
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9719: 
[BEAM-8339] Quietly skip cleanup_pubsub in automation script
URL: https://github.com/apache/beam/pull/9719
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 322742)
Time Spent: 40m  (was: 0.5h)

> Fix bugs in automation scripts
> --
>
> Key: BEAM-8339
> URL: https://issues.apache.org/jira/browse/BEAM-8339
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8339) Fix bugs in automation scripts

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8339?focusedWorklogId=322741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322741
 ]

ASF GitHub Bot logged work on BEAM-8339:


Author: ASF GitHub Bot
Created on: 03/Oct/19 16:45
Start Date: 03/Oct/19 16:45
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9719: [BEAM-8339] 
Quietly skip cleanup_pubsub in automation script
URL: https://github.com/apache/beam/pull/9719#issuecomment-538027417
 
 
   PythonLint failure is unrelated.
 



Issue Time Tracking
---

Worklog Id: (was: 322741)
Time Spent: 0.5h  (was: 20m)

> Fix bugs in automation scripts
> --
>
> Key: BEAM-8339
> URL: https://issues.apache.org/jira/browse/BEAM-8339
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8339) Fix bugs in automation scripts

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8339?focusedWorklogId=322740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322740
 ]

ASF GitHub Bot logged work on BEAM-8339:


Author: ASF GitHub Bot
Created on: 03/Oct/19 16:44
Start Date: 03/Oct/19 16:44
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9716: 
[BEAM-8339] Fix wrong temp_location in run_rc_validation.sh
URL: https://github.com/apache/beam/pull/9716
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 322740)
Time Spent: 20m  (was: 10m)

> Fix bugs in automation scripts
> --
>
> Key: BEAM-8339
> URL: https://issues.apache.org/jira/browse/BEAM-8339
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8339) Fix bugs in automation scripts

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8339?focusedWorklogId=322739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322739
 ]

ASF GitHub Bot logged work on BEAM-8339:


Author: ASF GitHub Bot
Created on: 03/Oct/19 16:44
Start Date: 03/Oct/19 16:44
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9716: [BEAM-8339] Fix 
wrong temp_location in run_rc_validation.sh
URL: https://github.com/apache/beam/pull/9716#issuecomment-538027220
 
 
   PythonLint is unrelated. 
 



Issue Time Tracking
---

Worklog Id: (was: 322739)
Remaining Estimate: 0h
Time Spent: 10m

> Fix bugs in automation scripts
> --
>
> Key: BEAM-8339
> URL: https://issues.apache.org/jira/browse/BEAM-8339
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Mark Liu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-8334) Expose Language Options for testing

2019-10-03 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud resolved BEAM-8334.
--
Fix Version/s: 2.17.0
   Resolution: Fixed

> Expose Language Options for testing
> ---
>
> Key: BEAM-8334
> URL: https://issues.apache.org/jira/browse/BEAM-8334
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Trivial
> Fix For: 2.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Google has a set of compliance tests for ZetaSQL. The test framework needs 
> access to LanguageOptions to determine what tests are supported.





[jira] [Work logged] (BEAM-8334) Expose Language Options for testing

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8334?focusedWorklogId=322738=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322738
 ]

ASF GitHub Bot logged work on BEAM-8334:


Author: ASF GitHub Bot
Created on: 03/Oct/19 16:42
Start Date: 03/Oct/19 16:42
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9704: [BEAM-8334] 
Expose Language Options for testing
URL: https://github.com/apache/beam/pull/9704
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 322738)
Time Spent: 1h 20m  (was: 1h 10m)

> Expose Language Options for testing
> ---
>
> Key: BEAM-8334
> URL: https://issues.apache.org/jira/browse/BEAM-8334
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Trivial
> Fix For: 2.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Google has a set of compliance tests for ZetaSQL. The test framework needs 
> access to LanguageOptions to determine what tests are supported.





[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=322733=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322733
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 03/Oct/19 16:35
Start Date: 03/Oct/19 16:35
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9664: [BEAM-7389] 
Created elementwise for consistency with docs
URL: https://github.com/apache/beam/pull/9664#issuecomment-538023491
 
 
   It is now ready for review.
   
   **No changes to logic**
   
   I did take the time to normalize all the code files to use relative imports, 
and I moved the asserts into `check_` functions for consistency across the test 
files. Other than that, the only changes from the `element_wise` counterparts 
are small fixes to make the linter and all tests pass.
 



Issue Time Tracking
---

Worklog Id: (was: 322733)
Time Spent: 62.5h  (was: 62h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 62.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8287) Update documentation for Python 3 support after Beam 2.16.0.

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8287?focusedWorklogId=322703=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322703
 ]

ASF GitHub Bot logged work on BEAM-8287:


Author: ASF GitHub Bot
Created on: 03/Oct/19 15:58
Start Date: 03/Oct/19 15:58
Worklog Time Spent: 10m 
  Work Description: soyrice commented on pull request #9700: [BEAM-8287] 
Python 3 docs updates for 2.16.0
URL: https://github.com/apache/beam/pull/9700#discussion_r331120267
 
 

 ##
 File path: website/src/get-started/quickstart-py.md
 ##
 @@ -27,6 +27,8 @@ If you're interested in contributing to the Apache Beam 
Python codebase, see the
 * TOC
 {:toc}
 
+New versions of the Python SDK will only support Python 3.5 or higher. 
Currently, the Python SDK still supports Python 2.7.x. We recommend using the 
latest Python 3 version.
 
 Review comment:
   Is there a special format for deprecation warnings?
 



Issue Time Tracking
---

Worklog Id: (was: 322703)
Time Spent: 3h  (was: 2h 50m)

> Update documentation for Python 3 support after Beam 2.16.0.
> 
>
> Key: BEAM-8287
> URL: https://issues.apache.org/jira/browse/BEAM-8287
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: Valentyn Tymofieiev
>Assignee: Cyrus Maden
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7933) Adding timeout to JobServer grpc calls

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7933?focusedWorklogId=322702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322702
 ]

ASF GitHub Bot logged work on BEAM-7933:


Author: ASF GitHub Bot
Created on: 03/Oct/19 15:58
Start Date: 03/Oct/19 15:58
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9673: [BEAM-7933] Add 
job server request timeout (default to 60 seconds)
URL: https://github.com/apache/beam/pull/9673
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 322702)
Time Spent: 3h 50m  (was: 3h 40m)

> Adding timeout to JobServer grpc calls
> --
>
> Key: BEAM-7933
> URL: https://issues.apache.org/jira/browse/BEAM-7933
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.14.0
>Reporter: Enrico Canzonieri
>Assignee: Enrico Canzonieri
>Priority: Minor
>  Labels: portability
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> gRPC calls to the JobServer from the Python SDK do not have timeouts. That 
> means that the call to pipeline.run() could hang forever if the JobServer is 
> not running (or fails to start).
> E.g., in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/portable_runner.py#L307]
>  the call to Prepare() doesn't provide any timeout value, and the same applies 
> to other JobServer requests.
> As part of this ticket, we could add a default timeout of 60 seconds for the 
> HTTP client.
> Additionally, we could consider adding a --job-server-request-timeout option to the 
> [PortableOptions|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L805]
>  class, to be used in the JobServer interactions inside portable_runner.py.
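A minimal sketch of the proposal above, assuming Python 3. The flag name `--job-server-request-timeout` and the 60-second default come from the ticket; `parse_timeout`, `TimeoutStub` and `FakeJobServiceStub` are hypothetical names used only for illustration (the fake stands in for a generated JobService gRPC stub). The sketch relies on the per-call `timeout` keyword (in seconds) that gRPC Python callables accept, which makes a call fail with DEADLINE_EXCEEDED instead of hanging. It is not necessarily how PR #9673 implements the change.

```python
import argparse

# Default taken from the ticket's proposal (60 seconds).
DEFAULT_JOB_SERVER_TIMEOUT_SECS = 60


def parse_timeout(argv):
    """Parses the proposed --job-server-request-timeout flag (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job-server-request-timeout",
        type=int,
        default=DEFAULT_JOB_SERVER_TIMEOUT_SECS,
        help="Seconds to wait for each JobServer request before failing.",
    )
    # Ignore unrelated pipeline flags instead of erroring on them.
    args, _ = parser.parse_known_args(argv)
    return args.job_server_request_timeout


class TimeoutStub:
    """Hypothetical wrapper that gives every stub call a default deadline.

    gRPC Python callables accept a `timeout` keyword in seconds; if it
    expires, the call raises grpc.RpcError (DEADLINE_EXCEEDED) instead of
    blocking forever.
    """

    def __init__(self, stub, timeout):
        self._stub = stub
        self._timeout = timeout

    def __getattr__(self, name):
        method = getattr(self._stub, name)

        def call(request, **kwargs):
            # An explicit per-call timeout still takes precedence.
            kwargs.setdefault("timeout", self._timeout)
            return method(request, **kwargs)

        return call


class FakeJobServiceStub:
    """Stand-in for a generated JobService stub, used only for this demo."""

    def Prepare(self, request, timeout=None):
        return {"request": request, "timeout": timeout}


if __name__ == "__main__":
    stub = TimeoutStub(FakeJobServiceStub(), parse_timeout([]))
    print(stub.Prepare("prepare-request"))
    # {'request': 'prepare-request', 'timeout': 60}
```

Applying one deadline to every method is a simplification: long-lived streaming calls that follow a running job would likely need a separate timeout, or none at all.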





[jira] [Commented] (BEAM-8183) Optionally bundle multiple pipelines into a single Flink jar

2019-10-03 Thread Thomas Weise (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943638#comment-16943638
 ] 

Thomas Weise commented on BEAM-8183:


BEAM-8115 is indeed orthogonal, and applicable to the cases where 
parameterization can be solved without a different execution path in the driver 
program. For the remaining cases, bundling multiple protos could be the solution.

> Optionally bundle multiple pipelines into a single Flink jar
> 
>
> Key: BEAM-8183
> URL: https://issues.apache.org/jira/browse/BEAM-8183
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> [https://github.com/apache/beam/pull/9331#issuecomment-526734851]
> "With Flink you can bundle multiple entry points into the same jar file and 
> specify which one to use with optional flags. It may be desirable to allow 
> inclusion of multiple pipelines for this tool also, although that would 
> require a different workflow. Absent this option, it becomes quite convoluted 
> for users that need the flexibility to choose which pipeline to launch at 
> submission time."





[jira] [Work logged] (BEAM-7933) Adding timeout to JobServer grpc calls

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7933?focusedWorklogId=322640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322640
 ]

ASF GitHub Bot logged work on BEAM-7933:


Author: ASF GitHub Bot
Created on: 03/Oct/19 14:27
Start Date: 03/Oct/19 14:27
Worklog Time Spent: 10m 
  Work Description: ecanzonieri commented on issue #9673: [BEAM-7933] Add 
job server request timeout (default to 60 seconds)
URL: https://github.com/apache/beam/pull/9673#issuecomment-537969691
 
 
   Squashed everything to a single commit now.
 



Issue Time Tracking
---

Worklog Id: (was: 322640)
Time Spent: 3h 40m  (was: 3.5h)

> Adding timeout to JobServer grpc calls
> --
>
> Key: BEAM-7933
> URL: https://issues.apache.org/jira/browse/BEAM-7933
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.14.0
>Reporter: Enrico Canzonieri
>Assignee: Enrico Canzonieri
>Priority: Minor
>  Labels: portability
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> gRPC calls to the JobServer from the Python SDK do not have timeouts. That 
> means that the call to pipeline.run() could hang forever if the JobServer is 
> not running (or fails to start).
> E.g., in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/portable_runner.py#L307]
>  the call to Prepare() doesn't provide any timeout value, and the same applies 
> to other JobServer requests.
> As part of this ticket, we could add a default timeout of 60 seconds for the 
> HTTP client.
> Additionally, we could consider adding a --job-server-request-timeout option to the 
> [PortableOptions|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L805]
>  class, to be used in the JobServer interactions inside portable_runner.py.





[jira] [Work logged] (BEAM-7933) Adding timeout to JobServer grpc calls

2019-10-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7933?focusedWorklogId=322637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322637
 ]

ASF GitHub Bot logged work on BEAM-7933:


Author: ASF GitHub Bot
Created on: 03/Oct/19 14:24
Start Date: 03/Oct/19 14:24
Worklog Time Spent: 10m 
  Work Description: ecanzonieri commented on issue #9673: [BEAM-7933] Add 
job server request timeout (default to 60 seconds)
URL: https://github.com/apache/beam/pull/9673#issuecomment-537968350
 
 
   I messed up the rebase. I'll fix it and repush.
 



Issue Time Tracking
---

Worklog Id: (was: 322637)
Time Spent: 3.5h  (was: 3h 20m)

> Adding timeout to JobServer grpc calls
> --
>
> Key: BEAM-7933
> URL: https://issues.apache.org/jira/browse/BEAM-7933
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.14.0
>Reporter: Enrico Canzonieri
>Assignee: Enrico Canzonieri
>Priority: Minor
>  Labels: portability
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> gRPC calls from the Python SDK to the JobServer do not have timeouts. This 
> means that the call to pipeline.run() could hang forever if the JobServer is 
> not running (or fails to start).
> For example, in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/portable_runner.py#L307]
>  the call to Prepare() doesn't provide any timeout value, and the same applies 
> to other JobServer requests.
> As part of this ticket, we could set a default timeout of 60 seconds for the 
> gRPC client.
> Additionally, we could consider adding a --job-server-request-timeout option to the 
> [PortableOptions|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py#L805]
>  class, to be used in the JobServer interactions inside portable_runner.py.
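One way to apply such a default without touching every call site is to wrap the stub. The sketch below relies only on the fact that gRPC Python stub methods accept a `timeout` keyword argument in seconds (and raise `grpc.RpcError` with status `DEADLINE_EXCEEDED` when it elapses). `TimeoutStub` and `FakeJobServiceStub` are hypothetical names; the fake stands in for a real job service stub so the example runs without a JobServer.

```python
class TimeoutStub:
    """Wrap a gRPC-style stub so every call gets a default timeout.

    Hypothetical helper: any method looked up on the wrapper is fetched
    from the wrapped stub and called with timeout=<default> unless the
    caller passes its own timeout.
    """

    def __init__(self, stub, default_timeout=60):
        self._stub = stub
        self._default_timeout = default_timeout

    def __getattr__(self, name):
        # Only called for names not found on the wrapper itself.
        method = getattr(self._stub, name)

        def call_with_timeout(request, **kwargs):
            kwargs.setdefault('timeout', self._default_timeout)
            return method(request, **kwargs)

        return call_with_timeout


class FakeJobServiceStub:
    """Hypothetical stand-in for a job service stub; echoes its arguments."""

    def Prepare(self, request, timeout=None):
        return request, timeout


stub = TimeoutStub(FakeJobServiceStub())
print(stub.Prepare('prepare-request'))             # ('prepare-request', 60)
print(stub.Prepare('prepare-request', timeout=5))  # ('prepare-request', 5)
```

A blanket default like this would also cap long-lived streaming calls (such as a stream of job state updates), so those would probably need to be excluded or given a separate timeout.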



[jira] [Commented] (BEAM-5192) Support Elasticsearch 7.x

2019-10-03 Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943619#comment-16943619
 ] 

Etienne Chauchot commented on BEAM-5192:


[~chet.aldrich] I just assigned the ticket to you

> Support Elasticsearch 7.x
> -
>
> Key: BEAM-5192
> URL: https://issues.apache.org/jira/browse/BEAM-5192
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Chet Aldrich
>Priority: Major
>
> ES v7 is not out yet, but the Elastic team has scheduled a breaking change for ES 
> 7.0: the removal of the type feature. See 
> [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch]
> This will require a good amount of changes in the IO. 
> This ticket tracks that future work.
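To illustrate what the type removal means for a writer, here is a minimal sketch that builds the action/metadata line of an Elasticsearch bulk `index` request and includes `_type` only for clusters older than 7.x. It assumes the cluster's major version is already known; `bulk_index_action` is a hypothetical helper, not part of ElasticsearchIO.

```python
import json


def bulk_index_action(index, doc_id, es_major_version, doc_type='_doc'):
    """Return the action/metadata line for one bulk 'index' operation.

    Hypothetical helper: '_type' is only emitted for Elasticsearch versions
    before 7, since mapping types are deprecated from 7.0 onwards.
    """
    metadata = {'_index': index, '_id': doc_id}
    if es_major_version < 7:
        metadata['_type'] = doc_type
    return json.dumps({'index': metadata}, sort_keys=True)


print(bulk_index_action('beam', '1', 6, doc_type='doc'))
# {"index": {"_id": "1", "_index": "beam", "_type": "doc"}}
print(bulk_index_action('beam', '1', 7))
# {"index": {"_id": "1", "_index": "beam"}}
```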



[jira] [Assigned] (BEAM-5192) Support Elasticsearch 7.x

2019-10-03 Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-5192:
--

Assignee: Chet Aldrich  (was: Tim Robertson)

> Support Elasticsearch 7.x
> -
>
> Key: BEAM-5192
> URL: https://issues.apache.org/jira/browse/BEAM-5192
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Chet Aldrich
>Priority: Major
>
> ES v7 is not out yet, but the Elastic team has scheduled a breaking change for ES 
> 7.0: the removal of the type feature. See 
> [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch]
> This will require a good amount of changes in the IO. 
> This ticket tracks that future work.



[jira] [Updated] (BEAM-8298) Implement state caching for side inputs

2019-10-03 Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-8298:
-
Parent: (was: BEAM-5428)
Issue Type: Improvement  (was: Sub-task)

> Implement state caching for side inputs
> ---
>
> Key: BEAM-8298
> URL: https://issues.apache.org/jira/browse/BEAM-8298
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Priority: Major
>
> Caching is currently only implemented for user state.



[jira] [Updated] (BEAM-8297) Evict state cache based on an upper memory limit

2019-10-03 Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-8297:
-
Parent: (was: BEAM-5428)
Issue Type: Improvement  (was: Sub-task)

> Evict state cache based on an upper memory limit
> 
>
> Key: BEAM-8297
> URL: https://issues.apache.org/jira/browse/BEAM-8297
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>
> The cache limit is currently specified in terms of a maximum number of cache 
> items. We should switch to a memory-based limit, which would make it easier to 
> reason about the state cache size.
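A memory-based limit could look roughly like the following LRU sketch, which tracks an estimated byte size per entry instead of an item count. `MemoryBoundedLRUCache` is hypothetical, not the SDK's state cache. Its default estimator, `sys.getsizeof`, is shallow (it ignores objects referenced by the value), so a real implementation would likely need a deeper size estimate.

```python
import sys
from collections import OrderedDict


class MemoryBoundedLRUCache:
    """LRU cache bounded by the estimated total size of its values, in bytes."""

    def __init__(self, max_bytes, sizeof=sys.getsizeof):
        self._max_bytes = max_bytes
        self._sizeof = sizeof          # estimates the size of one value
        self._entries = OrderedDict()  # key -> (value, estimated size)
        self._current_bytes = 0

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key, value):
        if key in self._entries:
            # Replacing an entry: forget the size of the old value first.
            self._current_bytes -= self._entries.pop(key)[1]
        size = self._sizeof(value)
        self._entries[key] = (value, size)
        self._current_bytes += size
        # Evict least recently used entries until back under the budget.
        # A single value larger than the budget ends up evicted as well.
        while self._current_bytes > self._max_bytes and self._entries:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self._current_bytes -= evicted_size

    @property
    def current_bytes(self):
        return self._current_bytes

    def __len__(self):
        return len(self._entries)


# Usage with len() as a deterministic stand-in for a size estimator.
cache = MemoryBoundedLRUCache(max_bytes=10, sizeof=len)
cache.put('a', 'xxxx')
cache.put('b', 'xxxx')
cache.get('a')          # 'a' is now the most recently used entry
cache.put('c', 'xxxx')  # 12 > 10 bytes, so 'b' is evicted
print(cache.get('b'), cache.current_bytes)  # None 8
```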



[jira] [Resolved] (BEAM-5428) Implement cross-bundle state caching.

2019-10-03 Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved BEAM-5428.
--
Fix Version/s: 2.17.0
   Resolution: Fixed

The first version of cross-bundle state caching has been merged. For now, it is 
disabled by default. It can be turned on via the 
{{experiments=state_cache_size=}} flag, where 
{{number_of_items}} is the maximum number of items to hold in the cache. 
Notably, there are some limitations, also captured in these JIRA issues:

- State cache size is determined by the number of items instead of the total 
memory size of the cache: BEAM-8297
- Only user state is currently cached: BEAM-8298.
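The behaviour described above can be sketched as a cache that is off by default and bounded by an item count taken from the `state_cache_size` experiment. Both `state_cache_size_from_experiments` and `ItemCountLRUCache` are hypothetical names. Least-recently-used eviction is an assumption of this sketch; the resolution comment does not state the eviction policy.

```python
from collections import OrderedDict


def state_cache_size_from_experiments(experiments):
    """Return <n> from a 'state_cache_size=<n>' experiment, or 0 (disabled)."""
    for experiment in experiments or []:
        name, sep, value = experiment.partition('=')
        if name == 'state_cache_size' and sep:
            return int(value)
    return 0


class ItemCountLRUCache:
    """Cache holding at most max_items entries; 0 disables caching."""

    def __init__(self, max_items):
        self._max_items = max_items
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        if self._max_items <= 0:
            return  # caching disabled, as it is by default
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Drop least recently used entries beyond the item limit.
        while len(self._entries) > self._max_items:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)


cache = ItemCountLRUCache(state_cache_size_from_experiments(['state_cache_size=2']))
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')     # touch 'a' so 'b' becomes least recently used
cache.put('c', 3)  # exceeds 2 items, evicts 'b'
print(cache.get('b'), len(cache))  # None 2
```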

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 29h
>  Remaining Estimate: 0h
>
> Tech spec: 
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document: 
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link: 
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]



[jira] [Updated] (BEAM-8298) Implement state caching for side inputs

2019-10-03 Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-8298:
-
Status: Open  (was: Triage Needed)

> Implement state caching for side inputs
> ---
>
> Key: BEAM-8298
> URL: https://issues.apache.org/jira/browse/BEAM-8298
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Maximilian Michels
>Priority: Major
>
> Caching is currently only implemented for user state.



[jira] [Work logged] (BEAM-6829) Duplicate metric warnings clutter log

2019-10-03 ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6829?focusedWorklogId=322496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322496
 ]

ASF GitHub Bot logged work on BEAM-6829:


Author: ASF GitHub Bot
Created on: 03/Oct/19 09:37
Start Date: 03/Oct/19 09:37
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #8585: [BEAM-6829] Use 
transform/pcollection name for metric namespace if none provided
URL: https://github.com/apache/beam/pull/8585#issuecomment-537869268
 
 
   >Sounds good, I will try it again with the latest changes.
   
   Thanks, curious to see the results! :)
 



Issue Time Tracking
---

Worklog Id: (was: 322496)
Time Spent: 3h  (was: 2h 50m)

> Duplicate metric warnings clutter log
> -
>
> Key: BEAM-6829
> URL: https://issues.apache.org/jira/browse/BEAM-6829
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Logs fill up quickly with these warnings: 
> {code:java}
> WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already 
> contains a Metric with the name ...{code}



[jira] [Work logged] (BEAM-6829) Duplicate metric warnings clutter log

2019-10-03 ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6829?focusedWorklogId=322492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322492
 ]

ASF GitHub Bot logged work on BEAM-6829:


Author: ASF GitHub Bot
Created on: 03/Oct/19 09:31
Start Date: 03/Oct/19 09:31
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #8585: [BEAM-6829] Use 
transform/pcollection name for metric namespace if none provided
URL: https://github.com/apache/beam/pull/8585#issuecomment-537584412
 
 
   @tweise I've revisited this problem and found two other important fixes in 
addition to the first commit:
   
   1. We also need to include PCollection-scoped metrics, e.g. the element count.
   2. The metric reporting was doubled in ExecutableDoFnOperator: once by the 
operator and once by the wrapping metrics DoFnRunner.
   
   In my tests I could not find any duplicate metrics anymore. In addition, the 
metrics are scoped correctly, whether they are user metrics, transform 
metrics, or PCollection metrics.
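The scoping idea in the PR title (fall back to the transform or PCollection name when a metric has no namespace) can be illustrated with a toy registry that records name collisions. This is a hypothetical sketch, not the Flink runner code: `metric_key` and `ToyMetricGroup` are invented names, and the collision list only mimics the situation in which Flink logs its "Name collision" warning.

```python
def metric_key(namespace, name, transform_name=None, pcollection_name=None):
    """Return 'scope.name', using the namespace if given, else the transform
    name, else the PCollection name (hypothetical fallback order)."""
    scope = namespace or transform_name or pcollection_name
    if not scope:
        raise ValueError('metric %r has no namespace to fall back to' % name)
    return '%s.%s' % (scope, name)


class ToyMetricGroup:
    """Records every key that is registered more than once."""

    def __init__(self):
        self.metrics = {}
        self.collisions = []

    def register(self, key, value):
        if key in self.metrics:
            self.collisions.append(key)
        self.metrics[key] = value


# Without scoping, two transforms reporting 'elements' share one name.
unscoped = ToyMetricGroup()
unscoped.register('elements', 10)
unscoped.register('elements', 20)

# With the fallback, each transform gets its own scope.
scoped = ToyMetricGroup()
scoped.register(metric_key(None, 'elements', transform_name='ParDo(A)'), 10)
scoped.register(metric_key(None, 'elements', transform_name='ParDo(B)'), 20)

print(unscoped.collisions)     # ['elements']
print(sorted(scoped.metrics))  # ['ParDo(A).elements', 'ParDo(B).elements']
```

Scoping alone would not hide the doubled reporting from point 2: the same scoped key registered twice would still show up in `collisions`, which is why that fix is separate.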
 



Issue Time Tracking
---

Worklog Id: (was: 322492)
Time Spent: 2h 40m  (was: 2.5h)

> Duplicate metric warnings clutter log
> -
>
> Key: BEAM-6829
> URL: https://issues.apache.org/jira/browse/BEAM-6829
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Logs fill up quickly with these warnings: 
> {code:java}
> WARN org.apache.flink.metrics.MetricGroup - Name collision: Group already 
> contains a Metric with the name ...{code}



[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run

2019-10-03 ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=322459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322459
 ]

ASF GitHub Bot logged work on BEAM-7765:


Author: ASF GitHub Bot
Created on: 03/Oct/19 08:01
Start Date: 03/Oct/19 08:01
Worklog Time Spent: 10m 
  Work Description: angulartist commented on issue #9684: [BEAM-7765] 
[Closed]
URL: https://github.com/apache/beam/pull/9684#issuecomment-537834214
 
 
   > Canceling the change?
   
   Yup, I opened another PR [#9685](https://github.com/apache/beam/pull/9685)
 



Issue Time Tracking
---

Worklog Id: (was: 322459)
Time Spent: 1h 20m  (was: 1h 10m)

> Add test for snippet accessing_valueprovider_info_after_run
> ---
>
> Key: BEAM-7765
> URL: https://issues.apache.org/jira/browse/BEAM-7765
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: John Patoch
>Priority: Major
>  Labels: easy
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This snippet needs a unit test.
> It has bugs. For example:
> - apache_beam.utils.value_provider doesn't exist
> - beam.combiners.Sum doesn't exist
> - unused import of WriteToText
> cc: [~pabloem]



[jira] [Work logged] (BEAM-8344) Add infer schema support in ParquetIO and refactor ParquetTableProvider

2019-10-03 ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8344?focusedWorklogId=322438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-322438
 ]

ASF GitHub Bot logged work on BEAM-8344:


Author: ASF GitHub Bot
Created on: 03/Oct/19 06:33
Start Date: 03/Oct/19 06:33
Worklog Time Spent: 10m 
  Work Description: bmv126 commented on pull request #9721: [BEAM-8344] Add 
inferSchema support in ParquetIO and refactor ParquetTableProvider
URL: https://github.com/apache/beam/pull/9721
 
 
   Task Details:
   Add support for inferring Beam Schema in ParquetIO.
   Refactor ParquetTable code to use Convert.rows().
   Remove unnecessary java class GenericRecordReadConverter.
   
   R: @reuvenlax  R: @amaliujia 
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build