[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185149=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185149
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 06:27
Start Date: 15/Jan/19 06:27
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7504: 
[BEAM-6427] INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 185149)
Time Spent: 2h 10m  (was: 2h)

> INTERSECT ALL is not compatible with SQL standard
> -
>
> Key: BEAM-6427
> URL: https://issues.apache.org/jira/browse/BEAM-6427
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Say row R appears m times on one side and n times on the other side. 
> Beam's INTERSECT ALL is implemented to return MAX(m, n),
> but the SQL:1999 standard says INTERSECT ALL should return MIN(m, n).
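
To make the MIN(m, n) rule concrete, here is a minimal plain-Java sketch of the multiset semantics described in the issue. It is not Beam's implementation: rows are modelled as strings, and the class and method names (`IntersectAllSemantics`, `countRows`, `intersectAll`) are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain-Java multiset model, not Beam's implementation.
final class IntersectAllSemantics {
  // Counts how many times each row occurs, preserving first-seen order.
  static Map<String, Integer> countRows(List<String> rows) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String row : rows) {
      counts.merge(row, 1, Integer::sum);
    }
    return counts;
  }

  // INTERSECT ALL: a row with m copies on the left and n copies on the right
  // is emitted MIN(m, n) times.
  static List<String> intersectAll(List<String> left, List<String> right) {
    Map<String, Integer> rightCounts = countRows(right);
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> e : countRows(left).entrySet()) {
      int m = e.getValue();
      int n = rightCounts.getOrDefault(e.getKey(), 0);
      out.addAll(Collections.nCopies(Math.min(m, n), e.getKey()));
    }
    return out;
  }

  public static void main(String[] args) {
    // "R" appears 3 times on the left and 2 times on the right.
    System.out.println(
        intersectAll(Arrays.asList("R", "R", "R", "S"), Arrays.asList("R", "R", "T")));
  }
}
```

With these inputs the row R is emitted twice, and S and T are dropped because each appears on only one side.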



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185147
 ]

ASF GitHub Bot logged work on BEAM-6430:


Author: ASF GitHub Bot
Created on: 15/Jan/19 06:26
Start Date: 15/Jan/19 06:26
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7506: 
[BEAM-6430] Fix EXCEPT 
URL: https://github.com/apache/beam/pull/7506
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 185147)
Time Spent: 1h 10m  (was: 1h)

> EXCEPT is not compatible with SQL standard
> --
>
> Key: BEAM-6430
> URL: https://issues.apache.org/jira/browse/BEAM-6430
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> EXCEPT ALL should return MAX(m - n, 0). 
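
The MAX(m - n, 0) rule can be sketched the same way. This is a plain-Java model under the assumption that m is the row's count on the left input and n its count on the right input; it is not Beam's implementation, and the names (`ExceptAllSemantics`, `countRows`, `exceptAll`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain-Java multiset model, not Beam's implementation.
final class ExceptAllSemantics {
  // Counts how many times each row occurs, preserving first-seen order.
  static Map<String, Integer> countRows(List<String> rows) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String row : rows) {
      counts.merge(row, 1, Integer::sum);
    }
    return counts;
  }

  // EXCEPT ALL: a row with m copies on the left and n copies on the right
  // is emitted MAX(m - n, 0) times.
  static List<String> exceptAll(List<String> left, List<String> right) {
    Map<String, Integer> rightCounts = countRows(right);
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> e : countRows(left).entrySet()) {
      int m = e.getValue();
      int n = rightCounts.getOrDefault(e.getKey(), 0);
      out.addAll(Collections.nCopies(Math.max(m - n, 0), e.getKey()));
    }
    return out;
  }

  public static void main(String[] args) {
    // "R": 3 on the left, 1 on the right. "S": 1 on the left, 2 on the right.
    System.out.println(
        exceptAll(Arrays.asList("R", "R", "R", "S"), Arrays.asList("R", "S", "S")));
  }
}
```

Here R survives twice (3 - 1), while S is removed entirely because the right side has more copies than the left.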





[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185146
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 06:25
Start Date: 15/Jan/19 06:25
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7504: 
[BEAM-6427] INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#discussion_r247775196
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) {
 case INTERSECT:
   if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) {
 if (all) {
-  for (Row leftRow : leftRows) {
-ctx.output(leftRow);
+  Iterator iter = leftRows.iterator();
+  int leftCount = 0;
+  int rightCount = 0;
+  while (iter.hasNext()) {
+iter.next();
+leftCount++;
+  }
+  iter = rightRows.iterator();
+  while (iter.hasNext()) {
+iter.next();
+rightCount++;
+  }
+
+  // output MIN(m, n)
+  iter = (leftCount <= rightCount) ? leftRows.iterator() : rightRows.iterator();
+  while (iter.hasNext()) {
 
 Review comment:
   I had the same thought. Doing the `row -> KV` step without a shuffle 
(and we definitely want to do some version of it before a shuffle) raises 
lots of performance considerations and some nondeterminism. The Dataflow 
implementation is now open source 
[here](https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/PartialGroupByKeyParDoFns.java)
 and we could do something like that in a `ParDo`. Data consistency requires 
that this be followed by a full combine. The intermediate results do not 
work as any kind of `Rel`, but we _can_ use them as intermediate PCollections 
just for doing set operations. It would actually have to be:
   
   `row -> KV -> CoGBK -> KV`, but with small iterables of the partial counts that are easy to 
sum inline with the emit logic.
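
   A rough plain-Java sketch of that shape, assuming each input arrives as a list of bundles and that a per-bundle count stands in for the partial combine. None of this is Beam or Dataflow API: the class `PartialCountSketch` and its methods are hypothetical, and the in-memory maps only imitate what the shuffle and `CoGBK` would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; all names are hypothetical, not Beam or Dataflow APIs.
final class PartialCountSketch {
  // Pre-shuffle step: collapse one bundle of rows into (row, partial count) pairs.
  static Map<String, Long> partialCounts(List<String> bundle) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (String row : bundle) {
      counts.merge(row, 1L, Long::sum);
    }
    return counts;
  }

  // Stand-in for grouping: gather each row's partial counts from every bundle of one side.
  static Map<String, List<Long>> groupPartials(List<List<String>> bundles) {
    Map<String, List<Long>> grouped = new LinkedHashMap<>();
    for (List<String> bundle : bundles) {
      for (Map.Entry<String, Long> e : partialCounts(bundle).entrySet()) {
        grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
      }
    }
    return grouped;
  }

  // The full combine: sum the small iterable of partial counts inline.
  static long sum(List<Long> partials) {
    long total = 0;
    for (long p : partials) {
      total += p;
    }
    return total;
  }

  // Emit step for INTERSECT ALL: MIN(summed left count, summed right count) copies per row.
  static List<String> intersectAll(
      List<List<String>> leftBundles, List<List<String>> rightBundles) {
    Map<String, List<Long>> left = groupPartials(leftBundles);
    Map<String, List<Long>> right = groupPartials(rightBundles);
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<Long>> e : left.entrySet()) {
      long m = sum(e.getValue());
      long n = sum(right.getOrDefault(e.getKey(), Collections.emptyList()));
      out.addAll(Collections.nCopies((int) Math.min(m, n), e.getKey()));
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<String>> left = Arrays.asList(Arrays.asList("a", "a"), Arrays.asList("a", "b"));
    List<List<String>> right = Arrays.asList(Arrays.asList("a"), Arrays.asList("a", "b", "b"));
    System.out.println(intersectAll(left, right));
  }
}
```

   The point of the sketch is that the emit step only ever sees one short list of counts per row and side, never the duplicated rows themselves.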
 



Issue Time Tracking
---

Worklog Id: (was: 185146)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185135
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 05:52
Start Date: 15/Jan/19 05:52
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #7504: [BEAM-6427] 
INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#discussion_r247771055
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) {
 case INTERSECT:
   if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) {
 if (all) {
-  for (Row leftRow : leftRows) {
-ctx.output(leftRow);
+  Iterator iter = leftRows.iterator();
+  int leftCount = 0;
+  int rightCount = 0;
+  while (iter.hasNext()) {
+iter.next();
+leftCount++;
+  }
+  iter = rightRows.iterator();
+  while (iter.hasNext()) {
+iter.next();
+rightCount++;
+  }
+
+  // output MIN(m, n)
+  iter = (leftCount <= rightCount) ? leftRows.iterator() : rightRows.iterator();
+  while (iter.hasNext()) {
 
 Review comment:
   Could we instead implement it with a combiner on both inputs that counts 
the rows, i.e. `row -> KV -> CoGBK -> emit min(count) number 
of times`? Would that be more efficient?
 



Issue Time Tracking
---

Worklog Id: (was: 185135)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185133
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 05:49
Start Date: 15/Jan/19 05:49
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #7504: [BEAM-6427] 
INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#discussion_r247770795
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) {
 case INTERSECT:
   if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) {
 if (all) {
-  for (Row leftRow : leftRows) {
-ctx.output(leftRow);
+  Iterator iter = leftRows.iterator();
+  int leftCount = 0;
+  int rightCount = 0;
+  while (iter.hasNext()) {
+iter.next();
+leftCount++;
+  }
+  iter = rightRows.iterator();
+  while (iter.hasNext()) {
+iter.next();
+rightCount++;
+  }
+
+  // output MIN(m, n)
+  iter = (leftCount <= rightCount) ? leftRows.iterator() : rightRows.iterator();
+  while (iter.hasNext()) {
 
 Review comment:
   I see. What I was missing is how this whole thing works, i.e. that it runs 
`CoGBK` with keys and values being the whole row, producing a `CoGBKResult` 
with a collection of duplicates for each distinct row, which we then emit here.
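
   The mechanism described above can be modelled in plain Java as follows. This is only a sketch: `CoGroupSketch`, `Grouped`, `coGroup`, and `emitIntersectAll` are hypothetical names, rows are modelled as strings, and an in-memory map stands in for the real `CoGBK`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; names are hypothetical and this is not Beam's CoGroupByKey API.
final class CoGroupSketch {
  // Stand-in for one grouped result: every duplicate of a row, split by input side.
  static final class Grouped {
    final List<String> leftRows = new ArrayList<>();
    final List<String> rightRows = new ArrayList<>();
  }

  // Uses the whole row as the grouping key, so each distinct row gets one Grouped entry.
  static Map<String, Grouped> coGroup(List<String> left, List<String> right) {
    Map<String, Grouped> groups = new LinkedHashMap<>();
    for (String row : left) {
      groups.computeIfAbsent(row, k -> new Grouped()).leftRows.add(row);
    }
    for (String row : right) {
      groups.computeIfAbsent(row, k -> new Grouped()).rightRows.add(row);
    }
    return groups;
  }

  // Per-key emit step for INTERSECT ALL: output the shorter of the two duplicate lists.
  static List<String> emitIntersectAll(Grouped g) {
    return g.leftRows.size() <= g.rightRows.size() ? g.leftRows : g.rightRows;
  }

  public static void main(String[] args) {
    Map<String, Grouped> groups = coGroup(Arrays.asList("a", "a", "b"), Arrays.asList("a", "c"));
    List<String> out = new ArrayList<>();
    for (Grouped g : groups.values()) {
      out.addAll(emitIntersectAll(g));
    }
    System.out.println(out);
  }
}
```

   Because every duplicate is materialized in the grouped lists, counting them requires iterating each side, which is what the diff under review does.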
 



Issue Time Tracking
---

Worklog Id: (was: 185133)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185125=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185125
 ]

ASF GitHub Bot logged work on BEAM-6430:


Author: ASF GitHub Bot
Created on: 15/Jan/19 05:20
Start Date: 15/Jan/19 05:20
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #7506: [BEAM-6430] Fix 
EXCEPT 
URL: https://github.com/apache/beam/pull/7506#issuecomment-454270404
 
 
   run java precommit
 



Issue Time Tracking
---

Worklog Id: (was: 185125)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185113=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185113
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 04:34
Start Date: 15/Jan/19 04:34
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #7504: [BEAM-6427] 
INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#discussion_r247761589
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -17,6 +17,7 @@
  */
 package org.apache.beam.sdk.extensions.sql.impl.transform;
 
+import com.google.common.collect.Iterators;
 
 Review comment:
   using vendored Guava now.
 



Issue Time Tracking
---

Worklog Id: (was: 185113)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185115=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185115
 ]

ASF GitHub Bot logged work on BEAM-6418:


Author: ASF GitHub Bot
Created on: 15/Jan/19 04:40
Start Date: 15/Jan/19 04:40
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7510: [BEAM-6418] Execute Flink 
tests serially to avoid memory issues
URL: https://github.com/apache/beam/pull/7510#issuecomment-454265116
 
 
   The serial solution makes sense if this turns out to be a harder-to-fix 
problem. It is also inconvenient that, at the moment, the tests can't be 
executed at all for 1.6/1.7 without manual intervention.
   
   Attempting to fix this here: https://github.com/apache/beam/pull/7512
 



Issue Time Tracking
---

Worklog Id: (was: 185115)
Time Spent: 1h 40m  (was: 1.5h)

> beam_PreCommit_Java_Cron failing
> 
>
> Key: BEAM-6418
> URL: https://issues.apache.org/jira/browse/BEAM-6418
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Ahmet Altay
>Assignee: Maximilian Michels
>Priority: Critical
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console]
>  
> *16:22:16* 1: Task failed with an exception.*16:22:16* ---*16:22:16* 
> * What went wrong:*16:22:16* Execution failed for task 
> ':beam-runners-flink-1.6:test'.*16:22:16* > Process 'Gradle Test Executor 
> 108' finished with non-zero exit value 137*16:22:16*   This problem might be 
> caused by incorrect test process configuration.*16:22:16*   Please refer to 
> the test execution section in the user guide at 
> [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution]
> *16:22:16* *16:22:16* * Try:*16:22:16* Run with --stacktrace option to get 
> the stack trace. Run with --info or --debug option to get more log output. 
> Run with --scan to get full insights.*16:22:16* 
> ==*16:22:16*
>  *16:22:16* 2: Task failed with an exception.*16:22:16* ---*16:22:16* 
> * What went wrong:*16:22:16* Execution failed for task 
> ':beam-runners-flink_2.11:test'.*16:22:16* > Process 'Gradle Test Executor 
> 110' finished with non-zero exit value 1*16:22:16*   This problem might be 
> caused by incorrect test process configuration.*16:22:16*   Please refer to 
> the test execution section in the user guide at 
> [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution]
> *16:22:16*





[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185114
 ]

ASF GitHub Bot logged work on BEAM-6418:


Author: ASF GitHub Bot
Created on: 15/Jan/19 04:38
Start Date: 15/Jan/19 04:38
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #7512: [BEAM-6418] Lower 
memory consumption of Flink integration tests
URL: https://github.com/apache/beam/pull/7512
 
 
   1) The portability integration tests bring up Flink clusters and the
   FnHarness. Ensure that multiple instances of those do not exist in parallel.
   
   2) If not specified, set 2 as the default parallelism for all tests.
   


Issue Time Tracking
---

Worklog Id: (was: 185114)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185112
 ]

ASF GitHub Bot logged work on BEAM-6430:


Author: ASF GitHub Bot
Created on: 15/Jan/19 04:34
Start Date: 15/Jan/19 04:34
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #7506: [BEAM-6430] 
Fix EXCEPT 
URL: https://github.com/apache/beam/pull/7506#discussion_r247761532
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -17,6 +17,7 @@
  */
 package org.apache.beam.sdk.extensions.sql.impl.transform;
 
+import com.google.common.collect.Iterators;
 
 Review comment:
   using vendored Guava now.
 



Issue Time Tracking
---

Worklog Id: (was: 185112)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (BEAM-6432) Set dependent libraries' versions for the starter archetype automatically

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6432?focusedWorklogId=185111=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185111
 ]

ASF GitHub Bot logged work on BEAM-6432:


Author: ASF GitHub Bot
Created on: 15/Jan/19 04:26
Start Date: 15/Jan/19 04:26
Worklog Time Spent: 10m 
  Work Description: sekikn commented on pull request #7511: [BEAM-6432] Set 
dependent libraries' versions for the starter archetype automatically
URL: https://github.com/apache/beam/pull/7511
 
 
   For now, users have to replace the placeholders in the pom.xml generated by 
beam-sdks-java-maven-archetypes-starter themselves.
   This PR makes them be replaced automatically, just as 
beam-sdks-java-maven-archetypes-examples does.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   

[jira] [Created] (BEAM-6432) Set dependent libraries' versions for the starter archetype automatically

2019-01-14 Thread Kengo Seki (JIRA)
Kengo Seki created BEAM-6432:


 Summary: Set dependent libraries' versions for the starter 
archetype automatically
 Key: BEAM-6432
 URL: https://issues.apache.org/jira/browse/BEAM-6432
 Project: Beam
  Issue Type: Improvement
  Components: examples-java
Reporter: Kengo Seki
Assignee: Kengo Seki


I generated an empty project from beam-sdks-java-maven-archetypes-starter and 
found that I had to replace the placeholders for dependency versions 
({{@...version@}}) with concrete values myself.
It'd be convenient for users if they were replaced automatically, just as 
beam-sdks-java-maven-archetypes-examples does.






[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185100
 ]

ASF GitHub Bot logged work on BEAM-6430:


Author: ASF GitHub Bot
Created on: 15/Jan/19 03:41
Start Date: 15/Jan/19 03:41
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7506: 
[BEAM-6430] Fix EXCEPT 
URL: https://github.com/apache/beam/pull/7506#discussion_r247755699
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -17,6 +17,7 @@
  */
 package org.apache.beam.sdk.extensions.sql.impl.transform;
 
+import com.google.common.collect.Iterators;
 
 Review comment:
   Use vendored Guava here too.
 



Issue Time Tracking
---

Worklog Id: (was: 185100)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (BEAM-6404) FnAPI translation error

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6404?focusedWorklogId=185097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185097
 ]

ASF GitHub Bot logged work on BEAM-6404:


Author: ASF GitHub Bot
Created on: 15/Jan/19 03:31
Start Date: 15/Jan/19 03:31
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #7456: [BEAM-6404] Fix 
issue with side inputs and flatten encoding.
URL: https://github.com/apache/beam/pull/7456#discussion_r247754156
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner_transforms.py
 ##
 @@ -591,7 +591,6 @@ def sink_flattens(stages, pipeline_context):
 if pcollections[pcoll_in].coder_id != output_coder_id:
   # Flatten inputs must all be written with the same coder as is
 
 Review comment:
   I think this comment would be better like this:
   `Flatten requires that all its inputs use the same coder as its output. Add 
stages to transcode Flatten inputs that do not use the same coder.`
 



Issue Time Tracking
---

Worklog Id: (was: 185097)
Time Spent: 0.5h  (was: 20m)

> FnAPI translation error
> ---
>
> Key: BEAM-6404
> URL: https://issues.apache.org/jira/browse/BEAM-6404
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  {code:java}
> def run(argv=None):
>   parser = argparse.ArgumentParser()
>   _, pipeline_args = parser.parse_known_args(argv)
>   options = pipeline_options.PipelineOptions(pipeline_args)
>   numbers = [1, 2]
>   with beam.Pipeline(options=options) as p:
>     sum_1 = (p
>              | 'ReadNumber1' >> transforms.Create(numbers)
>              | 'CalculateSum1' >> beam.CombineGlobally(fn_sum))
>     sum_2 = (p
>              | 'ReadNumber2' >> transforms.Create(numbers)
>              | beam.ParDo(_copy_number, pvalue.AsSingleton(sum_1))
>              | 'CalculateSum2' >> beam.CombineGlobally(fn_sum))
>     _ = ((sum_1, sum_2)
>          | beam.Flatten()
>          | 'CalculateSum3' >> beam.CombineGlobally(fn_sum)
>          | beam.io.WriteToText('out.txt'))
> run()
> {code}
>  
> fails with 
> KeyError: u'ref_Coder_FastPrimitivesCoder_4_windowed'





[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185099
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 03:33
Start Date: 15/Jan/19 03:33
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7504: 
[BEAM-6427] INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#discussion_r247754564
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -17,6 +17,7 @@
  */
 package org.apache.beam.sdk.extensions.sql.impl.transform;
 
+import com.google.common.collect.Iterators;
 
 Review comment:
   Yea, we depend on non-vendored Guava as well, because `ValuesRel` requires 
passing an `ImmutableList` to a protected constructor. We should not use it 
anywhere else.
 



Issue Time Tracking
---

Worklog Id: (was: 185099)
Time Spent: 1h 20m  (was: 1h 10m)

> INTERSECT ALL is not compatible with SQL standard
> -
>
> Key: BEAM-6427
> URL: https://issues.apache.org/jira/browse/BEAM-6427
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Say row R appears m times on one side and n times on the other side. 
> Beam's INTERSECT ALL is implemented to return MAX(m, n) copies of R,
> but the SQL:1999 standard says INTERSECT ALL should return MIN(m, n).
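
The standard behaviour described above can be sketched as follows. This is a
minimal illustration that treats both inputs as in-memory lists; the function
name intersect_all is hypothetical and this is not the code from the pull
request.

```python
from collections import Counter


def intersect_all(left, right):
    """Multiset intersection: each row is emitted min(m, n) times."""
    left_counts = Counter(left)    # m for each row
    right_counts = Counter(right)  # n for each row
    result = []
    for row, m in left_counts.items():
        n = right_counts.get(row, 0)
        # A row missing on the right has n == 0 and is not emitted.
        result.extend([row] * min(m, n))
    return result


print(intersect_all(["r", "r", "r", "s"], ["r", "r", "t"]))
```

With "r" present 3 times on the left and 2 times on the right, the result
holds 2 copies of "r", where a MAX(m, n) implementation would hold 3.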





[jira] [Commented] (BEAM-6352) Watch PTransform is broken

2019-01-14 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742715#comment-16742715
 ] 

Kenneth Knowles commented on BEAM-6352:
---

One approach to giving access to the delegate in a semi-disciplined way would 
be to add something like a self-type so that the type can be 
{{RestrictionTracker>}}. Then the user can get 
access to the more-specific RestrictionTracker type. But since the whole point 
of the wrapper is not having this special access, it doesn't seem like an 
appropriate solution.

> Watch PTransform is broken
> --
>
> Key: BEAM-6352
> URL: https://issues.apache.org/jira/browse/BEAM-6352
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.9.0
>Reporter: Gleb Kanterov
>Assignee: Boyuan Zhang
>Priority: Blocker
> Fix For: 2.10.0
>
>
> List of affected tests:
> org.apache.beam.sdk.transforms.WatchTest > 
> testSinglePollMultipleInputsWithSideInput FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationDueToTerminationCondition FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsStopAfterTimeSinceNewOutput 
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED
> org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED
> org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles 
> FAILED
> {code}
> java.lang.IllegalArgumentException: 
> org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement 
> process(ProcessContext, GrowthTracker): Has tracker type 
> Watch.GrowthTracker, but the DoFn's tracker 
> type must be of type RestrictionTracker.
> {code}
> Relevant pull requests:
> - https://github.com/apache/beam/pull/6467
> - https://github.com/apache/beam/pull/7374
> Now tests are marked with @Ignore referencing this JIRA issue





[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185095=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185095
 ]

ASF GitHub Bot logged work on BEAM-6418:


Author: ASF GitHub Bot
Created on: 15/Jan/19 03:29
Start Date: 15/Jan/19 03:29
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #7510: [BEAM-6418] Execute 
Flink tests serially to avoid memory issues
URL: https://github.com/apache/beam/pull/7510#issuecomment-454255003
 
 
   IMO, there is no urgency to enable these tests again and we can work on a 
permanent solution instead.
 



Issue Time Tracking
---

Worklog Id: (was: 185095)
Time Spent: 1h 20m  (was: 1h 10m)

> beam_PreCommit_Java_Cron failing
> 
>
> Key: BEAM-6418
> URL: https://issues.apache.org/jira/browse/BEAM-6418
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Ahmet Altay
>Assignee: Maximilian Michels
>Priority: Critical
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console]
>  
> *16:22:16* 1: Task failed with an exception.
> *16:22:16* ---
> *16:22:16* * What went wrong:
> *16:22:16* Execution failed for task ':beam-runners-flink-1.6:test'.
> *16:22:16* > Process 'Gradle Test Executor 108' finished with non-zero exit value 137
> *16:22:16*   This problem might be caused by incorrect test process configuration.
> *16:22:16*   Please refer to the test execution section in the user guide at 
> [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution]
> *16:22:16* 
> *16:22:16* * Try:
> *16:22:16* Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
> *16:22:16* ==
> *16:22:16*
> *16:22:16* 2: Task failed with an exception.
> *16:22:16* ---
> *16:22:16* * What went wrong:
> *16:22:16* Execution failed for task ':beam-runners-flink_2.11:test'.
> *16:22:16* > Process 'Gradle Test Executor 110' finished with non-zero exit value 1
> *16:22:16*   This problem might be caused by incorrect test process configuration.
> *16:22:16*   Please refer to the test execution section in the user guide at 
> [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution]
> *16:22:16*





[jira] [Commented] (BEAM-6352) Watch PTransform is broken

2019-01-14 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742709#comment-16742709
 ] 

Kenneth Knowles commented on BEAM-6352:
---

I think SDF is too immature to start with hacks like exposing the delegate or 
special-casing Watch. If this doesn't fit in cleanly, there's a redesign needed.






[jira] [Commented] (BEAM-6352) Watch PTransform is broken

2019-01-14 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742705#comment-16742705
 ] 

Kenneth Knowles commented on BEAM-6352:
---

Summarizing some thoughts here. The idea of allowing a subclass of 
RestrictionTracker is twofold:

1. Something has to actually implement it (RestrictionTracker should be an 
interface)
2. Each kind of RestrictionTracker may generally have special methods.

I think part of the idea of changing number 2 is that the user might actually 
mess up the implementation. Also that the runner wants to wrap the user's 
RestrictionTracker class in its own that adds synchronization as necessary.

So I wonder if the GrowthTracker extra methods do seem like subverting the main 
methods or if they can be implemented another way. The only other place that 
user types can show up is RestrictionT and PositionT. So perhaps the extra 
methods can go here. I am really not an expert in any of this. I am just going on 
what I can read in PRs and the mailing list. I haven't really followed the SDF 
details.






[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185093
 ]

ASF GitHub Bot logged work on BEAM-6418:


Author: ASF GitHub Bot
Created on: 15/Jan/19 03:12
Start Date: 15/Jan/19 03:12
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7496: [BEAM-6418] Avoid Jenkins 
memory problems for Flink build targets
URL: https://github.com/apache/beam/pull/7496#issuecomment-454252353
 
 
   @aaltay Serial execution is possible, I have created a PR: #7510. Ideally, I 
don't want to increase the PreCommit build time too much due to serial 
execution, but since other tests can execute in the meantime it might be ok. 
I'm working on reducing the memory pressure of the tests so we can run them in 
parallel again.
 



Issue Time Tracking
---

Worklog Id: (was: 185093)
Time Spent: 1h 10m  (was: 1h)

> beam_PreCommit_Java_Cron failing
> 
>
> Key: BEAM-6418
> URL: https://issues.apache.org/jira/browse/BEAM-6418
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Ahmet Altay
>Assignee: Maximilian Michels
>Priority: Critical
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console]
>  
> *16:22:16* 1: Task failed with an exception.
> *16:22:16* ---
> *16:22:16* * What went wrong:
> *16:22:16* Execution failed for task ':beam-runners-flink-1.6:test'.
> *16:22:16* > Process 'Gradle Test Executor 108' finished with non-zero exit value 137
> *16:22:16*   This problem might be caused by incorrect test process configuration.
> *16:22:16*   Please refer to the test execution section in the user guide at 
> [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution]
> *16:22:16* 
> *16:22:16* * Try:
> *16:22:16* Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
> *16:22:16* ==
> *16:22:16*
> *16:22:16* 2: Task failed with an exception.
> *16:22:16* ---
> *16:22:16* * What went wrong:
> *16:22:16* Execution failed for task ':beam-runners-flink_2.11:test'.
> *16:22:16* > Process 'Gradle Test Executor 110' finished with non-zero exit value 1
> *16:22:16*   This problem might be caused by incorrect test process configuration.
> *16:22:16*   Please refer to the test execution section in the user guide at 
> [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution]
> *16:22:16*





[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=185092=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185092
 ]

ASF GitHub Bot logged work on BEAM-6418:


Author: ASF GitHub Bot
Created on: 15/Jan/19 03:08
Start Date: 15/Jan/19 03:08
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #7510: [BEAM-6418] 
Execute Flink tests serially to avoid memory issues
URL: https://github.com/apache/beam/pull/7510
 
 
   We had previously disabled Flink tests for 1.6 and 1.7 due to memory issues 
   when they execute in parallel. This lets them execute one after another.
   
 



Issue Time Tracking
---

Worklog Id: (was: 185092)
Time Spent: 1h  (was: 50m)

> beam_PreCommit_Java_Cron failing
> 
>
> Key: BEAM-6418
> URL: https://issues.apache.org/jira/browse/BEAM-6418
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Ahmet Altay
>Assignee: Maximilian Michels
>Priority: Critical
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console]
>  
> *16:22:16* 1: Task failed with an exception.*16:22:16* 

[jira] [Work logged] (BEAM-6290) Make the schema for BQ tables storing metric results more generic (JAVA)

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6290?focusedWorklogId=185087=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185087
 ]

ASF GitHub Bot logged work on BEAM-6290:


Author: ASF GitHub Bot
Created on: 15/Jan/19 02:47
Start Date: 15/Jan/19 02:47
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #7458: [BEAM-6290] 
Generic schema for metrics 
URL: https://github.com/apache/beam/pull/7458#discussion_r247747716
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTest.java
 ##
 @@ -81,36 +87,46 @@
* Runs the load test, collects and publishes test results to various data 
store and/or console.
*/
   public PipelineResult run() throws IOException {
-long testStartTime = System.currentTimeMillis();
+Timestamp timestamp = Timestamp.now();
 
 loadTest();
 
 PipelineResult result = pipeline.run();
 result.waitUntilFinish();
 
-LoadTestResult testResult = LoadTestResult.create(result, 
metricsNamespace, testStartTime);
+String testId = UUID.randomUUID().toString();
+List metrics = readMetrics(timestamp, result, testId);
 
-ConsoleResultPublisher.publish(testResult);
+ConsoleResultPublisher.publish(metrics, testId, timestamp.toString());
 
 if (options.getPublishToBigQuery()) {
-  publishResultToBigQuery(testResult);
+  publishResultsToBigQuery(metrics);
 }
 return result;
   }
 
-  private void publishResultToBigQuery(LoadTestResult testResult) {
+  private List readMetrics(Timestamp timestamp, 
PipelineResult result, String testId) {
+MetricsReader reader = new MetricsReader(result, metricsNamespace);
+
+NamedTestResult runtime = NamedTestResult
+.create(testId, timestamp.toString(), "runtime_sec",
+(reader.getEndTimeMetric("runtime") - 
reader.getStartTimeMetric("runtime")) / 1000D);
+
+NamedTestResult totalBytes = NamedTestResult
+.create(testId, timestamp.toString(), "total_bytes_count",
+reader.getCounterMetric("totalBytes.count"));
+
+return Arrays.asList(runtime, totalBytes);
+  }
+
+  private void publishResultsToBigQuery(List testResults) {
 
 Review comment:
   Type should be `List`, since you're using the schema from 
that class.
 



Issue Time Tracking
---

Worklog Id: (was: 185087)
Time Spent: 1h 10m  (was: 1h)

> Make the schema for BQ tables storing metric results more generic (JAVA)
> 
>
> Key: BEAM-6290
> URL: https://issues.apache.org/jira/browse/BEAM-6290
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, we keep the metrics results in BQ in tables with a schema like 
> this: 
> timestamp | total_bytes | run_time | (possibly other BQ columns)
> Every time we want to add a new column, the schema has to be extended. This is 
> inconvenient because each load test can store different metrics, which in 
> turn would lead to multiple BQ tables, each queried differently. 
> We can provide a more generic schema, like so: 
> test_id | timestamp | metric | value
> With this schema, every metric, whatever its name is, can be saved in the 
> table as a separate row. This gives more flexibility in storing metrics and is 
> still easy to query and plot.
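
As a rough illustration of the generic schema proposed above, the sketch below
turns one test run's metrics into one row per metric. The helper name
to_metric_rows and the sample metric names are assumptions made for this
example only; the column names test_id, timestamp, metric and value come from
the issue description.

```python
import uuid
from datetime import datetime, timezone


def to_metric_rows(metrics, test_id=None, timestamp=None):
    """Return one row per metric: test_id | timestamp | metric | value."""
    # Generate an id and timestamp for the run if the caller gives none.
    test_id = test_id or str(uuid.uuid4())
    timestamp = timestamp or datetime.now(timezone.utc).isoformat()
    return [
        {"test_id": test_id, "timestamp": timestamp,
         "metric": name, "value": value}
        for name, value in metrics.items()
    ]


rows = to_metric_rows({"runtime_sec": 12.5, "total_bytes_count": 1024})
for row in rows:
    print(row)
```

Adding a new metric then means adding another row, not another column, so the
table schema does not need to change.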





[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=185077=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185077
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 15/Jan/19 01:51
Start Date: 15/Jan/19 01:51
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #7445: [BEAM-5319] 
Python 3 port runners module
URL: https://github.com/apache/beam/pull/7445
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 185077)
Time Spent: 6h 20m  (was: 6h 10m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185076
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 01:51
Start Date: 15/Jan/19 01:51
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #7504: [BEAM-6427] 
INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#discussion_r247740393
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -17,6 +17,7 @@
  */
 package org.apache.beam.sdk.extensions.sql.impl.transform;
 
+import com.google.common.collect.Iterators;
 
 Review comment:
   If I didn't miss anything we should use vendored guava: 
https://lists.apache.org/thread.html/6cdeb1f7cf6037c84a249268f641fa1caa52a721af017a1b643d60fa@%3Cdev.beam.apache.org%3E
 



Issue Time Tracking
---

Worklog Id: (was: 185076)
Time Spent: 1h 10m  (was: 1h)

> INTERSECT ALL is not compatible with SQL standard
> -
>
> Key: BEAM-6427
> URL: https://issues.apache.org/jira/browse/BEAM-6427
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Say row R appears m times on one side and n times on the other side. 
> Beam's INTERSECT ALL is implemented to return MAX(m, n) copies of R,
> but the SQL:1999 standard says INTERSECT ALL should return MIN(m, n).





[jira] [Assigned] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK

2019-01-14 Thread Alex Amato (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Amato reassigned BEAM-6431:


Assignee: Alex Amato

> Add ExecutionTime metrics to the Beam Java SDK
> --
>
> Key: BEAM-6431
> URL: https://issues.apache.org/jira/browse/BEAM-6431
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
>
> This will be done by using the Dataflow worker's StateSampler code. I have 
> put together a refactoring plan
> [here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#]
> This will include estimating the processing time for the start, process, and 
> finish bundle. The Python SDK already has an implementation of this.





[jira] [Created] (BEAM-6431) Add ExecutionTime metrics to the Beam Java SDK

2019-01-14 Thread Alex Amato (JIRA)
Alex Amato created BEAM-6431:


 Summary: Add ExecutionTime metrics to the Beam Java SDK
 Key: BEAM-6431
 URL: https://issues.apache.org/jira/browse/BEAM-6431
 Project: Beam
  Issue Type: New Feature
  Components: java-fn-execution
Reporter: Alex Amato


This will be done by using the Dataflow worker's StateSampler code. I have put 
together a refactoring plan
[here|https://docs.google.com/document/d/1OlAJf4T_CTL9WRH8lP8uQOfLjWYfm8IpRXSe38g34k4/edit#]

This will include estimating the processing time for the start, process, and 
finish bundle. The Python SDK already has an implementation of this.





[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185053=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185053
 ]

ASF GitHub Bot logged work on BEAM-6430:


Author: ASF GitHub Bot
Created on: 15/Jan/19 00:30
Start Date: 15/Jan/19 00:30
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #7506: [BEAM-6430] Fix 
EXCEPT 
URL: https://github.com/apache/beam/pull/7506#issuecomment-454218691
 
 
   Added a comment to explain how EXCEPT ALL should behave.
 



Issue Time Tracking
---

Worklog Id: (was: 185053)
Time Spent: 0.5h  (was: 20m)

> EXCEPT is not compatible with SQL standard
> --
>
> Key: BEAM-6430
> URL: https://issues.apache.org/jira/browse/BEAM-6430
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> EXCEPT ALL should return MAX(m - n, 0). 
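
As an illustration of the MAX(m - n, 0) rule, here is a minimal sketch in plain Python rather than Beam code. It assumes m is the number of copies of a row on the left side and n the number on the right side; the helper name `except_all` is hypothetical.

```python
from collections import Counter


def except_all(left, right):
    """Keep MAX(m - n, 0) copies of each row, where m and n are its
    multiplicities in `left` and `right` respectively."""
    # Counter subtraction discards zero and negative counts,
    # which is exactly the MAX(m - n, 0) clamp.
    remaining = Counter(left) - Counter(right)
    return sorted(remaining.elements())


print(except_all(["a", "a", "a", "b", "c"], ["a", "b", "b"]))
# prints ['a', 'a', 'c']
```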





[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185049
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 00:24
Start Date: 15/Jan/19 00:24
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #7504: [BEAM-6427] 
INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#discussion_r247721252
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) {
 case INTERSECT:
   if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) 
{
 if (all) {
-  for (Row leftRow : leftRows) {
-ctx.output(leftRow);
+  Iterator iter = leftRows.iterator();
+  int leftCount = 0;
+  int rightCount = 0;
+  while (iter.hasNext()) {
+iter.next();
+leftCount++;
+  }
 
 Review comment:
   Thanks! Using Guava now.
 



Issue Time Tracking
---

Worklog Id: (was: 185049)
Time Spent: 50m  (was: 40m)

> INTERSECT ALL is not compatible with SQL standard
> -
>
> Key: BEAM-6427
> URL: https://issues.apache.org/jira/browse/BEAM-6427
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Say row R appears m times on one side and n times on the other side. 
> Beam's INTERSECT ALL is implemented to return MAX(m, n), but the
> SQL:1999 standard says INTERSECT ALL should return MIN(m, n).
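
The MIN(m, n) rule can be sketched in a few lines of plain Python. This is not the Beam transform; the helper name `intersect_all` is hypothetical, and rows are assumed to be hashable values.

```python
from collections import Counter


def intersect_all(left, right):
    """Keep MIN(m, n) copies of each row, where m and n are its
    multiplicities in `left` and `right`."""
    # Counter intersection (&) takes the minimum count per key and
    # drops keys that are missing from either side.
    common = Counter(left) & Counter(right)
    return sorted(common.elements())


print(intersect_all(["a", "a", "a", "b"], ["a", "a", "c"]))
# prints ['a', 'a']
```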





[jira] [Resolved] (BEAM-6391) small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing raise)

2019-01-14 Thread Heejong Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee resolved BEAM-6391.
---
   Resolution: Fixed
Fix Version/s: 2.10.0

> small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing 
> raise)
> -
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs





[jira] [Commented] (BEAM-6391) small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing raise)

2019-01-14 Thread Heejong Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742608#comment-16742608
 ] 

Heejong Lee commented on BEAM-6391:
---

new title sounds scarier ...

> small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing 
> raise)
> -
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs





[jira] [Updated] (BEAM-6391) small fixes in Python SDK (ArrayOutOfIndex, always true condition,

2019-01-14 Thread Heejong Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-6391:
--
Summary: small fixes in Python SDK (ArrayOutOfIndex, always true condition, 
  (was: fix miscellaneous bugs)

> small fixes in Python SDK (ArrayOutOfIndex, always true condition, 
> ---
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs





[jira] [Updated] (BEAM-6391) small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing raise)

2019-01-14 Thread Heejong Lee (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-6391:
--
Summary: small fixes in Python SDK (ArrayOutOfIndex, always true condition, 
missing raise)  (was: small fixes in Python SDK (ArrayOutOfIndex, always true 
condition, )

> small fixes in Python SDK (ArrayOutOfIndex, always true condition, missing 
> raise)
> -
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs





[jira] [Work logged] (BEAM-4076) Schema followups

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4076?focusedWorklogId=185039=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185039
 ]

ASF GitHub Bot logged work on BEAM-4076:


Author: ASF GitHub Bot
Created on: 15/Jan/19 00:08
Start Date: 15/Jan/19 00:08
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7500: 
[BEAM-4076] Add antlr4 to beam
URL: https://github.com/apache/beam/pull/7500#discussion_r247714307
 
 

 ##
 File path: sdks/java/core/build.gradle
 ##
 @@ -51,9 +58,11 @@ test {
 }
 
 dependencies {
+  antlr "org.antlr:antlr4:4.7"
 
 Review comment:
   This should be `library.java.antlr`
 



Issue Time Tracking
---

Worklog Id: (was: 185039)
Time Spent: 16h 50m  (was: 16h 40m)

> Schema followups
> 
>
> Key: BEAM-4076
> URL: https://issues.apache.org/jira/browse/BEAM-4076
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, dsl-sql, sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> This umbrella bug contains subtasks with followups for Beam schemas, which 
> were moved from SQL to the core Java SDK and made to be type-name-based 
> rather than coder based.





[jira] [Work logged] (BEAM-4076) Schema followups

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4076?focusedWorklogId=185040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185040
 ]

ASF GitHub Bot logged work on BEAM-4076:


Author: ASF GitHub Bot
Created on: 15/Jan/19 00:08
Start Date: 15/Jan/19 00:08
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7500: 
[BEAM-4076] Add antlr4 to beam
URL: https://github.com/apache/beam/pull/7500#discussion_r247714253
 
 

 ##
 File path: sdks/java/core/build.gradle
 ##
 @@ -51,9 +58,11 @@ test {
 }
 
 dependencies {
+  antlr "org.antlr:antlr4:4.7"
   // Required to load constants from the model, e.g. max timestamp for global 
window
   shadow project(path: ":beam-model-pipeline", configuration: "shadow")
   shadow library.java.vendored_guava_20_0
+  compile library.java.antlr
 
 Review comment:
   This should be `antlr_runtime`.
 



Issue Time Tracking
---

Worklog Id: (was: 185040)
Time Spent: 17h  (was: 16h 50m)

> Schema followups
> 
>
> Key: BEAM-4076
> URL: https://issues.apache.org/jira/browse/BEAM-4076
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, dsl-sql, sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> This umbrella bug contains subtasks with followups for Beam schemas, which 
> were moved from SQL to the core Java SDK and made to be type-name-based 
> rather than coder based.





[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185037=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185037
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 00:04
Start Date: 15/Jan/19 00:04
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #7504: [BEAM-6427] 
INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#discussion_r247713398
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) {
 case INTERSECT:
   if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) 
{
 if (all) {
-  for (Row leftRow : leftRows) {
-ctx.output(leftRow);
+  Iterator iter = leftRows.iterator();
+  int leftCount = 0;
+  int rightCount = 0;
+  while (iter.hasNext()) {
+iter.next();
+leftCount++;
+  }
 
 Review comment:
   I think guava has `Iterators.size(iterator)`
 



Issue Time Tracking
---

Worklog Id: (was: 185037)
Time Spent: 0.5h  (was: 20m)

> INTERSECT ALL is not compatible with SQL standard
> -
>
> Key: BEAM-6427
> URL: https://issues.apache.org/jira/browse/BEAM-6427
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Say row R appears m times on one side and n times on the other side. 
> Beam's INTERSECT ALL is implemented to return MAX(m, n), but the
> SQL:1999 standard says INTERSECT ALL should return MIN(m, n).





[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185038
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 15/Jan/19 00:04
Start Date: 15/Jan/19 00:04
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #7504: [BEAM-6427] 
INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#discussion_r247713579
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java
 ##
 @@ -80,8 +80,23 @@ public void processElement(ProcessContext ctx) {
 case INTERSECT:
   if (leftRows.iterator().hasNext() && rightRows.iterator().hasNext()) 
{
 if (all) {
-  for (Row leftRow : leftRows) {
-ctx.output(leftRow);
+  Iterator iter = leftRows.iterator();
+  int leftCount = 0;
+  int rightCount = 0;
+  while (iter.hasNext()) {
+iter.next();
+leftCount++;
+  }
+  iter = rightRows.iterator();
+  while (iter.hasNext()) {
+iter.next();
+rightCount++;
+  }
+
+  // output MIN(m, n)
+  iter = (leftCount <= rightCount) ? leftRows.iterator() : 
rightRows.iterator();
+  while (iter.hasNext()) {
 
 Review comment:
   Looking at this, I am not sure I understand how SQL `INTERSECT` is supposed 
to work. Can you add a brief explanatory comment?
 



Issue Time Tracking
---

Worklog Id: (was: 185038)
Time Spent: 40m  (was: 0.5h)

> INTERSECT ALL is not compatible with SQL standard
> -
>
> Key: BEAM-6427
> URL: https://issues.apache.org/jira/browse/BEAM-6427
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Say row R appears m times on one side and n times on the other side. 
> Beam's INTERSECT ALL is implemented to return MAX(m, n), but the
> SQL:1999 standard says INTERSECT ALL should return MIN(m, n).





[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185035=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185035
 ]

ASF GitHub Bot logged work on BEAM-6430:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:57
Start Date: 14/Jan/19 23:57
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #7506: [BEAM-6430] Fix 
EXCEPT 
URL: https://github.com/apache/beam/pull/7506#issuecomment-454209673
 
 
   @kennknowles @akedin @apilloud 
 



Issue Time Tracking
---

Worklog Id: (was: 185035)
Time Spent: 20m  (was: 10m)

> EXCEPT is not compatible with SQL standard
> --
>
> Key: BEAM-6430
> URL: https://issues.apache.org/jira/browse/BEAM-6430
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> EXCEPT ALL should return MAX(m - n, 0). 





[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable Python connector

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=185032=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185032
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:53
Start Date: 14/Jan/19 23:53
Worklog Time Spent: 10m 
  Work Description: sduskis commented on pull request #7367: [BEAM-3342] 
Create a Cloud Bigtable Python connector Write
URL: https://github.com/apache/beam/pull/7367#discussion_r247711422
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigtable_io.py
 ##
 @@ -0,0 +1,121 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""BigTable connector
+
+This module implements writing to BigTable tables.
+The default mode is to set row data to write to BigTable tables.
+The syntax supported is described here:
+https://cloud.google.com/bigtable/docs/quickstart-cbt
+
+BigTable connector can be used as main outputs. A main output
+(common case) is expected to be massive and will be split into
+manageable chunks and processed in parallel. In the example below
+we created a list of rows then passed to the GeneratedDirectRows
+DoFn to set the Cells and then we call the WriteToBigtable to insert
+those generated rows in the table.
+
+  main_table = (p
+   | 'Generate Row Values' >> beam.Create(row_values)
+   | 'Generate Direct Rows' >> beam.ParDo(GenerateDirectRows())
+   | 'Write to BT' >> beam.ParDo(WriteToBigtable(beam_options)))
+"""
+from __future__ import absolute_import
+
+import apache_beam as beam
+from apache_beam.metrics import Metrics
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.transforms.display import DisplayDataItem
+
+try:
+  from google.cloud.bigtable import Client
+  from google.cloud.bigtable.batcher import MutationsBatcher
+except ImportError:
+  pass
+
+
+class WriteToBigtable(beam.DoFn):
+  """ Creates the connector can call and add_row to the batcher using each
+  row in beam pipe line
+
+  :type beam_options: class:`~bigtable_configuration.BigtableConfiguration`
+  :param beam_options: class `~bigtable_configuration.BigtableConfiguration`
+  """
+
+  def __init__(self, beam_options):
+super(WriteToBigtable, self).__init__(beam_options)
+self.beam_options = beam_options
+self.table = None
+self.batcher = None
+self.written = Metrics.counter(self.__class__, 'Written Row')
+
+  def start_bundle(self):
+if self.table is None:
+  client = Client(project=self.beam_options.project_id)
+  instance = client.instance(self.beam_options.instance_id)
+  self.table = instance.table(self.beam_options.table_id)
+  self.batcher = MutationsBatcher(self.table)
 
 Review comment:
   Please unindent `self.batcher = `
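
One reading of this review comment is that the table handle should be created once, while a fresh batcher should be created at the start of every bundle. The sketch below shows that pattern without any Beam or Bigtable dependency; `FakeTable`, `FakeBatcher`, `Writer`, and `tables_created` are hypothetical stand-ins, not the connector's actual classes.

```python
class FakeTable:
    """Hypothetical stand-in for a Bigtable table handle."""


class FakeBatcher:
    """Hypothetical stand-in for a mutations batcher bound to a table."""

    def __init__(self, table):
        self.table = table
        self.rows = []


class Writer:
    def __init__(self):
        self.table = None
        self.batcher = None
        self.tables_created = 0

    def start_bundle(self):
        if self.table is None:
            # Expensive setup: performed only on the first bundle.
            self.table = FakeTable()
            self.tables_created += 1
        # Unindented, so every bundle gets its own batcher
        # while the table handle is reused.
        self.batcher = FakeBatcher(self.table)


writer = Writer()
writer.start_bundle()
first = writer.batcher
writer.start_bundle()
second = writer.batcher
print(writer.tables_created, first is second)
```

With the assignment left inside the `if`, the second bundle would reuse the first bundle's batcher instead of getting its own.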
 



Issue Time Tracking
---

Worklog Id: (was: 185032)
Time Spent: 7.5h  (was: 7h 20m)

> Create a Cloud Bigtable Python connector
> 
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5592) Beam Dependency Update Request: com.diffplug.spotless:spotless-plugin-gradle

2019-01-14 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5592:
-

Assignee: Kenneth Knowles

> Beam Dependency Update Request: com.diffplug.spotless:spotless-plugin-gradle
> 
>
> Key: BEAM-5592
> URL: https://issues.apache.org/jira/browse/BEAM-5592
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Kenneth Knowles
>Priority: Major
>
>  - 2018-10-01 19:32:35.663453 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.6.0. The latest version is 3.15.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:21:21.486670 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.6.0. The latest version is 3.15.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-15 12:14:25.257832 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.6.0. The latest version is 3.15.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-22 12:14:56.122481 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.6.0. The latest version is 3.15.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-29 12:20:00.647717 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.7.0. The latest version is 3.15.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-05 12:17:24.553692 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.7.0. The latest version is 3.16.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-12 12:17:11.816394 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.7.0. The latest version is 3.16.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-19 12:17:59.751820 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.7.0. The latest version is 3.16.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-26 12:16:57.363678 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.7.0. The latest version is 3.16.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-12-03 12:17:19.561449 
> -
> Please consider upgrading the dependency 
> com.diffplug.spotless:spotless-plugin-gradle. 
> The current version is 3.7.0. The latest version is 3.16.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  

[jira] [Work logged] (BEAM-6430) EXCEPT is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6430?focusedWorklogId=185030=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185030
 ]

ASF GitHub Bot logged work on BEAM-6430:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:44
Start Date: 14/Jan/19 23:44
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #7506: [BEAM-6430] 
Fix EXCEPT 
URL: https://github.com/apache/beam/pull/7506
 
 
   EXCEPT ALL should emit MAX(m - n, 0), and EXCEPT DISTINCT should emit the
   deduplicated EXCEPT ALL result.
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 185030)
Time Spent: 10m
Remaining Estimate: 0h

> EXCEPT is not compatible with 

[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185026=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185026
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:40
Start Date: 14/Jan/19 23:40
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #7504: [BEAM-6427] 
INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504#issuecomment-454206064
 
 
   @akedin @kennknowles @apilloud 
 



Issue Time Tracking
---

Worklog Id: (was: 185026)
Time Spent: 20m  (was: 10m)

> INTERSECT ALL is not compatible with SQL standard
> -
>
> Key: BEAM-6427
> URL: https://issues.apache.org/jira/browse/BEAM-6427
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Say row R appears m times on one side and n times on the other side. 
> Beam's INTERSECT ALL is implemented to return MAX(m, n), but the
> SQL:1999 standard says INTERSECT ALL should return MIN(m, n).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5009) Beam Dependency Update Request: com.diffplug.spotless

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5009?focusedWorklogId=185025=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185025
 ]

ASF GitHub Bot logged work on BEAM-5009:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:38
Start Date: 14/Jan/19 23:38
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7505: 
[BEAM-5009] Pin spotless and googleJavaFormat to latest; apply globally
URL: https://github.com/apache/beam/pull/7505
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 185025)
Time Spent: 50m  (was: 40m)

> Beam Dependency Update Request: com.diffplug.spotless
> -
>
> Key: BEAM-5009
> URL: https://issues.apache.org/jira/browse/BEAM-5009
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> 2018-07-25 20:34:42.078359
> Please review and upgrade the com.diffplug.spotless to the latest 
> version None 
>  
> cc: 





[jira] [Commented] (BEAM-6352) Watch PTransform is broken

2019-01-14 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742583#comment-16742583
 ] 

Kenneth Knowles commented on BEAM-6352:
---

OK the problem was my reading of the PR description. I see now.

> Watch PTransform is broken
> --
>
> Key: BEAM-6352
> URL: https://issues.apache.org/jira/browse/BEAM-6352
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.9.0
>Reporter: Gleb Kanterov
>Assignee: Boyuan Zhang
>Priority: Blocker
> Fix For: 2.10.0
>
>
> List of affected tests:
> org.apache.beam.sdk.transforms.WatchTest > 
> testSinglePollMultipleInputsWithSideInput FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationDueToTerminationCondition FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsStopAfterTimeSinceNewOutput 
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED
> org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED
> org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles 
> FAILED
> {code}
> java.lang.IllegalArgumentException: 
> org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement 
> process(ProcessContext, GrowthTracker): Has tracker type 
> Watch.GrowthTracker, but the DoFn's tracker 
> type must be of type RestrictionTracker.
> {code}
> Relevant pull requests:
> - https://github.com/apache/beam/pull/6467
> - https://github.com/apache/beam/pull/7374
> Now tests are marked with @Ignore referencing this JIRA issue





[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=185021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185021
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:31
Start Date: 14/Jan/19 23:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #7445: [BEAM-5319] 
Python 3 port runners module
URL: https://github.com/apache/beam/pull/7445#discussion_r247706878
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -215,9 +203,6 @@ def test_gbk_side_input(self):
   main | beam.Map(lambda a, b: (a, b), beam.pvalue.AsDict(side)),
   equal_to([(None, {'a': [1]})]))
 
-  @unittest.skipIf(sys.version_info[0] == 3 and
 
 Review comment:
   Ok, that's fine. I created a JIRA tracking this failure: 
https://issues.apache.org/jira/browse/BEAM-6429.
 



Issue Time Tracking
---

Worklog Id: (was: 185021)
Time Spent: 6h 10m  (was: 6h)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>






[jira] [Work started] (BEAM-6430) EXCEPT is not compatible with SQL standard

2019-01-14 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6430 started by Rui Wang.
--
> EXCEPT is not compatible with SQL standard
> --
>
> Key: BEAM-6430
> URL: https://issues.apache.org/jira/browse/BEAM-6430
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> If row R appears m times on the left side and n times on the right side,
> EXCEPT ALL should return MAX(m - n, 0) copies of R.





[jira] [Work logged] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5446?focusedWorklogId=185020=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185020
 ]

ASF GitHub Bot logged work on BEAM-5446:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:28
Start Date: 14/Jan/19 23:28
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #6467: 
[BEAM-5446] SplittableDoFn: Remove "internal" methods for public API surface
URL: https://github.com/apache/beam/pull/6467#discussion_r247706235
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
 ##
 @@ -610,11 +591,9 @@ private static void verifySplittableMethods(DoFnSignature 
signature, ErrorReport
 errors.forMethod(DoFn.GetInitialRestriction.class, 
getInitialRestriction.targetMethod());
 TypeDescriptor restrictionT = getInitialRestriction.restrictionT();
 processElementErrors.checkArgument(
-processElement.trackerT().equals(trackerT),
-"Has tracker type %s, but the DoFn's tracker type was inferred as %s 
from %s",
-formatType(processElement.trackerT()),
-trackerT,
-originOfTrackerT);
+
processElement.trackerT().getRawType().equals(RestrictionTracker.class),
 
 Review comment:
   Ah, nevermind. My bad reading.
 



Issue Time Tracking
---

Worklog Id: (was: 185020)
Time Spent: 2h 10m  (was: 2h)

> SplittableDoFn: Remove runner time execution information from public API 
> surface
> 
>
> Key: BEAM-5446
> URL: https://issues.apache.org/jira/browse/BEAM-5446
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.9.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.





[jira] [Work logged] (BEAM-5009) Beam Dependency Update Request: com.diffplug.spotless

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5009?focusedWorklogId=185019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185019
 ]

ASF GitHub Bot logged work on BEAM-5009:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:27
Start Date: 14/Jan/19 23:27
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #7505: [BEAM-5009] Pin 
spotless and googleJavaFormat to latest; apply globally
URL: https://github.com/apache/beam/pull/7505#issuecomment-454203222
 
 
   lgtm
 



Issue Time Tracking
---

Worklog Id: (was: 185019)
Time Spent: 0.5h  (was: 20m)

> Beam Dependency Update Request: com.diffplug.spotless
> -
>
> Key: BEAM-5009
> URL: https://issues.apache.org/jira/browse/BEAM-5009
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 2018-07-25 20:34:42.078359
> Please review and upgrade the com.diffplug.spotless to the latest 
> version None 
>  
> cc: 





[jira] [Work logged] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5446?focusedWorklogId=185018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185018
 ]

ASF GitHub Bot logged work on BEAM-5446:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:27
Start Date: 14/Jan/19 23:27
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #6467: 
[BEAM-5446] SplittableDoFn: Remove "internal" methods for public API surface
URL: https://github.com/apache/beam/pull/6467#discussion_r247706055
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
 ##
 @@ -610,11 +591,9 @@ private static void verifySplittableMethods(DoFnSignature 
signature, ErrorReport
 errors.forMethod(DoFn.GetInitialRestriction.class, 
getInitialRestriction.targetMethod());
 TypeDescriptor restrictionT = getInitialRestriction.restrictionT();
 processElementErrors.checkArgument(
-processElement.trackerT().equals(trackerT),
-"Has tracker type %s, but the DoFn's tracker type was inferred as %s 
from %s",
-formatType(processElement.trackerT()),
-trackerT,
-originOfTrackerT);
+
processElement.trackerT().getRawType().equals(RestrictionTracker.class),
 
 Review comment:
   (I know the PR description says that it makes this change, but I don't 
understand why)
 



Issue Time Tracking
---

Worklog Id: (was: 185018)
Time Spent: 2h  (was: 1h 50m)

> SplittableDoFn: Remove runner time execution information from public API 
> surface
> 
>
> Key: BEAM-5446
> URL: https://issues.apache.org/jira/browse/BEAM-5446
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.9.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.





[jira] [Work logged] (BEAM-5446) SplittableDoFn: Remove runner time execution information from public API surface

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5446?focusedWorklogId=185017=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185017
 ]

ASF GitHub Bot logged work on BEAM-5446:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:26
Start Date: 14/Jan/19 23:26
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #6467: 
[BEAM-5446] SplittableDoFn: Remove "internal" methods for public API surface
URL: https://github.com/apache/beam/pull/6467#discussion_r247705762
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/reflect/DoFnSignatures.java
 ##
 @@ -610,11 +591,9 @@ private static void verifySplittableMethods(DoFnSignature 
signature, ErrorReport
 errors.forMethod(DoFn.GetInitialRestriction.class, 
getInitialRestriction.targetMethod());
 TypeDescriptor restrictionT = getInitialRestriction.restrictionT();
 processElementErrors.checkArgument(
-processElement.trackerT().equals(trackerT),
-"Has tracker type %s, but the DoFn's tracker type was inferred as %s 
from %s",
-formatType(processElement.trackerT()),
-trackerT,
-originOfTrackerT);
+
processElement.trackerT().getRawType().equals(RestrictionTracker.class),
 
 Review comment:
   This seems like it should be a subtype check. What was wrong with the prior 
code?
 



Issue Time Tracking
---

Worklog Id: (was: 185017)
Time Spent: 1h 50m  (was: 1h 40m)

> SplittableDoFn: Remove runner time execution information from public API 
> surface
> 
>
> Key: BEAM-5446
> URL: https://issues.apache.org/jira/browse/BEAM-5446
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.9.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Move the setting of "claim observers" within RestrictionTracker to another 
> location to clean up the RestrictionTracker interface.





[jira] [Commented] (BEAM-6352) Watch PTransform is broken

2019-01-14 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742581#comment-16742581
 ] 

Kenneth Knowles commented on BEAM-6352:
---

The change seems bad. I don't really know why it was made.

> Watch PTransform is broken
> --
>
> Key: BEAM-6352
> URL: https://issues.apache.org/jira/browse/BEAM-6352
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.9.0
>Reporter: Gleb Kanterov
>Assignee: Boyuan Zhang
>Priority: Blocker
> Fix For: 2.10.0
>
>
> List of affected tests:
> org.apache.beam.sdk.transforms.WatchTest > 
> testSinglePollMultipleInputsWithSideInput FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationDueToTerminationCondition FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsStopAfterTimeSinceNewOutput 
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED
> org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED
> org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles 
> FAILED
> {code}
> java.lang.IllegalArgumentException: 
> org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement 
> process(ProcessContext, GrowthTracker): Has tracker type 
> Watch.GrowthTracker, but the DoFn's tracker 
> type must be of type RestrictionTracker.
> {code}
> Relevant pull requests:
> - https://github.com/apache/beam/pull/6467
> - https://github.com/apache/beam/pull/7374
> Now tests are marked with @Ignore referencing this JIRA issue





[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185012
 ]

ASF GitHub Bot logged work on BEAM-6391:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:13
Start Date: 14/Jan/19 23:13
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7430: [BEAM-6391] fix 
always true conditions
URL: https://github.com/apache/beam/pull/7430#issuecomment-454199847
 
 
   LGTM. Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 185012)
Time Spent: 40m  (was: 0.5h)

> fix miscellaneous bugs
> --
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs





[jira] [Work started] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6427 started by Rui Wang.
--
> INTERSECT ALL is not compatible with SQL standard
> -
>
> Key: BEAM-6427
> URL: https://issues.apache.org/jira/browse/BEAM-6427
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Say row R appears m times on one side and n times on the other side.
> Beam's INTERSECT ALL is implemented to return MAX(m, n) copies of R,
> but the SQL:1999 standard says INTERSECT ALL should return MIN(m, n).
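
The multiplicity rule in this issue can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Beam's implementation: the function names and the sample rows are hypothetical, and the MAX variant assumes the non-standard behaviour applies to rows that appear on both sides.

```python
from collections import Counter


def intersect_all(left, right):
    """Rows common to both sides, each repeated MIN(m, n) times."""
    # Counter '&' keeps, for every key, the minimum of the two counts.
    common = Counter(left) & Counter(right)
    return sorted(common.elements())


def intersect_all_max(left, right):
    """Illustrative non-standard variant: shared rows repeated MAX(m, n) times."""
    l, r = Counter(left), Counter(right)
    shared = l.keys() & r.keys()
    return sorted(row for row in shared for _ in range(max(l[row], r[row])))


left = ["R", "R", "R", "S"]   # R appears m = 3 times
right = ["R", "R", "T"]       # R appears n = 2 times

print(intersect_all(left, right))      # ['R', 'R']
print(intersect_all_max(left, right))  # ['R', 'R', 'R']
```

With m = 3 and n = 2, the standard result contains R twice, while the MAX(m, n) behaviour would emit it three times.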





[jira] [Created] (BEAM-6430) EXCEPT is not compatible with SQL standard

2019-01-14 Thread Rui Wang (JIRA)
Rui Wang created BEAM-6430:
--

 Summary: EXCEPT is not compatible with SQL standard
 Key: BEAM-6430
 URL: https://issues.apache.org/jira/browse/BEAM-6430
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang


If row R appears m times on the left side and n times on the right side,
EXCEPT ALL should return MAX(m - n, 0) copies of R.
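
A small Python sketch of the MAX(m - n, 0) rule, assuming m counts occurrences of a row on the left input and n counts occurrences on the right input (the issue text itself does not define them). The function name and sample rows are hypothetical, and this is not Beam's implementation.

```python
from collections import Counter


def except_all(left, right):
    """Rows of `left`, each repeated MAX(m - n, 0) times."""
    l, r = Counter(left), Counter(right)
    out = []
    for row, m in l.items():
        n = r[row]  # a Counter returns 0 for rows missing from the right side
        out.extend([row] * max(m - n, 0))
    return sorted(out)


left = ["R", "R", "R", "S"]
right = ["R", "T", "T"]

print(except_all(left, right))  # ['R', 'R', 'S']
```

Rows that occur only on the right side (T here) never appear in the output, and a row that occurs more often on the right than on the left contributes zero copies rather than a negative count.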





[jira] [Work logged] (BEAM-5009) Beam Dependency Update Request: com.diffplug.spotless

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5009?focusedWorklogId=185015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185015
 ]

ASF GitHub Bot logged work on BEAM-5009:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:22
Start Date: 14/Jan/19 23:22
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #7505: 
[BEAM-5009] Pin spotless and googleJavaFormat to latest; apply globally
URL: https://github.com/apache/beam/pull/7505
 
 
   Our spotless plugin was just using the latest googleJavaFormat or whatever 
was on disk. So new devs would get a new version and our codebase would break, 
actually. This pins the version and also updates everything.
   
   
   
   
   
   
   
   
 




[jira] [Assigned] (BEAM-5009) Beam Dependency Update Request: com.diffplug.spotless

2019-01-14 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-5009:
-

Assignee: Kenneth Knowles

> Beam Dependency Update Request: com.diffplug.spotless
> -
>
> Key: BEAM-5009
> URL: https://issues.apache.org/jira/browse/BEAM-5009
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> 2018-07-25 20:34:42.078359
> Please review and upgrade the com.diffplug.spotless to the latest 
> version None 
>  
> cc: 





[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185010
 ]

ASF GitHub Bot logged work on BEAM-6391:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:08
Start Date: 14/Jan/19 23:08
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7428: 
[BEAM-6391] add missing raise keywords
URL: https://github.com/apache/beam/pull/7428
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 185010)
Time Spent: 0.5h  (was: 20m)

> fix miscellaneous bugs
> --
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs





[jira] [Created] (BEAM-6429) apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest.test_multimap_side_input fails in Python 3.6

2019-01-14 Thread Valentyn Tymofieiev (JIRA)
Valentyn Tymofieiev created BEAM-6429:
-

 Summary: 
apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest.test_multimap_side_input
 fails in Python 3.6 
 Key: BEAM-6429
 URL: https://issues.apache.org/jira/browse/BEAM-6429
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Valentyn Tymofieiev


{noformat}
ERROR: test_multimap_side_input 
(apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest)
--
Traceback (most recent call last):
 File 
"/beam/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py", line 
230, in test_multimap_side_input
 equal_to([('a', [1, 3]), ('b', [2])]))
 File "/beam/sdks/python/apache_beam/pipeline.py", line 425, in __exit__
 self.run().wait_until_finish()
 File "/beam/sdks/python/apache_beam/pipeline.py", line 405, in run
 self._options).run(False)
 File "/beam/sdks/python/apache_beam/pipeline.py", line 418, in run
 return self.runner.run_pipeline(self, self._options)
 File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
line 265, in run_pipeline
 default_environment=self._default_environment))
 File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
line 268, in run_via_runner_api
 return self.run_stages(*self.create_stages(pipeline_proto))
 File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
line 355, in run_stages
 safe_coders)
 File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
line 449, in run_stage
 elements_by_window = _WindowGroupingBuffer(si, value_coder)
 File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
line 191, in __init__
 self._key_coder = coder.wrapped_value_coder.key_coder()
 File "/beam/sdks/python/apache_beam/coders/coders.py", line 177, in key_coder
 raise ValueError('Not a KV coder: %s.' % self)
ValueError: Not a KV coder: BytesCoder.{noformat}





[jira] [Work logged] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6427?focusedWorklogId=185011=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185011
 ]

ASF GitHub Bot logged work on BEAM-6427:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:13
Start Date: 14/Jan/19 23:13
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #7504: 
[BEAM-6427]INTERSECT ALL is not compatible with SQL standard.
URL: https://github.com/apache/beam/pull/7504
 
 
   Per the SQL:1999 standard, `7.12 3) iii) 3)`, for the ALL case:
   ```
   If INTERSECT is specified, then the number of duplicates of R that T contains is
   the minimum of m and n.
   ```
   
   
   
   
   
   
   
   
 



Issue Time Tracking

[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185013
 ]

ASF GitHub Bot logged work on BEAM-6391:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:13
Start Date: 14/Jan/19 23:13
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7430: 
[BEAM-6391] fix always true conditions
URL: https://github.com/apache/beam/pull/7430
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 185013)
Time Spent: 50m  (was: 40m)

> fix miscellaneous bugs
> --
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6352) Watch PTransform is broken

2019-01-14 Thread Boyuan Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742565#comment-16742565
 ] 

Boyuan Zhang commented on BEAM-6352:


I did a little research. The root cause is in Luke's 
PR ([https://github.com/apache/beam/pull/6467]), which adds a checkArgument 
call:

{code}
processElementErrors.checkArgument(processElement.trackerT().getRawType().equals(RestrictionTracker.class),
"Has tracker type %s, but the DoFn's tracker type must be of type 
RestrictionTracker.",
formatType(processElement.trackerT()));
{code}

which requires a DoFn to use exactly RestrictionTracker in processElement 
rather than a subtype of RestrictionTracker.
But in WatchGrowthFn, the tracker is not used only as a plain 
RestrictionTracker; see this 
[example|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L727].

There are 2 possible solutions that I know of:
1. Expose the exact tracker from 
[RestrictionTrackerObserver|https://github.com/apache/beam/blob/master/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/splittabledofn/RestrictionTrackers.java#L42]
2. In the checkArgument section, treat WatchGrowthTracker as a special case.

I'm not super familiar with Watch.java so I cannot tell which is the right way 
to go.
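
To illustrate why the check above rejects Watch's tracker, here is a minimal Python sketch of the difference between an exact-type check and a subtype check. The two classes are hypothetical stand-ins, not the Beam Java classes, and the looser check is shown only for contrast; it is not one of the two solutions proposed above.

```python
class RestrictionTracker(object):
  """Hypothetical stand-in for the base tracker type."""


class GrowthTracker(RestrictionTracker):
  """Hypothetical stand-in for a tracker subtype such as Watch.GrowthTracker."""


def check_exact(tracker_type):
  # Analogous to getRawType().equals(RestrictionTracker.class):
  # only the base type itself passes, subtypes are rejected.
  return tracker_type is RestrictionTracker


def check_subtype(tracker_type):
  # A looser check that accepts the base type and any subtype.
  return issubclass(tracker_type, RestrictionTracker)


print(check_exact(GrowthTracker))    # exact check on the subtype
print(check_subtype(GrowthTracker))  # subtype check on the subtype
```

With the exact check, a DoFn that declares the subtype as its tracker parameter fails validation even though the subtype is a valid RestrictionTracker.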

 

> Watch PTransform is broken
> --
>
> Key: BEAM-6352
> URL: https://issues.apache.org/jira/browse/BEAM-6352
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.9.0
>Reporter: Gleb Kanterov
>Assignee: Boyuan Zhang
>Priority: Blocker
> Fix For: 2.10.0
>
>
> List of affected tests:
> org.apache.beam.sdk.transforms.WatchTest > 
> testSinglePollMultipleInputsWithSideInput FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithKeyExtractor 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollMultipleInputs FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationDueToTerminationCondition FAILED
> org.apache.beam.sdk.transforms.WatchTest > testMultiplePollsWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > testSinglePollWithManyResults 
> FAILED
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsStopAfterTimeSinceNewOutput 
> org.apache.beam.sdk.transforms.WatchTest > 
> testMultiplePollsWithTerminationBecauseOutputIsFinal FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[0: true] FAILED
> org.apache.beam.sdk.io.AvroIOTest$NeedsRunnerTests > 
> testContinuouslyWriteAndReadMultipleFilepatterns[1: false] FAILED
> org.apache.beam.sdk.io.FileIOTest > testMatchWatchForNewFiles FAILED
> org.apache.beam.sdk.io.TextIOReadTest$BasicIOTest > testReadWatchForNewFiles 
> FAILED
> {code}
> java.lang.IllegalArgumentException: 
> org.apache.beam.sdk.transforms.Watch$WatchGrowthFn, @ProcessElement 
> process(ProcessContext, GrowthTracker): Has tracker type 
> Watch.GrowthTracker, but the DoFn's tracker 
> type must be of type RestrictionTracker.
> {code}
> Relevant pull requests:
> - https://github.com/apache/beam/pull/6467
> - https://github.com/apache/beam/pull/7374
> Now tests are marked with @Ignore referencing this JIRA issue



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6391) fix miscellaneous bugs

2019-01-14 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742571#comment-16742571
 ] 

Chamikara Jayalath commented on BEAM-6391:
--

Please give a more descriptive name (sounds like a bug we can never close with 
the current title :D )

> fix miscellaneous bugs
> --
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185008
 ]

ASF GitHub Bot logged work on BEAM-6391:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:07
Start Date: 14/Jan/19 23:07
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7429: [BEAM-6391] fix 
possible out of bound exception
URL: https://github.com/apache/beam/pull/7429#issuecomment-454198505
 
 
   LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 185008)
Time Spent: 10m
Remaining Estimate: 0h

> fix miscellaneous bugs
> --
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6391) fix miscellaneous bugs

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6391?focusedWorklogId=185009=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185009
 ]

ASF GitHub Bot logged work on BEAM-6391:


Author: ASF GitHub Bot
Created on: 14/Jan/19 23:07
Start Date: 14/Jan/19 23:07
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7429: 
[BEAM-6391] fix possible out of bound exception
URL: https://github.com/apache/beam/pull/7429
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 185009)
Time Spent: 20m  (was: 10m)

> fix miscellaneous bugs
> --
>
> Key: BEAM-6391
> URL: https://issues.apache.org/jira/browse/BEAM-6391
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ticket for addressing small and easy to fix bugs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable Python connector

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=185006=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185006
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 14/Jan/19 22:58
Start Date: 14/Jan/19 22:58
Worklog Time Spent: 10m 
  Work Description: juan-rael commented on pull request #7367: [BEAM-3342] 
Create a Cloud Bigtable Python connector Write
URL: https://github.com/apache/beam/pull/7367#discussion_r247695621
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigtable_io_test.py
 ##
 @@ -0,0 +1,174 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Unittest for GCP Bigtable testing."""
+from __future__ import absolute_import
+
+import datetime
+import logging
+import random
+import string
+import unittest
+import uuid
+
+import apache_beam as beam
+from apache_beam.io.gcp.bigtable_io_write import BigtableWriteConfiguration
+from apache_beam.io.gcp.bigtable_io_write import WriteToBigtable
+from apache_beam.metrics.metric import MetricsFilter
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.runners.runner import PipelineState
+from apache_beam.testing.test_pipeline import TestPipeline
+
+# Protect against environments where bigtable library is not available.
+# pylint: disable=wrong-import-order, wrong-import-position
+try:
+  from google.cloud.bigtable import row, column_family, Client
+except ImportError:
+  Client = None
+
+
+class GenerateDirectRows(beam.DoFn):
+  """ Generates an iterator of DirectRow object to process on beam pipeline.
+
+  """
+  def process(self, row_values):
+    """ Process beam pipeline using an element.
+
+    :type row_value: dict
+    :param row_value: dict: dict values with row_key and row_content having
+    family, column_id and value of row.
+    """
+    direct_row = row.DirectRow(row_key=row_values["row_key"])
+
+    for row_value in row_values["row_content"]:
+      direct_row.set_cell(row_value["column_family_id"],
+                          row_value["column_id"],
+                          row_value["value"],
+                          datetime.datetime.now())
+
+
+@unittest.skipIf(Client is None, 'GCP Bigtable dependencies are not installed')
+class BigtableIOWriteIT(unittest.TestCase):
+  """ Bigtable Write Connector Test
+
+  """
+  DEFAULT_TABLE_PREFIX = "python-test"
+  instance_id = DEFAULT_TABLE_PREFIX + "-" + str(uuid.uuid4())[:8]
+  cluster_id = DEFAULT_TABLE_PREFIX + "-" + str(uuid.uuid4())[:8]
+  table_id = DEFAULT_TABLE_PREFIX + "-" + str(uuid.uuid4())[:8]
+  number = 500
+  LOCATION_ID = "us-east1-b"
+
+  def setUp(self):
+    try:
+      from google.cloud.bigtable import enums
+      self.STORAGE_TYPE = enums.StorageType.HDD
+    except ImportError:
+      self.STORAGE_TYPE = 2
+
+    self.test_pipeline = TestPipeline(is_integration_test=True)
+    self.runner_name = type(self.test_pipeline.runner).__name__
+    self.project = self.test_pipeline.get_option('project')
+    self.client = Client(project=self.project, admin=True)
+    self._create_instance_table()
+
+  def tearDown(self):
+    instance = self.client.instance(self.instance_id)
+    if instance.exists():
+      instance.delete()
+
+  def test_bigtable_write_python(self):
+    number = self.number
+    config = BigtableWriteConfiguration(self.project, self.instance_id,
+                                        self.table_id)
+    pipeline_args = self.test_pipeline.options_list
+    pipeline_options = PipelineOptions(pipeline_args)
+
+    row_values = self._generate_mutation_data(number)
+
+    with beam.Pipeline(options=pipeline_options) as pipeline:
+      _ = (
+          pipeline
+          | 'Generate Row Values' >> beam.Create(row_values)
 
 Review comment:
   I changed the class `Generate Direct Rows`  to a PTransform.
   ```python
   class GenerateDirectRows(beam.PTransform):
     """ Generates an iterator of DirectRow object to process on beam pipeline.
   
     """
     def __init__(self, number, **kwargs):
       super(GenerateDirectRows, self).__init__(**kwargs)
       self.number = number
       self.rand = 

[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=185000=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-185000
 ]

ASF GitHub Bot logged work on BEAM-5315:


Author: ASF GitHub Bot
Created on: 14/Jan/19 22:50
Start Date: 14/Jan/19 22:50
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on pull request #7503: 
[BEAM-5315] Python 3 port io.tfrecordio module
URL: https://github.com/apache/beam/pull/7503
 
 
   This is part of a series of PRs with the goal of making Apache Beam PY3 
compatible. The proposal with the outlined approach has been documented here: 
https://s.apache.org/beam-python-3.
   
   This PR ports the `io.tfrecordio` module.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 185000)
Time Spent: 12.5h  (was: 12h 20m)

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4076) Schema followups

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4076?focusedWorklogId=184969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184969
 ]

ASF GitHub Bot logged work on BEAM-4076:


Author: ASF GitHub Bot
Created on: 14/Jan/19 22:32
Start Date: 14/Jan/19 22:32
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #7500: [BEAM-4076] 
Add antlr4 to beam
URL: https://github.com/apache/beam/pull/7500
 
 
   Just adds the dependency and the shading.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184969)
Time Spent: 16.5h  (was: 16h 20m)

> Schema followups
> 
>
> Key: BEAM-4076
> URL: https://issues.apache.org/jira/browse/BEAM-4076
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, dsl-sql, sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> This umbrella bug contains subtasks with followups for Beam schemas, which 
> were moved from SQL to the core Java SDK and made to be type-name-based 
> rather than coder based.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6428) Allow textual selection syntax for schema fields

2019-01-14 Thread Reuven Lax (JIRA)
Reuven Lax created BEAM-6428:


 Summary: Allow textual selection syntax for schema fields
 Key: BEAM-6428
 URL: https://issues.apache.org/jira/browse/BEAM-6428
 Project: Beam
  Issue Type: Sub-task
  Components: beam-model
Affects Versions: 2.9.0
Reporter: Reuven Lax
Assignee: Kenneth Knowles






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6427) INTERSECT ALL is not compatible with SQL standard

2019-01-14 Thread Rui Wang (JIRA)
Rui Wang created BEAM-6427:
--

 Summary: INTERSECT ALL is not compatible with SQL standard
 Key: BEAM-6427
 URL: https://issues.apache.org/jira/browse/BEAM-6427
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang


say Row R appears m times on one side and n times on another side. 

Beam's  INTERSECT ALL is implemented to return MAX(m, n).

And SQL1999 standard says INTERSECT ALL should return MIN(m, n).
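
As a small illustration of the two behaviours, the following Python sketch models each side as a list of rows and uses `collections.Counter` for the multiplicities. The function names are hypothetical and this is not Beam's implementation, only a model of the MIN(m, n) rule versus the MAX(m, n) behaviour described above.

```python
from collections import Counter


def intersect_all_min(left, right):
  # Counter & Counter keeps each row with the minimum of its two counts,
  # which models the MIN(m, n) rule from the standard.
  return sorted((Counter(left) & Counter(right)).elements())


def intersect_all_max(left, right):
  # Models the reported behaviour: rows present on both sides are
  # returned MAX(m, n) times.
  left_counts, right_counts = Counter(left), Counter(right)
  common = {row: max(left_counts[row], right_counts[row])
            for row in left_counts if row in right_counts}
  return sorted(Counter(common).elements())


left = ['R', 'R', 'R']   # R appears m = 3 times
right = ['R', 'R', 'S']  # R appears n = 2 times
print(intersect_all_min(left, right))
print(intersect_all_max(left, right))
```

With m = 3 and n = 2, the first function returns R twice and the second returns R three times.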



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=184960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184960
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 14/Jan/19 21:46
Start Date: 14/Jan/19 21:46
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on pull request #7445: 
[BEAM-5319] Python 3 port runners module
URL: https://github.com/apache/beam/pull/7445#discussion_r247671233
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -215,9 +203,6 @@ def test_gbk_side_input(self):
   main | beam.Map(lambda a, b: (a, b), beam.pvalue.AsDict(side)),
   equal_to([(None, {'a': [1]})]))
 
-  @unittest.skipIf(sys.version_info[0] == 3 and
 
 Review comment:
   Only unskipping tests for passing versions would mean we would need to check 
the tests for multiple versions every time, which would take some time. Leaving 
all tests unskipped for higher versions also doesn't seem like the best 
approach, since most will be fixed for all versions anyway.
   
   I also think some other already unskipped tests will fail on Python 3.6+. 
For instance some typehints tests related to type inference, which uses byte 
code inspection, which was changed in Python 3.6.
   
   Looking at the fastest way forward, I would like to focus solely on 3.5 for 
now, and add skip statements for 3.6 when a 3.6 test environment is added. What 
do you think?
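
   For reference, a version-specific skip of the kind discussed here could look roughly like the sketch below. The helper name, test class and skip reason are hypothetical and not taken from the Beam code base.

```python
import sys
import unittest


def is_py36_or_later(version_info=sys.version_info):
  # Compare only (major, minor) so patch releases do not matter.
  return tuple(version_info[:2]) >= (3, 6)


class ExampleTest(unittest.TestCase):

  @unittest.skipIf(is_py36_or_later(),
                   'Hypothetical: not yet verified on Python 3.6+')
  def test_only_verified_on_35(self):
    self.assertEqual(1 + 1, 2)
```

   A skip like this would only be added once a 3.6 test environment exists to confirm which tests actually fail there.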
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184960)
Time Spent: 6h  (was: 5h 50m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5319) Finish Python 3 porting for runners module

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5319?focusedWorklogId=184959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184959
 ]

ASF GitHub Bot logged work on BEAM-5319:


Author: ASF GitHub Bot
Created on: 14/Jan/19 21:41
Start Date: 14/Jan/19 21:41
Worklog Time Spent: 10m 
  Work Description: RobbeSneyders commented on issue #7445: [BEAM-5319] 
Python 3 port runners module
URL: https://github.com/apache/beam/pull/7445#issuecomment-454172116
 
 
   I've updated the random seed in the `test_should_sample 
(apache_beam.runners.worker.opcounters_test.OperationCountersTest)` after 
discussion with Robert on 
[BEAM-6359](https://issues.apache.org/jira/browse/BEAM-6395?jql=text%20~%20%22opcounters%22),
 which fixes the last failing runners test for 3.5.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184959)
Time Spent: 5h 50m  (was: 5h 40m)

> Finish Python 3 porting for runners module
> --
>
> Key: BEAM-5319
> URL: https://issues.apache.org/jira/browse/BEAM-5319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Robbe
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3608) Vendor Guava

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3608?focusedWorklogId=184943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184943
 ]

ASF GitHub Bot logged work on BEAM-3608:


Author: ASF GitHub Bot
Created on: 14/Jan/19 20:27
Start Date: 14/Jan/19 20:27
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #7494: [BEAM-3608] Port 
shaded Guava to vendored Guava
URL: https://github.com/apache/beam/pull/7494#issuecomment-454149478
 
 
   I only filed https://issues.apache.org/jira/browse/BEAM-5821 for removing 
shadow plugin. It would be great to start on the easy IO cases now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184943)
Time Spent: 9h 50m  (was: 9h 40m)

> Vendor Guava
> 
>
> Key: BEAM-3608
> URL: https://issues.apache.org/jira/browse/BEAM-3608
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Instead of shading as part of our build, we can shade before build so that it 
> is apparent when reading code, and in IDEs, that a particular class resides 
> in a hidden namespace.
> {{import com.google.common.reflect.TypeToken}}
> becomes something like
> {{import org.apache.beam.private.guava21.com.google.common.reflect.TypeToken}}
> So we can very trivially ban `org.apache.beam.private` from public APIs 
> unless they are annotated {{@Internal}}, and it makes sharing between our own 
> modules never get broken by shading again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3608) Vendor Guava

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3608?focusedWorklogId=184942=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184942
 ]

ASF GitHub Bot logged work on BEAM-3608:


Author: ASF GitHub Bot
Created on: 14/Jan/19 20:26
Start Date: 14/Jan/19 20:26
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #7494: [BEAM-3608] Port 
shaded Guava to vendored Guava
URL: https://github.com/apache/beam/pull/7494#issuecomment-454149304
 
 
   1. We could use checkstyle as mentioned on list. I think maybe 
`@SuppressWarnings` could work for exceptions. But actually if we enforce it in 
the Gradle dependency level then you can't even import non-vendored Guava 
anyhow. Great idea. I filed https://issues.apache.org/jira/browse/BEAM-6426 but 
left it unassigned since I am not jumping on the task this minute.
   
   2. I would like to make the shadow plugin inactive for almost all modules. 
Totally agree. And once we move vendoring further, to stop using it entirely.
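
   As a rough sketch of what a source-level ban could check, the Python snippet below flags Java import lines that reference non-vendored Guava. The regular expression and function name are assumptions made for illustration; this is neither a checkstyle rule nor the Gradle-level enforcement mentioned above.

```python
import re

# Matches "import com.google.common..." and "import static com.google.common...",
# but not imports that reach Guava through a vendored package prefix.
BANNED_IMPORT = re.compile(r'^\s*import\s+(static\s+)?com\.google\.common\.')


def find_banned_imports(java_source):
  # Returns the 1-based line numbers of non-vendored Guava imports.
  return [number
          for number, line in enumerate(java_source.splitlines(), 1)
          if BANNED_IMPORT.match(line)]


example = '\n'.join([
    'package org.example;',
    'import com.google.common.reflect.TypeToken;',
    'import org.apache.beam.private.guava21.com.google.common.reflect.TypeToken;',
])
print(find_banned_imports(example))
```

   Only the second line of the example is reported; the vendored import on the third line passes.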
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184942)
Time Spent: 9h 40m  (was: 9.5h)

> Vendor Guava
> 
>
> Key: BEAM-3608
> URL: https://issues.apache.org/jira/browse/BEAM-3608
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Instead of shading as part of our build, we can shade before build so that it 
> is apparent when reading code, and in IDEs, that a particular class resides 
> in a hidden namespace.
> {{import com.google.common.reflect.TypeToken}}
> becomes something like
> {{import org.apache.beam.private.guava21.com.google.common.reflect.TypeToken}}
> So we can very trivially ban `org.apache.beam.private` from public APIs 
> unless they are annotated {{@Internal}}, and it makes sharing between our own 
> modules never get broken by shading again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6426) Enforce ban of non-vendored Guava, with exceptions

2019-01-14 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-6426:
-

 Summary: Enforce ban of non-vendored Guava, with exceptions
 Key: BEAM-6426
 URL: https://issues.apache.org/jira/browse/BEAM-6426
 Project: Beam
  Issue Type: Sub-task
  Components: build-system
Reporter: Kenneth Knowles






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions

2019-01-14 Thread Maximilian Michels (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed BEAM-6234.

   Resolution: Fixed
Fix Version/s: (was: 2.8.0)
   2.10.0

> [Flink Runner] Make failOnCheckpointingErrors setting available in 
> FlinkPipelineOptions
> ---
>
> Key: BEAM-6234
> URL: https://issues.apache.org/jira/browse/BEAM-6234
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Harper
>Assignee: Daniel Harper
>Priority: Trivial
> Fix For: 2.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The configuration setting {{failOnCheckpointingErrors}} [1] is available in 
> Flink to allow engineers to specify whether the job should fail in the event 
> of checkpoint failure. 
> This should be exposed in {{FlinkPipelineOptions}} and 
> {{FlinkExecutionEnvironments}} to allow users to configure this.
> The default for this value in Flink is true [2] 
> [1] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249
> [2] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=184941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184941
 ]

ASF GitHub Bot logged work on BEAM-6234:


Author: ASF GitHub Bot
Created on: 14/Jan/19 20:13
Start Date: 14/Jan/19 20:13
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7284: [BEAM-6234] Make 
failOnCheckpointingErrors setting available in FlinkPipelineOptions
URL: https://github.com/apache/beam/pull/7284#issuecomment-454145416
 
 
   Added the test upon merge. Thank you!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184941)
Time Spent: 50m  (was: 40m)

> [Flink Runner] Make failOnCheckpointingErrors setting available in 
> FlinkPipelineOptions
> ---
>
> Key: BEAM-6234
> URL: https://issues.apache.org/jira/browse/BEAM-6234
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Harper
>Assignee: Daniel Harper
>Priority: Trivial
> Fix For: 2.8.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The configuration setting {{failOnCheckpointingErrors}} [1] is available in 
> Flink to allow engineers to specify whether the job should fail in the event 
> of checkpoint failure. 
> This should be exposed in {{FlinkPipelineOptions}} and 
> {{FlinkExecutionEnvironments}} to allow users to configure this.
> The default for this value in Flink is true [2] 
> [1] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249
> [2] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=184940=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184940
 ]

ASF GitHub Bot logged work on BEAM-6234:


Author: ASF GitHub Bot
Created on: 14/Jan/19 20:13
Start Date: 14/Jan/19 20:13
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #7284: [BEAM-6234] 
Make failOnCheckpointingErrors setting available in FlinkPipelineOptions
URL: https://github.com/apache/beam/pull/7284
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184940)
Time Spent: 40m  (was: 0.5h)

> [Flink Runner] Make failOnCheckpointingErrors setting available in 
> FlinkPipelineOptions
> ---
>
> Key: BEAM-6234
> URL: https://issues.apache.org/jira/browse/BEAM-6234
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Harper
>Assignee: Daniel Harper
>Priority: Trivial
> Fix For: 2.8.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The configuration setting {{failOnCheckpointingErrors}} [1] is available in 
> Flink to allow engineers to specify whether the job should fail in the event 
> of checkpoint failure. 
> This should be exposed in {{FlinkPipelineOptions}} and 
> {{FlinkExecutionEnvironments}} to allow users to configure this.
> The default for this value in Flink is true [2] 
> [1] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249
> [2] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4783) Add bundleSize parameter to control splitting of Spark sources (useful for Dynamic Allocation)

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4783?focusedWorklogId=184937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184937
 ]

ASF GitHub Bot logged work on BEAM-4783:


Author: ASF GitHub Bot
Created on: 14/Jan/19 19:44
Start Date: 14/Jan/19 19:44
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #6884: [BEAM-4783] 
Fix invalid parameter to set the partitioner in Spark GbK
URL: https://github.com/apache/beam/pull/6884#issuecomment-454134845
 
 
   @iemejia Were you able to confirm whether this change fixed the performance 
issues seen in the Spark Runner? My code was poorly written, and I believe that 
your way is much clearer, but if you take a look at the state before my 
[PR](https://github.com/apache/beam/pull/6181) you will see what I am talking 
about. Specifically [this 
comment](https://github.com/apache/beam/pull/6181#discussion_r215912675).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184937)
Time Spent: 6h 50m  (was: 6h 40m)

> Add bundleSize parameter to control splitting of Spark sources (useful for 
> Dynamic Allocation)
> --
>
> Key: BEAM-4783
> URL: https://issues.apache.org/jira/browse/BEAM-4783
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Affects Versions: 2.8.0
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
> Fix For: 2.8.0, 2.9.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> When the spark-runner is used along with the configuration 
> spark.dynamicAllocation.enabled=true the SourceRDD does not detect this. It 
> then falls back to the value calculated in this description:
>   // when running on YARN/SparkDeploy it's the result of max(totalCores, 
> 2).
>   // when running on Mesos it's 8.
>   // when running local it's the total number of cores (local = 1, 
> local[N] = N,
>   // local[*] = estimation of the machine's cores).
>   // ** the configuration "spark.default.parallelism" takes precedence 
> over all of the above **
> So in most cases this default is quite small. This is an issue when using a 
> very large input file as it will only get split in half.
> I believe that when Dynamic Allocation is enabled the SourceRDD should use the 
> DEFAULT_BUNDLE_SIZE and possibly expose a SparkPipelineOptions that allows 
> you to change this DEFAULT_BUNDLE_SIZE.
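The arithmetic behind the report can be sketched as follows. This is only an illustration, not the Spark runner's code: the class and method names are hypothetical, and the 64 MiB bundle size is an assumed value chosen for the example rather than one stated in the issue. It contrasts a split count taken from the default parallelism (max(totalCores, 2) on YARN, per the quoted comment) with a split count derived from a fixed bundle size.

```java
/** Illustrative sketch only; names and the bundle size are assumptions, not Beam code. */
final class BundleSplitExample {

  // Assumed bundle size for the example (64 MiB); the issue does not state the real value.
  static final long ASSUMED_BUNDLE_SIZE_BYTES = 64L * 1024 * 1024;

  /** Default parallelism on YARN as described in the quoted comment: max(totalCores, 2). */
  static int yarnDefaultParallelism(int totalCores) {
    return Math.max(totalCores, 2);
  }

  /** Number of splits when the input is cut into bundles of a fixed size (rounded up). */
  static long splitsByBundleSize(long inputBytes, long bundleSizeBytes) {
    if (bundleSizeBytes <= 0) {
      throw new IllegalArgumentException("bundleSizeBytes must be positive");
    }
    return Math.max(1, (inputBytes + bundleSizeBytes - 1) / bundleSizeBytes);
  }

  public static void main(String[] args) {
    long tenGiB = 10L * 1024 * 1024 * 1024;
    // With one core known up front, the parallelism-based split count is only 2.
    System.out.println("by parallelism: " + yarnDefaultParallelism(1));
    // With a fixed bundle size, the same input is cut into many more pieces.
    System.out.println("by bundle size: " + splitsByBundleSize(tenGiB, ASSUMED_BUNDLE_SIZE_BYTES));
  }
}
```

With dynamic allocation the number of cores is not known when the source is split, which is why a size-based split count may be the more useful default here.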



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4783) Add bundleSize parameter to control splitting of Spark sources (useful for Dynamic Allocation)

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4783?focusedWorklogId=184936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184936
 ]

ASF GitHub Bot logged work on BEAM-4783:


Author: ASF GitHub Bot
Created on: 14/Jan/19 19:40
Start Date: 14/Jan/19 19:40
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #6884: [BEAM-4783] 
Fix invalid parameter to set the partitioner in Spark GbK
URL: https://github.com/apache/beam/pull/6884#issuecomment-454134845
 
 
   @iemejia Were you able to confirm whether this change fixed the performance 
issues seen in the Spark Runner? My code was poorly written, and I believe that 
your way is much clearer, but if you take a look at the state before my 
[PR](https://github.com/apache/beam/pull/6181) you will see what I am talking 
about. Specifically [this comment by 
iemejia](https://github.com/apache/beam/pull/6181#discussion_r215912675).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184936)
Time Spent: 6h 40m  (was: 6.5h)

> Add bundleSize parameter to control splitting of Spark sources (useful for 
> Dynamic Allocation)
> --
>
> Key: BEAM-4783
> URL: https://issues.apache.org/jira/browse/BEAM-4783
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Affects Versions: 2.8.0
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
> Fix For: 2.8.0, 2.9.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> When the spark-runner is used along with the configuration 
> spark.dynamicAllocation.enabled=true the SourceRDD does not detect this. It 
> then falls back to the value calculated in this description:
>   // when running on YARN/SparkDeploy it's the result of max(totalCores, 
> 2).
>   // when running on Mesos it's 8.
>   // when running local it's the total number of cores (local = 1, 
> local[N] = N,
>   // local[*] = estimation of the machine's cores).
>   // ** the configuration "spark.default.parallelism" takes precedence 
> over all of the above **
> So in most cases this default is quite small. This is an issue when using a 
> very large input file as it will only get split in half.
> I believe that when Dynamic Allocation is enabled the SourceRDD should use the 
> DEFAULT_BUNDLE_SIZE and possibly expose a SparkPipelineOptions that allows 
> you to change this DEFAULT_BUNDLE_SIZE.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6418) beam_PreCommit_Java_Cron failing

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6418?focusedWorklogId=184915=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184915
 ]

ASF GitHub Bot logged work on BEAM-6418:


Author: ASF GitHub Bot
Created on: 14/Jan/19 18:28
Start Date: 14/Jan/19 18:28
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #7496: [BEAM-6418] Avoid 
Jenkins memory problems for Flink build targets
URL: https://github.com/apache/beam/pull/7496#issuecomment-454109778
 
 
   Thank you @mxm. Is it possible to run them one after another?
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184915)
Time Spent: 50m  (was: 40m)

> beam_PreCommit_Java_Cron failing
> 
>
> Key: BEAM-6418
> URL: https://issues.apache.org/jira/browse/BEAM-6418
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, test-failures
>Reporter: Ahmet Altay
>Assignee: Maximilian Michels
>Priority: Critical
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Java_Cron/814/console]
>  
> *16:22:16* 1: Task failed with an exception.
> *16:22:16* ---
> *16:22:16* * What went wrong:
> *16:22:16* Execution failed for task ':beam-runners-flink-1.6:test'.
> *16:22:16* > Process 'Gradle Test Executor 108' finished with non-zero exit value 137
> *16:22:16*   This problem might be caused by incorrect test process configuration.
> *16:22:16*   Please refer to the test execution section in the user guide at [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution]
> *16:22:16* 
> *16:22:16* * Try:
> *16:22:16* Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
> *16:22:16* ==
> *16:22:16* 
> *16:22:16* 2: Task failed with an exception.
> *16:22:16* ---
> *16:22:16* * What went wrong:
> *16:22:16* Execution failed for task ':beam-runners-flink_2.11:test'.
> *16:22:16* > Process 'Gradle Test Executor 110' finished with non-zero exit value 1
> *16:22:16*   This problem might be caused by incorrect test process configuration.
> *16:22:16*   Please refer to the test execution section in the user guide at [https://docs.gradle.org/4.10.3/userguide/java_plugin.html#sec:test_execution]
> *16:22:16*
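One way to read the first exit value in the log: on POSIX systems a status above 128 conventionally means the process was terminated by signal (status - 128), so 137 would correspond to signal 9 (SIGKILL). On Linux build hosts that is often, though not always, the kernel's out-of-memory killer. The log itself does not confirm the cause, so treat this reading as an assumption. A minimal sketch of the decoding, with a hypothetical helper name:

```java
import java.util.OptionalInt;

/** Illustrative helper; the class and method names are hypothetical. */
final class ExitValueExample {

  /**
   * Decodes the POSIX shell convention "exit status = 128 + signal number".
   * Returns empty when the status does not fall in that range.
   */
  static OptionalInt terminatingSignal(int exitValue) {
    if (exitValue > 128 && exitValue < 256) {
      return OptionalInt.of(exitValue - 128);
    }
    return OptionalInt.empty();
  }

  public static void main(String[] args) {
    System.out.println("137 -> " + terminatingSignal(137)); // signal 9 under this convention
    System.out.println("1   -> " + terminatingSignal(1));   // ordinary failure, no signal
  }
}
```

Under this reading, the exit value 1 of the second task is an ordinary test-process failure rather than a kill.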



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6425) Replace SSLContext.getInstance("SSL")

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6425?focusedWorklogId=184909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184909
 ]

ASF GitHub Bot logged work on BEAM-6425:


Author: ASF GitHub Bot
Created on: 14/Jan/19 17:52
Start Date: 14/Jan/19 17:52
Worklog Time Spent: 10m 
  Work Description: coheigea commented on pull request #7499: BEAM-6425 - 
Replace SSLContext.getInstance("SSL")
URL: https://github.com/apache/beam/pull/7499
 
 
   Switch to SSLContext.getInstance("TLS") instead of the older SSL algorithm.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184909)
Time Spent: 10m
Remaining Estimate: 0h

> Replace SSLContext.getInstance("SSL")
> -
>
> Key: BEAM-6425
> URL: https://issues.apache.org/jira/browse/BEAM-6425
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-mongodb
>Reporter: Colm O hEigeartaigh
>Assignee: Colm O hEigeartaigh
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SSLUtils has an instance of: SSLContext.getInstance("SSL")
> Instead we should use "TLS" here.
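A minimal sketch of the requested change, using only the JDK. The wrapper class and method are hypothetical and are not the actual SSLUtils code; only SSLContext.getInstance("TLS") comes from the issue text.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;

/** Illustrative wrapper; not the SSLUtils class from the issue. */
final class TlsContextExample {

  /** Requests a context by the "TLS" protocol name instead of the legacy "SSL" name. */
  static SSLContext createTlsContext() {
    try {
      return SSLContext.getInstance("TLS");
    } catch (NoSuchAlgorithmException e) {
      // Every compliant JVM is expected to provide TLS; fail loudly if it does not.
      throw new IllegalStateException("TLS is not available on this JVM", e);
    }
  }

  public static void main(String[] args) {
    // The returned context still has to be initialised with key and trust managers before use.
    System.out.println(createTlsContext().getProtocol());
  }
}
```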



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=184911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184911
 ]

ASF GitHub Bot logged work on BEAM-6234:


Author: ASF GitHub Bot
Created on: 14/Jan/19 17:58
Start Date: 14/Jan/19 17:58
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7284: [BEAM-6234] Make 
failOnCheckpointingErrors setting available in FlinkPipelineOptions
URL: https://github.com/apache/beam/pull/7284#issuecomment-454100397
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184911)
Time Spent: 0.5h  (was: 20m)

> [Flink Runner] Make failOnCheckpointingErrors setting available in 
> FlinkPipelineOptions
> ---
>
> Key: BEAM-6234
> URL: https://issues.apache.org/jira/browse/BEAM-6234
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Harper
>Assignee: Daniel Harper
>Priority: Trivial
> Fix For: 2.8.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The configuration setting {{failOnCheckpointingErrors}} [1] is available in 
> Flink to allow engineers to specify whether the job should fail in the event 
> of checkpoint failure. 
> This should be exposed in {{FlinkPipelineOptions}} and 
> {{FlinkExecutionEnvironments}} to allow users to configure this.
> The default for this value in Flink is true [2] 
> [1] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249
> [2] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6234) [Flink Runner] Make failOnCheckpointingErrors setting available in FlinkPipelineOptions

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6234?focusedWorklogId=184910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184910
 ]

ASF GitHub Bot logged work on BEAM-6234:


Author: ASF GitHub Bot
Created on: 14/Jan/19 17:58
Start Date: 14/Jan/19 17:58
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #7284: [BEAM-6234] Make 
failOnCheckpointingErrors setting available in FlinkPipelineOptions
URL: https://github.com/apache/beam/pull/7284#issuecomment-454100353
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184910)
Time Spent: 20m  (was: 10m)

> [Flink Runner] Make failOnCheckpointingErrors setting available in 
> FlinkPipelineOptions
> ---
>
> Key: BEAM-6234
> URL: https://issues.apache.org/jira/browse/BEAM-6234
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Harper
>Assignee: Daniel Harper
>Priority: Trivial
> Fix For: 2.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The configuration setting {{failOnCheckpointingErrors}} [1] is available in 
> Flink to allow engineers to specify whether the job should fail in the event 
> of checkpoint failure. 
> This should be exposed in {{FlinkPipelineOptions}} and 
> {{FlinkExecutionEnvironments}} to allow users to configure this.
> The default for this value in Flink is true [2] 
> [1] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L249
> [2] 
> https://github.com/apache/flink/blob/release-1.5.2/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L73



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6354) Hanging BoundedReadFromUnboundedSourceTest#testTimeBound and SplittableDoFnTest#testLateData

2019-01-14 Thread Gleb Kanterov (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742339#comment-16742339
 ] 

Gleb Kanterov commented on BEAM-6354:
-

[~kenn] yes, technically, I discovered them first, and then ignored them

> Hanging BoundedReadFromUnboundedSourceTest#testTimeBound and 
> SplittableDoFnTest#testLateData
> 
>
> Key: BEAM-6354
> URL: https://issues.apache.org/jira/browse/BEAM-6354
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Priority: Major
> Fix For: 2.10.0
>
>
> It seems that they have a similar root cause because both of them use 
> unbounded streams.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6321) flinkCompatibilityMatrixBatchDOCKER out of memory

2019-01-14 Thread Maximilian Michels (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved BEAM-6321.
--
   Resolution: Fixed
Fix Version/s: Not applicable

Should be fixed by https://github.com/apache/beam/pull/7483

CC [~robertwb]

> flinkCompatibilityMatrixBatchDOCKER out of memory
> -
>
> Key: BEAM-6321
> URL: https://issues.apache.org/jira/browse/BEAM-6321
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Andrew Pilloud
>Priority: Major
> Fix For: Not applicable
>
>
> https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/1156/
> https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/1158/
> {code}
> 02:48:35 #
> 02:48:35 # There is insufficient memory for the Java Runtime Environment to 
> continue.
> 02:48:35 # Native memory allocation (mmap) failed to map 5019009024 bytes for 
> committing reserved memory.
> 02:48:35 # An error report file with more information is saved as:
> 02:48:35 # 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/sdks/python/hs_err_pid21225.log
> 02:48:37 Exception in thread wait_until_finish_read:
> 02:48:37 Traceback (most recent call last):
> 02:48:37   File "/usr/lib/python2.7/threading.py", line 801, in 
> __bootstrap_inner
> 02:48:37 self.run()
> 02:48:37   File "/usr/lib/python2.7/threading.py", line 754, in run
> 02:48:37 self.__target(*self.__args, **self.__kwargs)
> 02:48:37   File "apache_beam/runners/portability/portable_runner.py", line 
> 285, in read_messages
> 02:48:37 beam_job_api_pb2.JobMessagesRequest(job_id=self._job_id)):
> 02:48:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/grpc/_channel.py",
>  line 366, in next
> 02:48:37 return self._next()
> 02:48:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/build/gradleenv/1327086738/local/lib/python2.7/site-packages/grpc/_channel.py",
>  line 357, in _next
> 02:48:37 raise self
> 02:48:37 _Rendezvous: <_Rendezvous of RPC that terminated with:
> 02:48:37  status = StatusCode.UNAVAILABLE
> 02:48:37  details = "Socket closed"
> 02:48:37  debug_error_string = 
> "{"created":"@1545821317.301500173","description":"Error received from 
> peer","file":"src/core/lib/surface/call.cc","file_line":1036,"grpc_message":"Socket
>  closed","grpc_status":14}"
> 02:48:37 >
> 02:48:37 E
> 02:48:37 INFO:root:Using latest locally built Python SDK docker image.
> 04:18:29 Build timed out (after 100 minutes). Marking the build as aborted.
> 04:18:30 Build was aborted
> 04:18:30 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6334) Change status of LoadTests after failure on Dataflow

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6334?focusedWorklogId=184902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184902
 ]

ASF GitHub Bot logged work on BEAM-6334:


Author: ASF GitHub Bot
Created on: 14/Jan/19 17:19
Start Date: 14/Jan/19 17:19
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #7450: 
[BEAM-6334] Add throwing exception in case of invalid state or timeout
URL: https://github.com/apache/beam/pull/7450#discussion_r247574008
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/LoadTest.java
 ##
 @@ -85,17 +88,22 @@ public PipelineResult run() throws IOException {
 
 loadTest();
 
-PipelineResult result = pipeline.run();
-result.waitUntilFinish();
-
-LoadTestResult testResult = LoadTestResult.create(result, metricsNamespace, testStartTime);
+PipelineResult pipelineResult = pipeline.run();
+pipelineResult.waitUntilFinish(Duration.standardMinutes(options.getLoadTestTimeout()));
 
+LoadTestResult testResult = LoadTestResult.create(pipelineResult, metricsNamespace, testStartTime);
 ConsoleResultPublisher.publish(testResult);
 
-if (options.getPublishToBigQuery()) {
-  publishResultToBigQuery(testResult);
+Optional failure = assertFailure(pipelineResult, testResult);
+
+if(failure.isPresent()) {
 
 Review comment:
   nit: missing space between "if" and "("
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184902)
Time Spent: 1.5h  (was: 1h 20m)

> Change status of LoadTests after failure on Dataflow
> 
>
> Key: BEAM-6334
> URL: https://issues.apache.org/jira/browse/BEAM-6334
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When GroupByKey load test was tested via Jenkins and it was failing on 
> Dataflow, Jenkins was showing Success status.
> Example:
> [Jenkins build 
> link|https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Java_LoadTests_GroupByKey_Dataflow_Small_PR/3/].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6334) Change status of LoadTests after failure on Dataflow

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6334?focusedWorklogId=184901=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184901
 ]

ASF GitHub Bot logged work on BEAM-6334:


Author: ASF GitHub Bot
Created on: 14/Jan/19 17:19
Start Date: 14/Jan/19 17:19
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #7450: 
[BEAM-6334] Add throwing exception in case of invalid state or timeout
URL: https://github.com/apache/beam/pull/7450#discussion_r247579532
 
 

 ##
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/JobFailure.java
 ##
 @@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.loadtests;
+
+import static java.lang.String.format;
+import static java.util.Optional.empty;
+import static java.util.Optional.of;
+
+import java.io.IOException;
+import java.util.Optional;
+import org.apache.beam.sdk.PipelineResult;
+import org.joda.time.Duration;
+
+/** Class for detecting failures after {@link PipelineResult#waitUntilFinish(Duration)} unblocks. */
+class JobFailure {
+
+  private String cause;
+
+  private boolean requiresCancelling;
+
+  JobFailure(String cause, boolean requiresCancelling) {
+this.cause = cause;
+this.requiresCancelling = requiresCancelling;
+  }
+
+  static Optional<JobFailure> assertFailure(PipelineResult pipelineResult,
+  LoadTestResult testResult) {
+PipelineResult.State state = pipelineResult.getState();
+
+Optional<JobFailure> stateRelatedFailure = lookForInvalidState(state);
+
+if (stateRelatedFailure.isPresent()) {
+  return stateRelatedFailure;
+} else {
+  return lookForMetricResultFailure(testResult);
+}
+  }
+
+  private static Optional<JobFailure> lookForMetricResultFailure(LoadTestResult testResult) {
+if (testResult.getRuntime() == -1 || testResult.getTotalBytesCount() == -1) {
+  return of(new JobFailure("Invalid test results", false));
+} else {
+  return empty();
+}
+  }
+
+  static Optional lookForInvalidState(PipelineResult.State state) {
+switch (state) {
+  case RUNNING:
+  case UNKNOWN:
+return of(new JobFailure("Job timeout.", true));
+
+  case CANCELLED:
+  case FAILED:
+  case STOPPED:
+  case UPDATED:
+return of(new JobFailure(format("Invalid job state: %s.", state.toString()), false));
+
+  default:
+return empty();
+}
+  }
+
+  static PipelineResult handleFailure(PipelineResult pipelineResult, JobFailure failure)
+  throws IOException {
+
+if(failure.requiresCancelling) {
 
 Review comment:
   nit: missing space between "if" and "("
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 184901)

> Change status of LoadTests after failure on Dataflow
> 
>
> Key: BEAM-6334
> URL: https://issues.apache.org/jira/browse/BEAM-6334
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When GroupByKey load test was tested via Jenkins and it was failing on 
> Dataflow, Jenkins was showing Success status.
> Example:
> [Jenkins build 
> link|https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Java_LoadTests_GroupByKey_Dataflow_Small_PR/3/].
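The idea under review in the pull request (turning a non-successful pipeline state into a visible failure so that the CI job does not report success) can be sketched in isolation. The enum below is a local stand-in for PipelineResult.State, and the class and method names are hypothetical. The state grouping follows the quoted diff, but this is a simplified illustration rather than the code being merged.

```java
import java.util.Optional;

/** Self-contained illustration; the State enum is a stand-in, not the Beam type. */
final class LoadTestStatusExample {

  enum State { RUNNING, UNKNOWN, CANCELLED, FAILED, STOPPED, UPDATED, DONE }

  /** Returns a failure description for any state that should not count as success. */
  static Optional<String> failureCause(State state) {
    switch (state) {
      case RUNNING:
      case UNKNOWN:
        // Still running (or unknown) after the wait returned: treat as a timeout.
        return Optional.of("Job timeout.");
      case CANCELLED:
      case FAILED:
      case STOPPED:
      case UPDATED:
        return Optional.of(String.format("Invalid job state: %s.", state));
      default:
        return Optional.empty();
    }
  }

  /** Throws so that the calling process exits with a non-zero status. */
  static void failIfUnsuccessful(State state) {
    failureCause(state).ifPresent(cause -> {
      throw new IllegalStateException(cause);
    });
  }

  public static void main(String[] args) {
    failIfUnsuccessful(State.DONE); // passes silently
    System.out.println(failureCause(State.FAILED).orElse("no failure"));
  }
}
```

Throwing from the test's entry point is what lets a build server such as Jenkins mark the run as failed instead of successful.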



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call

2019-01-14 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742296#comment-16742296
 ] 

Jean-Baptiste Onofré commented on BEAM-6424:


Thanks. I will fix that as discussed on github.

> RabbitMqIO: NullPointerException raised on getWatermark() first call
> 
>
> Key: BEAM-6424
> URL: https://issues.apache.org/jira/browse/BEAM-6424
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Augustin Lafanechere
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> I tried to use the RabbitMqIO with the direct runner to generate an unbounded 
> PCollection from a queue. I encounter a NPE :
> {quote}{{java.lang.NullPointerException}}
> {{ at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement
>  (UnboundedReadEvaluatorFactory.java:169)}}
> {{}}
> {quote}
> After investigation it looks like it's caused by the fact that no default is 
> given to 
> checkpointMark.oldestTimestamp. getWatermark() is called before the mutation 
> of the currentTimestamp variable, raising a NPE. I fixed the problem on my 
> side, reimplementing the class and overriding getWatermark to return 
> Instant.now()  if checkpointMark.oldestTimestamp is null :
> {quote}@Override
> public Instant getWatermark() {
>   if (checkpointMark.oldestTimestamp == null) {
>     return Instant.now();
>   }
>   return checkpointMark.oldestTimestamp;
> }{quote}
> It looks like this bug has [already been raised 
> here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869]
>  on the PR for RabbitMqIO.
>  
>  
>  
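The reporter's workaround can be reproduced in a self-contained form. In this sketch the CheckpointMark class is a hypothetical stand-in that keeps only the oldestTimestamp field named in the report, and java.time.Instant is used in place of the Joda-Time Instant that Beam uses, so that the example runs with the JDK alone. It shows the null guard only. Whether the current time is the right fallback watermark is a separate design question that the report does not settle.

```java
import java.time.Instant;

/** Illustrative reproduction of the null guard; types are stand-ins, not RabbitMqIO code. */
final class WatermarkGuardExample {

  /** Hypothetical stand-in for the reader's checkpoint mark. */
  static final class CheckpointMark {
    Instant oldestTimestamp; // stays null until the first message has been read
  }

  /** Returns the oldest timestamp, or the current time if nothing has been read yet. */
  static Instant getWatermark(CheckpointMark checkpointMark) {
    if (checkpointMark.oldestTimestamp == null) {
      return Instant.now();
    }
    return checkpointMark.oldestTimestamp;
  }

  public static void main(String[] args) {
    CheckpointMark mark = new CheckpointMark();
    System.out.println("before first read: " + getWatermark(mark)); // no NullPointerException
    mark.oldestTimestamp = Instant.EPOCH;
    System.out.println("after first read:  " + getWatermark(mark));
  }
}
```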



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call

2019-01-14 Thread Augustin Lafanechere (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Augustin Lafanechere reassigned BEAM-6424:
--

Assignee: Jean-Baptiste Onofré

> RabbitMqIO: NullPointerException raised on getWatermark() first call
> 
>
> Key: BEAM-6424
> URL: https://issues.apache.org/jira/browse/BEAM-6424
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Augustin Lafanechere
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> I tried to use the RabbitMqIO with the direct runner to generate an unbounded 
> PCollection from a queue. I encounter a NPE :
> {quote}{{java.lang.NullPointerException}}
> {{ at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement
>  (UnboundedReadEvaluatorFactory.java:169)}}
> {{}}
> {quote}
> After investigation it looks like it's caused by the fact that no default is 
> given to 
> checkpointMark.oldestTimestamp. getWatermark() is called before the mutation 
> of the currentTimestamp variable, raising a NPE. I fixed the problem on my 
> side, reimplementing the class and overriding getWatermark to return 
> Instant.now()  if checkpointMark.oldestTimestamp is null :
> {quote}@Override
> public Instant getWatermark() {
>   if (checkpointMark.oldestTimestamp == null) {
>     return Instant.now();
>   }
>   return checkpointMark.oldestTimestamp;
> }{quote}
> It looks like this bug has [already been raised 
> here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869]
>  on the PR for RabbitMqIO.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call

2019-01-14 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-6424:
--

Assignee: (was: Eugene Kirpichov)

> RabbitMqIO: NullPointerException raised on getWatermark() first call
> 
>
> Key: BEAM-6424
> URL: https://issues.apache.org/jira/browse/BEAM-6424
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Augustin Lafanechere
>Priority: Minor
>
> I tried to use the RabbitMqIO with the direct runner to generate an unbounded 
> PCollection from a queue. I encounter a NPE :
> {quote}{{java.lang.NullPointerException}}
> {{ at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement
>  (UnboundedReadEvaluatorFactory.java:169)}}
> {{}}
> {quote}
> After investigation it looks like it's caused by the fact that no default is 
> given to 
> checkpointMark.oldestTimestamp. getWatermark() is called before the mutation 
> of the currentTimestamp variable, raising a NPE. I fixed the problem on my 
> side, reimplementing the class and overriding getWatermark to return 
> Instant.now()  if checkpointMark.oldestTimestamp is null :
> {quote}@Override
> public Instant getWatermark() {
>   if (checkpointMark.oldestTimestamp == null) {
>     return Instant.now();
>   }
>   return checkpointMark.oldestTimestamp;
> }{quote}
> It looks like this bug has [already been raised 
> here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869]
>  on the PR for RabbitMqIO.
>  
>  
>  





[jira] [Commented] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call

2019-01-14 Thread Augustin Lafanechere (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742294#comment-16742294
 ] 

Augustin Lafanechere commented on BEAM-6424:


I assigned it to [~jbonofre]. He offered his help after [my 
comment|https://github.com/apache/beam/pull/1729#issuecomment-454046116] on his 
original PR.

> RabbitMqIO: NullPointerException raised on getWatermark() first call
> 
>
> Key: BEAM-6424
> URL: https://issues.apache.org/jira/browse/BEAM-6424
> Project: Beam
>  Issue Type: Bug
>  Components: io-ideas
>Reporter: Augustin Lafanechere
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
>
> I tried to use the RabbitMqIO with the direct runner to generate an unbounded 
> PCollection from a queue. I encountered an NPE:
> {quote}{{java.lang.NullPointerException}}
> {{ at 
> org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement
>  (UnboundedReadEvaluatorFactory.java:169)}}
> {{}}
> {quote}
> After investigation it looks like it's caused by the fact that no default is 
> given to 
> checkpointMark.oldestTimestamp. getWatermark() is called before the mutation 
> of the currentTimestamp variable, raising a NPE. I fixed the problem on my 
> side, reimplementing the class and overriding getWatermark to return 
> Instant.now()  if checkpointMark.oldestTimestamp is null :
> {quote}@Override
> public Instant getWatermark() {
>   if (checkpointMark.oldestTimestamp == null) {
>     return Instant.now();
>   }
>   return checkpointMark.oldestTimestamp;
> }{quote}
> It looks like this bug has [already been raised 
> here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869]
>  on the PR for RabbitMqIO.
>  
>  
>  





[jira] [Created] (BEAM-6425) Replace SSLContext.getInstance("SSL")

2019-01-14 Thread Colm O hEigeartaigh (JIRA)
Colm O hEigeartaigh created BEAM-6425:
-

 Summary: Replace SSLContext.getInstance("SSL")
 Key: BEAM-6425
 URL: https://issues.apache.org/jira/browse/BEAM-6425
 Project: Beam
  Issue Type: Improvement
  Components: io-java-mongodb
Reporter: Colm O hEigeartaigh
Assignee: Colm O hEigeartaigh


SSLUtils has an instance of: SSLContext.getInstance("SSL")

Instead we should use "TLS" here.
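The proposed change amounts to passing a different protocol name to `SSLContext.getInstance`. A minimal sketch using only the JDK follows; the class and method names are hypothetical and this is not the actual Beam `SSLUtils` code.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;

/** Hypothetical helper, for illustration only; not the Beam SSLUtils class. */
final class TlsContextSketch {

  /** Requests a context by the "TLS" protocol name instead of the legacy "SSL" name. */
  static SSLContext newTlsContext() {
    try {
      return SSLContext.getInstance("TLS");
    } catch (NoSuchAlgorithmException e) {
      // Every conforming Java platform is required to support "TLS",
      // so this branch is not expected to be reached in practice.
      throw new IllegalStateException("TLS is not available", e);
    }
  }
}
```

The returned context still has to be initialized with key and trust managers before use, exactly as with the "SSL" variant.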





[jira] [Work logged] (BEAM-6351) OutOfMemoryError on DirectRunner while running load tests

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6351?focusedWorklogId=184888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184888
 ]

ASF GitHub Bot logged work on BEAM-6351:


Author: ASF GitHub Bot
Created on: 14/Jan/19 16:35
Start Date: 14/Jan/19 16:35
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #7497: [BEAM-6351] 
Divide separate job for "smoke" load test suites
URL: https://github.com/apache/beam/pull/7497#discussion_r247564649
 
 

 ##
 File path: .test-infra/jenkins/job_LoadTests_Java.groovy
 ##
 @@ -69,6 +53,53 @@ for (testConfiguration in testsConfigurations) {
 ) {
 description(testConfiguration.jobDescription)
 commonJobProperties.setTopLevelMainJobProperties(delegate)
-loadTestsBuilder.buildTest(delegate, testConfiguration.jobDescription, 
testConfiguration.runner, testConfiguration.jobProperties, 
testConfiguration.itClass)
+loadTestsBuilder.loadTest(delegate, testConfiguration.jobDescription, 
testConfiguration.runner, testConfiguration.jobProperties, 
testConfiguration.itClass)
+}
+}
+
+def smokeTestConfigurations = [
+[
+title: 'GroupByKey load test Direct',
+itClass  : 
'org.apache.beam.sdk.loadtests.GroupByKeyLoadTest',
+runner   : CommonTestProperties.Runner.DIRECT,
+jobProperties: [
+publishToBigQuery: true,
+bigQueryDataset  : 'load_test_SMOKE',
+bigQueryTable: 'direct_gbk',
+sourceOptions: 
'{"numRecords":10,"splitPointFrequencyRecords":1}',
+stepOptions  : 
'{"outputRecordsPerInputRecord":1,"preservesInputKeyDistribution":true}',
+fanout   : 10,
+iterations   : 1,
+]
+],
+[
+title: 'GroupByKey load test Dataflow',
 
 Review comment:
   I think adding smoke tests for other runners is probably useful. Most have 
local runners so the test should be fairly quick. Dataflow doesn't have this 
option and takes a few minutes to start up, so tests less than a few minutes 
will be mostly startup time.
 



Issue Time Tracking
---

Worklog Id: (was: 184888)
Time Spent: 10m
Remaining Estimate: 0h

> OutOfMemoryError on DirectRunner while running load tests 
> --
>
> Key: BEAM-6351
> URL: https://issues.apache.org/jira/browse/BEAM-6351
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The GroupByKey Java load test with 10 number of records is failing on 
> DirectRunner with an OutOfMemoryError. The test is then aborted after a timeout.
> This is an [example failing run of the 
> job|https://builds.apache.org/job/beam_Java_LoadTests_GroupByKey_Direct_Small_PR/2/].
> The stack trace is as follows:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 18:02:15  at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:204)
> 18:02:15  at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
> 18:02:15  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
> 18:02:15  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
> 18:02:15  at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:75)
> 18:02:15  at 
> org.apache.beam.sdk.loadtests.GroupByKeyLoadTest.run(GroupByKeyLoadTest.java:58)
> 18:02:15  at 
> org.apache.beam.sdk.loadtests.GroupByKeyLoadTest.main(GroupByKeyLoadTest.java:130)
> 18:02:15 Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> 18:02:15  at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:97)
> 18:02:15  at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92)
> 18:02:15  at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.<init>(MutationDetectors.java:117)
> 18:02:15  at 
> org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder(MutationDetectors.java:44)
> 18:02:15  at 
> 

[jira] [Resolved] (BEAM-6397) Add KeyValue attribute to SyntheticDataPubSubPublisher

2019-01-14 Thread Kasia Kucharczyk (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kasia Kucharczyk resolved BEAM-6397.

   Resolution: Fixed
Fix Version/s: 2.10.0

> Add KeyValue attribute to SyntheticDataPubSubPublisher
> --
>
> Key: BEAM-6397
> URL: https://issues.apache.org/jira/browse/BEAM-6397
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Kasia Kucharczyk
>Priority: Minor
> Fix For: 2.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently SyntheticDataPubSubPublisher sends only synthetic elements 
> converted to ByteArray to PubSub. To perform KeyValue transforms (such as 
> GroupByKey) on a PubSub source, I propose sending the generated synthetic data as a 
> Map via the attributes parameter of the PubsubMessage object.
> A NoClassDefFoundError also needs to be fixed (details in comment).





[jira] [Work logged] (BEAM-6405) Improve PortableValidatesRunner test reliability on Jenkins

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6405?focusedWorklogId=184878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184878
 ]

ASF GitHub Bot logged work on BEAM-6405:


Author: ASF GitHub Bot
Created on: 14/Jan/19 16:03
Start Date: 14/Jan/19 16:03
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #7461: [BEAM-6405] Let 
PortableValidatesRunner tests run in EMBEDDED environment
URL: https://github.com/apache/beam/pull/7461
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 184878)
Time Spent: 5h  (was: 4h 50m)

> Improve PortableValidatesRunner test reliability on Jenkins
> ---
>
> Key: BEAM-6405
> URL: https://issues.apache.org/jira/browse/BEAM-6405
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The PVR tests seem to be passing fine and then failing consecutively for no 
> reason: https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/ 
> It looks like the outrageous parallelism, i.e. number of available cores, is 
> responsible for the flakiness if there is additional load on the build 
> slaves. We should lower the parallelism.





[jira] [Work logged] (BEAM-1240) Create RabbitMqIO

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1240?focusedWorklogId=184846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184846
 ]

ASF GitHub Bot logged work on BEAM-1240:


Author: ASF GitHub Bot
Created on: 14/Jan/19 15:44
Start Date: 14/Jan/19 15:44
Worklog Time Spent: 10m 
  Work Description: alafanechere commented on issue #1729: [BEAM-1240] 
Create RabbitMqIO
URL: https://github.com/apache/beam/pull/1729#issuecomment-454050841
 
 
   I just created the JIRA: https://jira.apache.org/jira/browse/BEAM-6424
 



Issue Time Tracking
---

Worklog Id: (was: 184846)
Time Spent: 14.5h  (was: 14h 20m)

> Create RabbitMqIO
> -
>
> Key: BEAM-1240
> URL: https://issues.apache.org/jira/browse/BEAM-1240
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.9.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-1240) Create RabbitMqIO

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1240?focusedWorklogId=184847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184847
 ]

ASF GitHub Bot logged work on BEAM-1240:


Author: ASF GitHub Bot
Created on: 14/Jan/19 15:44
Start Date: 14/Jan/19 15:44
Worklog Time Spent: 10m 
  Work Description: alafanechere commented on issue #1729: [BEAM-1240] 
Create RabbitMqIO
URL: https://github.com/apache/beam/pull/1729#issuecomment-454050841
 
 
   Thanks! I just created the JIRA: 
https://jira.apache.org/jira/browse/BEAM-6424
 



Issue Time Tracking
---

Worklog Id: (was: 184847)
Time Spent: 14h 40m  (was: 14.5h)

> Create RabbitMqIO
> -
>
> Key: BEAM-1240
> URL: https://issues.apache.org/jira/browse/BEAM-1240
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.9.0
>
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-01-14 Thread Marco Veluscek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742214#comment-16742214
 ] 

Marco Veluscek commented on BEAM-3772:
--

Ok, thank you, I'll try again with 2.9 to confirm this.

Thank you for your help.

> BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded 
> PCollection and FILE_LOADS
> 
>
> Key: BEAM-3772
> URL: https://issues.apache.org/jira/browse/BEAM-3772
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.2.0, 2.3.0
> Environment: Dataflow streaming pipeline
>Reporter: Benjamin BENOIST
>Assignee: Chamikara Jayalath
>Priority: Major
>
> My workflow : KAFKA -> Dataflow streaming -> BigQuery
> Given that having low-latency isn't important in my case, I use FILE_LOADS to 
> reduce the costs. I'm using _BigQueryIO.Write_ with a _DynamicDestination_, 
> which is a table with the current hour as a suffix.
> This _BigQueryIO.Write_ is configured like this :
> {code:java}
> .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
> .withMethod(Method.FILE_LOADS)
> .withTriggeringFrequency(triggeringFrequency)
> .withNumFileShards(100)
> {code}
> The first table is successfully created and is written to. But then the 
> following tables are never created and I get these exceptions:
> {code:java}
> (99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job 
> with id prefix 
> 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_1_00023,
>  reached max retries: 3, last failed load job: {
>   "configuration" : {
> "load" : {
>   "createDisposition" : "CREATE_NEVER",
>   "destinationTable" : {
> "datasetId" : "dev_mydataset",
> "projectId" : "myproject-id",
> "tableId" : "mytable_20180302_16"
>   },
> {code}
> The _CreateDisposition_ used is _CREATE_NEVER_, rather than the 
> _CREATE_IF_NEEDED_ that was specified.
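
For illustration, the hourly table suffix described in the report (for example `mytable_20180302_16`) could be derived roughly as follows. This is a minimal sketch using only `java.time`; the class and method names are hypothetical, the use of UTC is an assumption, and it does not reproduce the reporter's `DynamicDestination` or any BigQueryIO code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, for illustration only; not part of BigQueryIO. */
final class HourlyTableNameSketch {

  // Day plus hour of day, e.g. 20180302_16. UTC is assumed here.
  private static final DateTimeFormatter HOUR_SUFFIX =
      DateTimeFormatter.ofPattern("yyyyMMdd_HH").withZone(ZoneOffset.UTC);

  /** Builds the table id for the hour that contains the given timestamp. */
  static String tableFor(String baseName, Instant timestamp) {
    return baseName + "_" + HOUR_SUFFIX.format(timestamp);
  }
}
```

With such a scheme a new table id appears at every hour boundary, which is the point at which the report says table creation stops working.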





[jira] [Work logged] (BEAM-6408) beam_Java_LoadTests_GroupByKey_Dataflow_Small timeouts

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6408?focusedWorklogId=184845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184845
 ]

ASF GitHub Bot logged work on BEAM-6408:


Author: ASF GitHub Bot
Created on: 14/Jan/19 15:34
Start Date: 14/Jan/19 15:34
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #7485: [BEAM-6408] Fix load 
test job
URL: https://github.com/apache/beam/pull/7485#issuecomment-454047129
 
 
   Run GroupByKey Small Java Load Test Dataflow
 



Issue Time Tracking
---

Worklog Id: (was: 184845)
Time Spent: 10m
Remaining Estimate: 0h

> beam_Java_LoadTests_GroupByKey_Dataflow_Small timeouts
> --
>
> Key: BEAM-6408
> URL: https://issues.apache.org/jira/browse/BEAM-6408
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This job starts a load test that lasts 4 hours on Dataflow (on 10 workers). 
> It fails due to a Jenkins timeout. 





[jira] [Work logged] (BEAM-1240) Create RabbitMqIO

2019-01-14 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1240?focusedWorklogId=184844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-184844
 ]

ASF GitHub Bot logged work on BEAM-1240:


Author: ASF GitHub Bot
Created on: 14/Jan/19 15:34
Start Date: 14/Jan/19 15:34
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on issue #1729: [BEAM-1240] Create 
RabbitMqIO
URL: https://github.com/apache/beam/pull/1729#issuecomment-454047121
 
 
   @alafanechere by the way, I can create the Jira for you (and I will assign 
the Jira to me). Please let me know.
 



Issue Time Tracking
---

Worklog Id: (was: 184844)
Time Spent: 14h 20m  (was: 14h 10m)

> Create RabbitMqIO
> -
>
> Key: BEAM-1240
> URL: https://issues.apache.org/jira/browse/BEAM-1240
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.9.0
>
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-6424) RabbitMqIO: NullPointerException raised on getWatermark() first call

2019-01-14 Thread Augustin Lafanechere (JIRA)
Augustin Lafanechere created BEAM-6424:
--

 Summary: RabbitMqIO: NullPointerException raised on getWatermark() 
first call
 Key: BEAM-6424
 URL: https://issues.apache.org/jira/browse/BEAM-6424
 Project: Beam
  Issue Type: Bug
  Components: io-ideas
Reporter: Augustin Lafanechere
Assignee: Eugene Kirpichov


I tried to use the RabbitMqIO with the direct runner to generate an unbounded 
PCollection from a queue. I encountered an NPE:
{quote}{{java.lang.NullPointerException}}
{{ at 
org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement
 (UnboundedReadEvaluatorFactory.java:169)}}
{{}}
{quote}
After investigation it looks like it's caused by the fact that no default is 
given to 
checkpointMark.oldestTimestamp. getWatermark() is called before the mutation of 
the currentTimestamp variable, raising a NPE. I fixed the problem on my side, 
reimplementing the class and overriding getWatermark to return Instant.now()  
if checkpointMark.oldestTimestamp is null :
{quote}@Override
public Instant getWatermark() {
  if (checkpointMark.oldestTimestamp == null) {
    return Instant.now();
  }
  return checkpointMark.oldestTimestamp;
}{quote}
It looks like this bug has [already been raised 
here|https://jira.apache.org/jira/browse/BEAM-1240?focusedCommentId=16566869=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16566869]
 on the PR for RabbitMqIO.
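
The null-safe fallback described above can be tried in isolation with the sketch below. It is a simplified stand-in, not the RabbitMqIO reader: the two classes are hypothetical, `recordTimestamp` is an invented helper, and `java.time.Instant` is used in place of the Joda-Time `Instant` that Beam uses so that the example runs without extra dependencies.

```java
import java.time.Instant;

/** Hypothetical stand-in for the reader's checkpoint mark; not the real RabbitMqIO class. */
final class CheckpointMarkSketch {
  // Stays null until the first message has been read, which is the state that triggers the NPE.
  Instant oldestTimestamp;
}

/** Hypothetical stand-in for the unbounded reader, reduced to the watermark logic. */
final class WatermarkSketch {
  private final CheckpointMarkSketch checkpointMark = new CheckpointMarkSketch();

  /** Returns the oldest recorded timestamp, or the current time if none has been recorded yet. */
  Instant getWatermark() {
    Instant oldest = checkpointMark.oldestTimestamp;
    return oldest != null ? oldest : Instant.now();
  }

  /** Invented helper: keeps the smallest timestamp seen so far. */
  void recordTimestamp(Instant timestamp) {
    Instant oldest = checkpointMark.oldestTimestamp;
    if (oldest == null || timestamp.isBefore(oldest)) {
      checkpointMark.oldestTimestamp = timestamp;
    }
  }
}
```

Whether `Instant.now()` is the right default for a watermark is a design question for the actual fix; the sketch only shows that the first call no longer returns null.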
 
 
 




