[jira] [Work logged] (BEAM-7511) KafkaTable Initialization

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256376
 ]

ASF GitHub Bot logged work on BEAM-7511:


Author: ASF GitHub Bot
Created on: 08/Jun/19 05:34
Start Date: 08/Jun/19 05:34
Worklog Time Spent: 10m 
  Work Description: XuMingmin commented on pull request #8797: [BEAM-7511] 
Fixes the bug in KafkaTable Initialization.
URL: https://github.com/apache/beam/pull/8797#discussion_r291795277
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java
 ##
 @@ -45,26 +46,11 @@
   private List<TopicPartition> topicPartitions;
   private Map<String, Object> configUpdates;
 
-  protected BeamKafkaTable(Schema beamSchema) {
-    super(beamSchema);
-  }
-
   public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List<String> topics) {
     super(beamSchema);
     this.bootstrapServers = bootstrapServers;
     this.topics = topics;
-  }
-
-  public BeamKafkaTable(
-      Schema beamSchema, List<TopicPartition> topicPartitions, String bootstrapServers) {
-    super(beamSchema);
-    this.bootstrapServers = bootstrapServers;
-    this.topicPartitions = topicPartitions;
-  }
-
-  public BeamKafkaTable updateConsumerProperties(Map<String, Object> configUpdates) {
 
 Review comment:
  I do have my own `KafkaTableProvider` to handle customized auth settings. 
Instead of deleting the methods from `KafkaTableProvider`, I would prefer to 
keep supporting them in `KafkaTableProvider`.
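
The NullPointerException reported in this issue comes from a consumer-config map that is never initialized before KafkaIO merges it. A minimal, self-contained sketch of the defensive shape of the fix (class and method names below are illustrative stand-ins, not Beam's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for BeamKafkaTable (not Beam's actual API): the
// config-updates map is initialized eagerly, so merging consumer properties
// can never dereference null -- the shape of the NPE reported in BEAM-7511.
class KafkaTableSketch {
  private final String bootstrapServers;
  private final Map<String, Object> configUpdates = new HashMap<>();

  KafkaTableSketch(String bootstrapServers) {
    this.bootstrapServers = bootstrapServers;
  }

  // Merge base config with any updates; safe even when none were supplied.
  Map<String, Object> consumerConfig() {
    Map<String, Object> config = new HashMap<>();
    config.put("bootstrap.servers", bootstrapServers);
    config.putAll(configUpdates);
    return config;
  }

  public static void main(String[] args) {
    KafkaTableSketch table = new KafkaTableSketch("localhost:9092");
    System.out.println(table.consumerConfig());
  }
}
```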
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256376)
Time Spent: 1h 10m  (was: 1h)

> KafkaTable Initialization
> -
>
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This exception is thrown when a Kafka table is created:
> Exception in thread "main" java.lang.NullPointerException
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028)
>  at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251)
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814)
>  at 
> org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58)
>  at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
>  at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at 

[jira] [Work logged] (BEAM-7470) Clean up Data Plane, rely only on instruction id and transform id

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7470?focusedWorklogId=256306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256306
 ]

ASF GitHub Bot logged work on BEAM-7470:


Author: ASF GitHub Bot
Created on: 08/Jun/19 01:01
Start Date: 08/Jun/19 01:01
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8733: [BEAM-7470] 
Update proto and all SDKs to make the logical data stream over the data plane 
identified solely by instruction id and transform id.
URL: https://github.com/apache/beam/pull/8733
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256306)
Time Spent: 1h 10m  (was: 1h)

> Clean up Data Plane, rely only on instruction id and transform id
> -
>
> Key: BEAM-7470
> URL: https://issues.apache.org/jira/browse/BEAM-7470
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The intent is to remove the BeamFnApi.Target type: it is not widely used, and 
> the remote reader/writer are intended to be identified uniquely by instruction 
> id and transform id.
> We also remove the channel split input id, since it is unnecessary with this 
> change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-7470) Clean up Data Plane, rely only on instruction id and transform id

2019-06-07 Thread Luke Cwik (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-7470.
-
   Resolution: Fixed
Fix Version/s: 2.14.0

> Clean up Data Plane, rely only on instruction id and transform id
> -
>
> Key: BEAM-7470
> URL: https://issues.apache.org/jira/browse/BEAM-7470
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The intent is to remove the BeamFnApi.Target type: it is not widely used, and 
> the remote reader/writer are intended to be identified uniquely by instruction 
> id and transform id.
> We also remove the channel split input id, since it is unnecessary with this 
> change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7516) Add a watermark manager for the fn_api_runner

2019-06-07 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-7516:
---

 Summary: Add a watermark manager for the fn_api_runner
 Key: BEAM-7516
 URL: https://issues.apache.org/jira/browse/BEAM-7516
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada


To track watermarks for each stage
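
As a rough illustration of what such a manager might track (a hypothetical sketch, not the actual fn_api_runner design): a stage's input watermark is the minimum of its upstream stages' output watermarks, and output watermarks only advance monotonically.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-stage watermark bookkeeping: the input watermark of a
// stage is the minimum output watermark over its upstream stages; a stage
// with no inputs is treated as fully advanced (Long.MAX_VALUE).
class WatermarkManagerSketch {
  private final Map<String, List<String>> upstream = new HashMap<>();
  private final Map<String, Long> outputWatermark = new HashMap<>();

  void addStage(String stage, List<String> inputs) {
    upstream.put(stage, inputs);
    outputWatermark.put(stage, Long.MIN_VALUE);
  }

  long inputWatermarkOf(String stage) {
    return upstream.get(stage).stream()
        .mapToLong(outputWatermark::get)
        .min()
        .orElse(Long.MAX_VALUE);
  }

  // Watermarks never move backwards.
  void advanceOutputWatermark(String stage, long watermark) {
    outputWatermark.merge(stage, watermark, Math::max);
  }

  public static void main(String[] args) {
    WatermarkManagerSketch wm = new WatermarkManagerSketch();
    wm.addStage("read", List.of());
    wm.addStage("map", List.of("read"));
    wm.advanceOutputWatermark("read", 100L);
    System.out.println(wm.inputWatermarkOf("map"));
  }
}
```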



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7515) Support TestStream on the fn_api_runner

2019-06-07 Thread Pablo Estrada (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859055#comment-16859055
 ] 

Pablo Estrada commented on BEAM-7515:
-

This involves a runner-side implementation of the TestStream operation.

> Support TestStream on the fn_api_runner
> ---
>
> Key: BEAM-7515
> URL: https://issues.apache.org/jira/browse/BEAM-7515
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7515) Support TestStream on the fn_api_runner

2019-06-07 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-7515:
---

 Summary: Support TestStream on the fn_api_runner
 Key: BEAM-7515
 URL: https://issues.apache.org/jira/browse/BEAM-7515
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7515) Support TestStream on the fn_api_runner

2019-06-07 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-7515:

Status: Open  (was: Triage Needed)

> Support TestStream on the fn_api_runner
> ---
>
> Key: BEAM-7515
> URL: https://issues.apache.org/jira/browse/BEAM-7515
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (BEAM-7514) Support streaming on the Python fn_api_runner

2019-06-07 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7514 started by Pablo Estrada.
---
> Support streaming on the Python fn_api_runner
> -
>
> Key: BEAM-7514
> URL: https://issues.apache.org/jira/browse/BEAM-7514
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7514) Support streaming on the Python fn_api_runner

2019-06-07 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-7514:
---

 Summary: Support streaming on the Python fn_api_runner
 Key: BEAM-7514
 URL: https://issues.apache.org/jira/browse/BEAM-7514
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7514) Support streaming on the Python fn_api_runner

2019-06-07 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-7514:

Status: Open  (was: Triage Needed)

> Support streaming on the Python fn_api_runner
> -
>
> Key: BEAM-7514
> URL: https://issues.apache.org/jira/browse/BEAM-7514
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2019-06-07 Thread Alex Amato (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859054#comment-16859054
 ] 

Alex Amato commented on BEAM-4374:
--

For Python yes, but not for Go

> Update existing metrics in the FN API to use new Metric Schema
> --
>
> Key: BEAM-4374
> URL: https://issues.apache.org/jira/browse/BEAM-4374
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Alex Amato
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> Update existing metrics to use the new proto and cataloging schema defined in:
> [_https://s.apache.org/beam-fn-api-metrics_]
>  * Check in new protos
>  * Define catalog file for metrics
>  * Port existing metrics to use this new format, based on catalog 
> names+metadata



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6094) Implement External environment for Portable Beam

2019-06-07 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-6094.
-
   Resolution: Fixed
Fix Version/s: Not applicable

This is fixed by #7307.

> Implement External environment for Portable Beam
> 
>
> Key: BEAM-6094
> URL: https://issues.apache.org/jira/browse/BEAM-6094
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4374) Update existing metrics in the FN API to use new Metric Schema

2019-06-07 Thread Pablo Estrada (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859051#comment-16859051
 ] 

Pablo Estrada commented on BEAM-4374:
-

Is this resolved?

> Update existing metrics in the FN API to use new Metric Schema
> --
>
> Key: BEAM-4374
> URL: https://issues.apache.org/jira/browse/BEAM-4374
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Alex Amato
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> Update existing metrics to use the new proto and cataloging schema defined in:
> [_https://s.apache.org/beam-fn-api-metrics_]
>  * Check in new protos
>  * Define catalog file for metrics
>  * Port existing metrics to use this new format, based on catalog 
> names+metadata



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6280) Failure in PortableRunnerTest.test_error_traceback_includes_user_code

2019-06-07 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-6280.
-
   Resolution: Fixed
Fix Version/s: Not applicable

Thanks!

> Failure in PortableRunnerTest.test_error_traceback_includes_user_code
> -
>
> Key: BEAM-6280
> URL: https://issues.apache.org/jira/browse/BEAM-6280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kenneth Knowles
>Assignee: Sam Rohde
>Priority: Critical
>  Labels: flaky-test
> Fix For: Not applicable
>
> Attachments: data.txt
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/]
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/732/testReport/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_error_traceback_includes_user_code/]
> [https://scans.gradle.com/s/do3hjulee3gaa/console-log?task=:beam-sdks-python:testPython3]
> {code:java}
> 'second' not found in 'Traceback (most recent call last):\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py",
>  line 466, in test_error_traceback_includes_user_code\np | 
> beam.Create([0]) | beam.Map(first)  # pylint: 
> disable=expression-not-assigned\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/pipeline.py",
>  line 425, in __exit__\nself.run().wait_until_finish()\n  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/apache_beam/runners/portability/portable_runner.py",
>  line 314, in wait_until_finish\nself._job_id, self._state, 
> self._last_error_message()))\nRuntimeError: Pipeline 
> job-cdcefe6d-1caa-4487-9e63-e971f67ec68c failed in state FAILED: start 
>  coder=WindowedValueCoder[BytesCoder], len(consumers)=1]]>\n'{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-3886) Python SDK harness uses State API from ProcessBundleDescriptor

2019-06-07 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-3886.
-
   Resolution: Fixed
Fix Version/s: Not applicable

This was fixed a while back.

> Python SDK harness uses State API from ProcessBundleDescriptor
> --
>
> Key: BEAM-3886
> URL: https://issues.apache.org/jira/browse/BEAM-3886
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Ben Sidhom
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Python harness should pull the State API descriptor from the current 
> process bundle descriptor when processing bundles.
> As a minor optimization and to make implementing new runners easier, the 
> harness should not talk to the State server unless it's actually needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6186) Cleanup FnApiRunner optimization phases.

2019-06-07 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-6186.
-
   Resolution: Fixed
Fix Version/s: 2.10.0

Thanks Robert. Marking as fixed.

> Cleanup FnApiRunner optimization phases.
> 
>
> Key: BEAM-6186
> URL: https://issues.apache.org/jira/browse/BEAM-6186
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Minor
> Fix For: 2.10.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> They are currently expressed as functions with closure. It would be good to 
> pull them out with explicit dependencies, both to make the code easier to 
> follow and to be able to test and reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7513) Row Estimation for BigQueryTable

2019-06-07 Thread Alireza Samadianzakaria (JIRA)
Alireza Samadianzakaria created BEAM-7513:
-

 Summary: Row Estimation for BigQueryTable
 Key: BEAM-7513
 URL: https://issues.apache.org/jira/browse/BEAM-7513
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql, io-java-gcp
Reporter: Alireza Samadianzakaria
Assignee: Alireza Samadianzakaria


Calcite tables (org.apache.calcite.schema.Table) should implement the method 
org.apache.calcite.schema.Statistic getStatistic(). The Statistic instance 
returned by this method is used for the Volcano optimizer in Calcite. 

Currently, org.apache.beam.sdk.extensions.sql.impl.BeamCalciteTable does not 
implement getStatistic(), which means it falls back to the implementation in 
org.apache.calcite.schema.impl.AbstractTable, which simply returns 
Statistics.UNKNOWN for all sources.

 

Things that need to be implemented:

1. Implementing getStatistic() in BeamCalciteTable such that it calls a 
row-count estimation method on BeamSqlTable, and adding this method to 
BeamSqlTable.

2. Implementing the row-count estimation method for BigQueryTable.
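
A rough sketch of the wiring described above, with stand-in types instead of the real Calcite and Beam interfaces (all names here are hypothetical):

```java
// Stand-ins for BeamSqlTable / BeamCalciteTable / Calcite's Statistic; the
// real interfaces live in Beam and Calcite -- these are illustrative only.
interface SqlTableWithEstimate {
  double estimateRowCount();
}

class BigQueryTableStub implements SqlTableWithEstimate {
  private final long numRows; // in practice, fetched from table metadata

  BigQueryTableStub(long numRows) {
    this.numRows = numRows;
  }

  @Override
  public double estimateRowCount() {
    return numRows;
  }
}

// Plays the role of BeamCalciteTable#getStatistic(): instead of returning an
// "unknown" statistic, it asks the underlying table for a row-count estimate.
class CalciteTableAdapter {
  private final SqlTableWithEstimate table;

  CalciteTableAdapter(SqlTableWithEstimate table) {
    this.table = table;
  }

  Double getRowCount() {
    return table.estimateRowCount();
  }

  public static void main(String[] args) {
    CalciteTableAdapter adapter = new CalciteTableAdapter(new BigQueryTableStub(42));
    System.out.println(adapter.getRowCount());
  }
}
```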



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7511) KafkaTable Initialization

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256292
 ]

ASF GitHub Bot logged work on BEAM-7511:


Author: ASF GitHub Bot
Created on: 07/Jun/19 23:15
Start Date: 07/Jun/19 23:15
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #8797: [BEAM-7511] 
Fixes the bug in KafkaTable Initialization.
URL: https://github.com/apache/beam/pull/8797#discussion_r291779115
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java
 ##
 @@ -45,26 +46,11 @@
   private List<TopicPartition> topicPartitions;
   private Map<String, Object> configUpdates;
 
-  protected BeamKafkaTable(Schema beamSchema) {
-    super(beamSchema);
-  }
-
   public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List<String> topics) {
     super(beamSchema);
     this.bootstrapServers = bootstrapServers;
     this.topics = topics;
-  }
-
-  public BeamKafkaTable(
-      Schema beamSchema, List<TopicPartition> topicPartitions, String bootstrapServers) {
-    super(beamSchema);
-    this.bootstrapServers = bootstrapServers;
-    this.topicPartitions = topicPartitions;
-  }
-
-  public BeamKafkaTable updateConsumerProperties(Map<String, Object> configUpdates) {
 
 Review comment:
  I think it is quite possible that there exists external (possibly not 
open-source) code that extends `KafkaTableProvider` and overrides 
`buildBeamSqlTable()` for the purposes of adding features not in Beam. I've 
written things like this in the past.
   
   Even if that is the case, I don't think supporting manipulation of a single 
field is a reasonable long-term approach. We should still delete this. (I think 
it would be reasonable to move the construction of the KafkaIO reader and 
writer into separate functions that can be overridden, but that doesn't need to 
be part of this change.)
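
The parenthetical suggestion (overridable construction hooks instead of a mutable config field) can be sketched roughly like this; all names are hypothetical, not Beam's API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base table: consumer configuration is built by a protected
// hook that subclasses override, rather than by mutating a shared field.
class KafkaTableBase {
  protected final String bootstrapServers;

  KafkaTableBase(String bootstrapServers) {
    this.bootstrapServers = bootstrapServers;
  }

  protected Map<String, Object> consumerConfig() {
    Map<String, Object> config = new HashMap<>();
    config.put("bootstrap.servers", bootstrapServers);
    return config;
  }
}

// External code (e.g. a provider with custom auth) extends the hook.
class AuthenticatedKafkaTable extends KafkaTableBase {
  AuthenticatedKafkaTable(String bootstrapServers) {
    super(bootstrapServers);
  }

  @Override
  protected Map<String, Object> consumerConfig() {
    Map<String, Object> config = super.consumerConfig();
    config.put("security.protocol", "SASL_SSL"); // illustrative auth setting
    return config;
  }

  public static void main(String[] args) {
    System.out.println(new AuthenticatedKafkaTable("broker:9092").consumerConfig());
  }
}
```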
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256292)
Time Spent: 1h  (was: 50m)

> KafkaTable Initialization
> -
>
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This exception is thrown when a Kafka table is created:
> Exception in thread "main" java.lang.NullPointerException
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028)
>  at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251)
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814)
>  at 
> org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58)
>  at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
>  at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at 

[jira] [Work logged] (BEAM-7511) KafkaTable Initialization

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256291
 ]

ASF GitHub Bot logged work on BEAM-7511:


Author: ASF GitHub Bot
Created on: 07/Jun/19 23:14
Start Date: 07/Jun/19 23:14
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #8797: [BEAM-7511] 
Fixes the bug in KafkaTable Initialization.
URL: https://github.com/apache/beam/pull/8797#discussion_r291779115
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java
 ##
 @@ -45,26 +46,11 @@
   private List<TopicPartition> topicPartitions;
   private Map<String, Object> configUpdates;
 
-  protected BeamKafkaTable(Schema beamSchema) {
-    super(beamSchema);
-  }
-
   public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List<String> topics) {
     super(beamSchema);
     this.bootstrapServers = bootstrapServers;
     this.topics = topics;
-  }
-
-  public BeamKafkaTable(
-      Schema beamSchema, List<TopicPartition> topicPartitions, String bootstrapServers) {
-    super(beamSchema);
-    this.bootstrapServers = bootstrapServers;
-    this.topicPartitions = topicPartitions;
-  }
-
-  public BeamKafkaTable updateConsumerProperties(Map<String, Object> configUpdates) {
 
 Review comment:
  I think it is quite possible that there exists external (possibly not 
open-source) code that extends `KafkaTableProvider` and overrides 
`buildBeamSqlTable()` for the purposes of adding features not in Beam. I've 
written things like this in the past.
   
   Even if that is the case, I don't think supporting manipulation of a single 
field is a reasonable approach. We should still delete this. (I think it would 
be reasonable to move the construction of the KafkaIO reader and writer into 
separate functions that can be overridden, but that doesn't need to be part of 
this change.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256291)
Time Spent: 50m  (was: 40m)

> KafkaTable Initialization
> -
>
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This exception is thrown when a Kafka table is created:
> Exception in thread "main" java.lang.NullPointerException
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028)
>  at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251)
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814)
>  at 
> org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58)
>  at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
>  at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at 

[jira] [Work logged] (BEAM-6777) SDK Harness Resilience

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=256285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256285
 ]

ASF GitHub Bot logged work on BEAM-6777:


Author: ASF GitHub Bot
Created on: 07/Jun/19 23:06
Start Date: 07/Jun/19 23:06
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #8783: [BEAM-6777] Remove the 
unnecessary enable_health_checker flag
URL: https://github.com/apache/beam/pull/8783#issuecomment-500065802
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256285)
Time Spent: 4h 20m  (was: 4h 10m)

> SDK Harness Resilience
> --
>
> Key: BEAM-6777
> URL: https://issues.apache.org/jira/browse/BEAM-6777
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> If the Python SDK Harness crashes in any way (user code exception, OOM, etc.), 
> the job will hang and waste resources. The fix is to add a daemon in the SDK 
> Harness and Runner Harness to communicate with Dataflow to restart the VM 
> when stuckness is detected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7511) KafkaTable Initialization

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256283
 ]

ASF GitHub Bot logged work on BEAM-7511:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:59
Start Date: 07/Jun/19 22:59
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #8797: [BEAM-7511] 
Fixes the bug in KafkaTable Initialization.
URL: https://github.com/apache/beam/pull/8797#discussion_r291777067
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java
 ##
 @@ -45,26 +46,11 @@
   private List<TopicPartition> topicPartitions;
   private Map<String, Object> configUpdates;
 
-  protected BeamKafkaTable(Schema beamSchema) {
-    super(beamSchema);
-  }
-
   public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List<String> topics) {
     super(beamSchema);
     this.bootstrapServers = bootstrapServers;
     this.topics = topics;
-  }
-
-  public BeamKafkaTable(
-      Schema beamSchema, List<TopicPartition> topicPartitions, String bootstrapServers) {
-    super(beamSchema);
-    this.bootstrapServers = bootstrapServers;
-    this.topicPartitions = topicPartitions;
-  }
-
-  public BeamKafkaTable updateConsumerProperties(Map<String, Object> configUpdates) {
 
 Review comment:
  The only way to call it is by calling `buildBeamSqlTable()` on a 
`KafkaTableProvider`, then casting the result to the correct class, and then 
calling this method. I cannot find any code in Beam that does this, and no code 
should ever be doing that in the first place. My assumption is that it was used 
in tests, similar to `getBootstrapServers()` in the same class.
   
   +1 to removing the `configUpdates`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256283)
Time Spent: 40m  (was: 0.5h)

> KafkaTable Initialization
> -
>
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This exception is thrown when a Kafka table is created:
> Exception in thread "main" java.lang.NullPointerException
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028)
>  at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251)
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814)
>  at 
> org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58)
>  at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
>  at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at 

[jira] [Work logged] (BEAM-7512) Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7512?focusedWorklogId=256278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256278
 ]

ASF GitHub Bot logged work on BEAM-7512:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:43
Start Date: 07/Jun/19 22:43
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8796: [BEAM-7512] 
Replace deprecated self.assertEquals with self.assertEqual
URL: https://github.com/apache/beam/pull/8796
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 256278)
Time Spent: 40m  (was: 0.5h)

> Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py
> 
>
> Key: BEAM-7512
> URL: https://issues.apache.org/jira/browse/BEAM-7512
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
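The change being reviewed is mechanical; the sketch below (a hypothetical test, not the actual synthetic_pipeline_test.py code) shows the preferred spelling, since assertEquals is only a deprecated alias of assertEqual:

```python
import unittest

class SyntheticPipelineStyleTest(unittest.TestCase):
    # Hypothetical stand-in for a test in synthetic_pipeline_test.py.
    # self.assertEquals(...) is a deprecated alias; assertEqual is canonical.
    def test_element_count(self):
        elements = [b"a", b"b", b"c"]
        self.assertEqual(len(elements), 3)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SyntheticPipelineStyleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running the suite directly confirms the rename is behavior-preserving: assertEqual performs exactly the same comparison the old alias did.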






[jira] [Closed] (BEAM-7512) Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py

2019-06-07 Thread Luke Cwik (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik closed BEAM-7512.
---
   Resolution: Fixed
Fix Version/s: 2.14.0

> Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py
> 
>
> Key: BEAM-7512
> URL: https://issues.apache.org/jira/browse/BEAM-7512
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=256273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256273
 ]

ASF GitHub Bot logged work on BEAM-6693:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:40
Start Date: 07/Jun/19 22:40
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8799: 
[WIP][BEAM-6693] replace mmh3 with default hash function
URL: https://github.com/apache/beam/pull/8799
 
 
   Replace mmh3 with the default Python hash function: it is not reasonable to 
push this inconvenience into production for only a small improvement in 
accuracy from an already loose estimation algorithm.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-7511) KafkaTable Initialization

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256269
 ]

ASF GitHub Bot logged work on BEAM-7511:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:37
Start Date: 07/Jun/19 22:37
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #8797: [BEAM-7511] 
Fixes the bug in KafkaTable Initialization.
URL: https://github.com/apache/beam/pull/8797#discussion_r291773587
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaTable.java
 ##
 @@ -45,26 +46,11 @@
   private List<TopicPartition> topicPartitions;
   private Map<String, Object> configUpdates;
 
-  protected BeamKafkaTable(Schema beamSchema) {
-    super(beamSchema);
-  }
-
   public BeamKafkaTable(Schema beamSchema, String bootstrapServers, List<String> topics) {
     super(beamSchema);
     this.bootstrapServers = bootstrapServers;
     this.topics = topics;
-  }
-
-  public BeamKafkaTable(
-      Schema beamSchema, List<TopicPartition> topicPartitions, String bootstrapServers) {
-    super(beamSchema);
-    this.bootstrapServers = bootstrapServers;
-    this.topicPartitions = topicPartitions;
-  }
-
-  public BeamKafkaTable updateConsumerProperties(Map<String, Object> configUpdates) {
 
 Review comment:
   This method is public; is it really unused? Removing it takes away the 
ability to set the `withConsumerConfigUpdates` field. If it is really unused, 
let's delete `configUpdates` altogether.
 



Issue Time Tracking
---

Worklog Id: (was: 256269)
Time Spent: 0.5h  (was: 20m)

> KafkaTable Initialization
> -
>
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This exception is thrown when a Kafka table is created:
> Exception in thread "main" java.lang.NullPointerException
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028)
>  at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251)
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814)
>  at 
> org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58)
>  at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
>  at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at 

[jira] [Work logged] (BEAM-6620) Do not relocate guava

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=256268&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256268
 ]

ASF GitHub Bot logged work on BEAM-6620:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:36
Start Date: 07/Jun/19 22:36
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8798: [BEAM-6620] Remove 
references to DEFAULT_SHADOW_CLOSURE since it does nothing now.
URL: https://github.com/apache/beam/pull/8798#issuecomment-500060407
 
 
   This is some additional clean-up I wanted to do on the guava removal, but 
since all the tests there had passed, I didn't want to make too many additional 
changes on top of that.
 



Issue Time Tracking
---

Worklog Id: (was: 256268)
Time Spent: 5h 40m  (was: 5.5h)

> Do not relocate guava
> -
>
> Key: BEAM-6620
> URL: https://issues.apache.org/jira/browse/BEAM-6620
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ismaël Mejía
>Assignee: Luke Cwik
>Priority: Critical
> Fix For: 2.14.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Once guava use is vendored we have to remove the automatic relocation of 
> guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can 
> work correctly.





[jira] [Work logged] (BEAM-6620) Do not relocate guava

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=256267&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256267
 ]

ASF GitHub Bot logged work on BEAM-6620:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:35
Start Date: 07/Jun/19 22:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8798: [BEAM-6620] Remove 
references to DEFAULT_SHADOW_CLOSURE since it does nothing now.
URL: https://github.com/apache/beam/pull/8798#issuecomment-500060181
 
 
   R: @Ardagan @apilloud 
 



Issue Time Tracking
---

Worklog Id: (was: 256267)
Time Spent: 5.5h  (was: 5h 20m)

> Do not relocate guava
> -
>
> Key: BEAM-6620
> URL: https://issues.apache.org/jira/browse/BEAM-6620
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ismaël Mejía
>Assignee: Luke Cwik
>Priority: Critical
> Fix For: 2.14.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Once guava use is vendored we have to remove the automatic relocation of 
> guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can 
> work correctly.





[jira] [Work logged] (BEAM-6620) Do not relocate guava

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=256265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256265
 ]

ASF GitHub Bot logged work on BEAM-6620:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:35
Start Date: 07/Jun/19 22:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8798: [BEAM-6620] 
Remove references to DEFAULT_SHADOW_CLOSURE since it does nothing now.
URL: https://github.com/apache/beam/pull/8798
 
 
   
   
   
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-6620) Do not relocate guava

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=256266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256266
 ]

ASF GitHub Bot logged work on BEAM-6620:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:35
Start Date: 07/Jun/19 22:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8798: [BEAM-6620] Remove 
references to DEFAULT_SHADOW_CLOSURE since it does nothing now.
URL: https://github.com/apache/beam/pull/8798#issuecomment-500060181
 
 
   R: @Ardagan 
 



Issue Time Tracking
---

Worklog Id: (was: 256266)
Time Spent: 5h 20m  (was: 5h 10m)

> Do not relocate guava
> -
>
> Key: BEAM-6620
> URL: https://issues.apache.org/jira/browse/BEAM-6620
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Affects Versions: 2.11.0
>Reporter: Ismaël Mejía
>Assignee: Luke Cwik
>Priority: Critical
> Fix For: 2.14.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Once guava use is vendored we have to remove the automatic relocation of 
> guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can 
> work correctly.





[jira] [Work logged] (BEAM-7511) KafkaTable Initialization

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256264
 ]

ASF GitHub Bot logged work on BEAM-7511:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:31
Start Date: 07/Jun/19 22:31
Worklog Time Spent: 10m 
  Work Description: akedin commented on issue #8797: [BEAM-7511] Fixes the 
bug in KafkaTable Initialization.
URL: https://github.com/apache/beam/pull/8797#issuecomment-500059284
 
 
   R: @XuMingmin @apilloud @amaliujia 
 



Issue Time Tracking
---

Worklog Id: (was: 256264)
Time Spent: 20m  (was: 10m)

> KafkaTable Initialization
> -
>
> Key: BEAM-7511
> URL: https://issues.apache.org/jira/browse/BEAM-7511
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This exception is thrown when a Kafka table is created:
> Exception in thread "main" java.lang.NullPointerException
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028)
>  at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251)
>  at 
> org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814)
>  at 
> org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58)
>  at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
>  at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
>  at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
>  at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36)
>  at 
> org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76)
>  
> This happens because, in org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable, 
> configUpdates is not initialized anywhere and the method updateConsumerProperties is never called.
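The failure mode is the classic uninitialized-optional-map bug. A minimal Python sketch of the fix pattern (hypothetical names, not Beam's Java code) is to default the map before merging:

```python
def build_consumer_config(config_updates=None):
    # Hypothetical illustration: merging against an empty dict instead of
    # None avoids the NullPointerException-style crash in the trace above.
    config = {"bootstrap.servers": "localhost:9092"}
    config.update(config_updates or {})
    return config
```

One analogous Java fix is to initialize configUpdates to an empty map, or to skip applying consumer config updates entirely when none were supplied.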





[jira] [Work logged] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6821?focusedWorklogId=256259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256259
 ]

ASF GitHub Bot logged work on BEAM-6821:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:23
Start Date: 07/Jun/19 22:23
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8054: 
[BEAM-6821] FileBasedSink improper paths
URL: https://github.com/apache/beam/pull/8054
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 256259)
Time Spent: 3h 40m  (was: 3.5h)

> FileBasedSink is not creating file paths according to target filesystem
> ---
>
> Key: BEAM-6821
> URL: https://issues.apache.org/jira/browse/BEAM-6821
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: Windows 10
>Reporter: Gregory Kovelman
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The file path generated in the _open_writer_ method does not match the target 
> filesystem, because os.path.join is used instead of FileSystems.join.
> apache_beam\io\filebasedsink.py extract:
>  
> {code:python}
> def _create_temp_dir(self, file_path_prefix):
>   base_path, last_component = FileSystems.split(file_path_prefix)
>   if not last_component:
>     # Trying to re-split the base_path to check if it's a root.
>     new_base_path, _ = FileSystems.split(base_path)
>     if base_path == new_base_path:
>       raise ValueError('Cannot create a temporary directory for root path '
>                        'prefix %s. Please specify a file path prefix with '
>                        'at least two components.' % file_path_prefix)
>   path_components = [base_path,
>                      'beam-temp-' + last_component + '-' + uuid.uuid1().hex]
>   return FileSystems.join(*path_components)
>
> @check_accessible(['file_path_prefix', 'file_name_suffix'])
> def open_writer(self, init_result, uid):
>   # A proper suffix is needed for AUTO compression detection.
>   # We also ensure there will be no collisions with uid and a
>   # (possibly unsharded) file_path_prefix and a (possibly empty)
>   # file_name_suffix.
>   file_path_prefix = self.file_path_prefix.get()
>   file_name_suffix = self.file_name_suffix.get()
>   suffix = (
>       '.' + os.path.basename(file_path_prefix) + file_name_suffix)
>   return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
> {code}
>  
>  
> This creates incompatibilities between, for example, Windows and GCS.
> Expected: gs://bucket/beam-temp-result-uuid/uid.result
> Actual (on Windows): gs://bucket/beam-temp-result-uuid\uid.result
> Replacing os.path.join with FileSystems.join fixes the issue.
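The platform dependence is easy to reproduce with the standard library alone: os.path.join resolves to ntpath.join on Windows, which inserts a backslash even into a GCS-style path, while a POSIX-style join (the behavior FileSystems.join is meant to provide for GCS paths) keeps forward slashes. The bucket and object names below are illustrative:

```python
import ntpath
import posixpath

gcs_dir = "gs://bucket/beam-temp-result-abc123"  # illustrative path

# ntpath.join is what os.path.join means on Windows:
windows_style = ntpath.join(gcs_dir, "uid.result")

# posixpath.join produces the separator a GCS object name expects:
posix_style = posixpath.join(gcs_dir, "uid.result")
```

On this input the two joins differ only in the inserted separator (backslash versus forward slash), which is exactly the mixed-separator path described in the report.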





[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=256256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256256
 ]

ASF GitHub Bot logged work on BEAM-6693:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:16
Start Date: 07/Jun/19 22:16
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8780: 
[BEAM-6693] replace mmh3 with default hash function
URL: https://github.com/apache/beam/pull/8780
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256256)
Time Spent: 11h 10m  (was: 11h)

> ApproximateUnique transform for Python SDK
> --
>
> Key: BEAM-6693
> URL: https://issues.apache.org/jira/browse/BEAM-6693
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Hannah Jiang
>Priority: Minor
> Fix For: 2.14.0
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Add a PTransform for estimating the number of distinct elements in a 
> PCollection and the number of distinct values associated with each key in a 
> PCollection KVs.
> it should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7511) KafkaTable Initialization

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7511?focusedWorklogId=256254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256254
 ]

ASF GitHub Bot logged work on BEAM-7511:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:15
Start Date: 07/Jun/19 22:15
Worklog Time Spent: 10m 
  Work Description: riazela commented on pull request #8797: [BEAM-7511] 
Fixes the bug in KafkaTable Initialization.
URL: https://github.com/apache/beam/pull/8797
 
 
   Initialized configUpdates to an empty HashMap in the constructor and removed 
unused methods. 
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
[jira] [Commented] (BEAM-7101) Remove usages of deprecated assertion self.assertEquals in Python codebase.

2019-06-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859019#comment-16859019
 ] 

Valentyn Tymofieiev commented on BEAM-7101:
---

https://github.com/apache/beam/pull/8796 takes care of the remaining 
occurrences but we can still investigate strengthening lint rules. We can open 
a new issue for that.

> Remove usages of deprecated assertion self.assertEquals in Python codebase.
> ---
>
> Key: BEAM-7101
> URL: https://issues.apache.org/jira/browse/BEAM-7101
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Elwin Arens
>Priority: Trivial
>  Labels: beginner, easyfix, newbie, starter
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/search?q=self.assertEquals&unscoped_q=self.assertEquals
> These can be replaced with self.assertEqual.
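For reference, a minimal example of the replacement. assertEquals has been a deprecated alias of assertEqual since Python 3.2, and the deprecated aliases were removed in Python 3.12.

```python
import unittest

class SumTest(unittest.TestCase):
    def test_sum(self):
        # self.assertEquals(...) is the deprecated alias;
        # assertEqual is the canonical spelling.
        self.assertEqual(sum([1, 2, 3]), 6)

if __name__ == "__main__":
    unittest.main()
```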



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-7101) Remove usages of deprecated assertion self.assertEquals in Python codebase.

2019-06-07 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev closed BEAM-7101.
-
Resolution: Fixed

> Remove usages of deprecated assertion self.assertEquals in Python codebase.
> ---
>
> Key: BEAM-7101
> URL: https://issues.apache.org/jira/browse/BEAM-7101
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Elwin Arens
>Priority: Trivial
>  Labels: beginner, easyfix, newbie, starter
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/search?q=self.assertEquals&unscoped_q=self.assertEquals
> These can be replaced with self.assertEqual.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7512) Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7512?focusedWorklogId=256251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256251
 ]

ASF GitHub Bot logged work on BEAM-7512:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:01
Start Date: 07/Jun/19 22:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8796: [BEAM-7512] Replace 
deprecated self.assertEquals with self.assertEqual
URL: https://github.com/apache/beam/pull/8796#issuecomment-500053089
 
 
   cc: @laschmidt 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256251)
Time Spent: 0.5h  (was: 20m)

> Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py
> 
>
> Key: BEAM-7512
> URL: https://issues.apache.org/jira/browse/BEAM-7512
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7512) Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7512?focusedWorklogId=256250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256250
 ]

ASF GitHub Bot logged work on BEAM-7512:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:00
Start Date: 07/Jun/19 22:00
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8796: [BEAM-7512] Replace 
deprecated self.assertEquals with self.assertEqual
URL: https://github.com/apache/beam/pull/8796#issuecomment-500052779
 
 
   R: @youngoli @tvalentyn 
   CC: @boyuanzz 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256250)
Time Spent: 20m  (was: 10m)

> Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py
> 
>
> Key: BEAM-7512
> URL: https://issues.apache.org/jira/browse/BEAM-7512
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7512) Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7512?focusedWorklogId=256249&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256249
 ]

ASF GitHub Bot logged work on BEAM-7512:


Author: ASF GitHub Bot
Created on: 07/Jun/19 22:00
Start Date: 07/Jun/19 22:00
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8796: [BEAM-7512] 
Replace deprecated self.assertEquals with self.assertEqual
URL: https://github.com/apache/beam/pull/8796
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
[jira] [Commented] (BEAM-7101) Remove usages of deprecated assertion self.assertEquals in Python codebase.

2019-06-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859014#comment-16859014
 ] 

Valentyn Tymofieiev commented on BEAM-7101:
---

Currently we have usages of self.assertEquals in synthetic_pipeline_test.py, 
combiners_test.py, and typehints_test.py.

> Remove usages of deprecated assertion self.assertEquals in Python codebase.
> ---
>
> Key: BEAM-7101
> URL: https://issues.apache.org/jira/browse/BEAM-7101
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Elwin Arens
>Priority: Trivial
>  Labels: beginner, easyfix, newbie, starter
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/search?q=self.assertEquals&unscoped_q=self.assertEquals
> These can be replaced with self.assertEqual.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7101) Remove usages of deprecated assertion self.assertEquals in Python codebase.

2019-06-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859013#comment-16859013
 ] 

Valentyn Tymofieiev commented on BEAM-7101:
---

Looks like we drifted into using deprecated assertions in a few places again. 
We should investigate whether our linters can catch this at precommit time. Would 
you be interested in taking a look at that, [~elwinarens]?

> Remove usages of deprecated assertion self.assertEquals in Python codebase.
> ---
>
> Key: BEAM-7101
> URL: https://issues.apache.org/jira/browse/BEAM-7101
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Elwin Arens
>Priority: Trivial
>  Labels: beginner, easyfix, newbie, starter
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/search?q=self.assertEquals&unscoped_q=self.assertEquals
> These can be replaced with self.assertEqual.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (BEAM-7101) Remove usages of deprecated assertion self.assertEquals in Python codebase.

2019-06-07 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reopened BEAM-7101:
---

> Remove usages of deprecated assertion self.assertEquals in Python codebase.
> ---
>
> Key: BEAM-7101
> URL: https://issues.apache.org/jira/browse/BEAM-7101
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Elwin Arens
>Priority: Trivial
>  Labels: beginner, easyfix, newbie, starter
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/search?q=self.assertEquals&unscoped_q=self.assertEquals
> These can be replaced with self.assertEqual.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7512) Replace usage of assertEquals with assertEqual in synthetic_pipeline_test.py

2019-06-07 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-7512:
---

 Summary: Replace usage of assertEquals with assertEqual in 
synthetic_pipeline_test.py
 Key: BEAM-7512
 URL: https://issues.apache.org/jira/browse/BEAM-7512
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Luke Cwik
Assignee: Luke Cwik






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7478) Remote cluster submission from Flink Runner broken due to staging issues

2019-06-07 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859011#comment-16859011
 ] 

Luke Cwik commented on BEAM-7478:
-

Thanks for the full stacktrace. Unfortunately it didn't provide the additional 
insights I was hoping for.

> Remote cluster submission from Flink Runner broken due to staging issues
> 
>
> Key: BEAM-7478
> URL: https://issues.apache.org/jira/browse/BEAM-7478
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-java-core
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The usual way to submit pipelines with the Flink Runner is to build a fat jar 
> and use the {{bin/flink}} utility to submit the jar to a Flink cluster. This 
> works fine.
> Alternatively, the Flink Runner can use the {{flinkMaster}} pipeline option 
> to specify a remote cluster. Upon submitting an example we get the following 
> at Flink's JobManager.
> {noformat}
> Caused by: java.lang.IllegalAccessError: class 
> sun.reflect.GeneratedSerializationConstructorAccessor70 cannot access its 
> superclass sun.reflect.SerializationConstructorAccessorImpl
>   at sun.misc.Unsafe.defineClass(Native Method)
>   at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
>   at 
> sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
>   at 
> sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)
>   at 
> sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112)
>   at 
> sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:340)
>   at 
> java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1420)
>   at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
>   at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:497)
>   at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
>   at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:598)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:502)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:489)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:477)
>   at 
> org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:438)
>   at 
> org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:288)
>   at 
> org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:63)
>   ... 32 more
> {noformat}
> It appears there is an issue with the staging via {{PipelineResources}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6257) Can we deprecate the side input paths through PAssert?

2019-06-07 Thread Kyle Weaver (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859006#comment-16859006
 ] 

Kyle Weaver commented on BEAM-6257:
---

Not totally fixed yet: PAssert for map and multimap is still using the old 
method. 
[https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/PAssert.java#L474]

> Can we deprecate the side input paths through PAssert?
> --
>
> Key: BEAM-6257
> URL: https://issues.apache.org/jira/browse/BEAM-6257
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: starter
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> PAssert has two distinct paths - one uses GBK with a single-firing trigger, 
> and one uses side inputs. Side inputs are usually a later addition to a 
> runner, while GBK is one of the first primitives (with a single firing it is 
> even simple). Filing this against myself to figure out why the side input 
> version is not deprecated, and if it can be deprecated.
> Marking this as a "starter" task because finding and eliminating side input 
> version of PAssert should be fairly easy. You might need help but can ask on 
> dev@



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7510?focusedWorklogId=256238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256238
 ]

ASF GitHub Bot logged work on BEAM-7510:


Author: ASF GitHub Bot
Created on: 07/Jun/19 21:39
Start Date: 07/Jun/19 21:39
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8795: [BEAM-7510] Fixing 
fileio tests with likelihood for flakiness
URL: https://github.com/apache/beam/pull/8795#issuecomment-500047726
 
 
   @tvalentyn it was a silly issue with non-deterministic JSON serialization.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256238)
Time Spent: 50m  (was: 40m)

> test_write_to_different_file_types is flaky
> ---
>
> Key: BEAM-7510
> URL: https://issues.apache.org/jira/browse/BEAM-7510
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {noformat}
> 10:54:51 
> ==
> 10:54:51 ERROR: test_write_to_different_file_types 
> (apache_beam.io.fileio_test.WriteFilesTest)
> 10:54:51 
> --
> 10:54:51 Traceback (most recent call last):
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/io/fileio_test.py",
>  line 420, in test_write_to_different_file_types
> 10:54:51 label='verifyApache')
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 426, in __exit__
> 10:54:51 self.run().wait_until_finish()
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 10:54:51 else test_runner_api))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 10:54:51 self._options).run(False)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 10:54:51 return self.runner.run_pipeline(self, self._options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 10:54:51 return runner.run_pipeline(pipeline, options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 10:54:51 default_environment=self._default_environment))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 10:54:51 return self.run_stages(*self.create_stages(pipeline_proto))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 10:54:51 stage_context.safe_coders)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 533, in run_stage
> 10:54:51 data_input, data_output)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1239, in process_bundle
> 10:54:51 result_future = 
> self._controller.control_handler.push(process_bundle)
> 10:54:51   File 
> 

[jira] [Created] (BEAM-7511) KafkaTable Initialization

2019-06-07 Thread Alireza Samadianzakaria (JIRA)
Alireza Samadianzakaria created BEAM-7511:
-

 Summary: KafkaTable Initialization
 Key: BEAM-7511
 URL: https://issues.apache.org/jira/browse/BEAM-7511
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Alireza Samadianzakaria
Assignee: Alireza Samadianzakaria


This exception is thrown when a Kafka table is created:

Exception in thread "main" java.lang.NullPointerException
 at org.apache.beam.sdk.io.kafka.KafkaIO.updateKafkaProperties(KafkaIO.java:1028)
 at org.apache.beam.sdk.io.kafka.KafkaIO.access$900(KafkaIO.java:251)
 at org.apache.beam.sdk.io.kafka.KafkaIO$Read.withConsumerConfigUpdates(KafkaIO.java:814)
 at org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable.buildIOReader(BeamKafkaTable.java:89)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:67)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel$Transform.expand(BeamIOSourceRel.java:58)
 at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
 at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:488)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:66)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
 at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
 at java.util.Iterator.forEachRemaining(Iterator.java:116)
 at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
 at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
 at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
 at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
 at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.lambda$buildPCollectionList$0(BeamSqlRelUtils.java:47)
 at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
 at java.util.Iterator.forEachRemaining(Iterator.java:116)
 at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
 at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
 at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
 at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
 at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
 at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.buildPCollectionList(BeamSqlRelUtils.java:48)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:64)
 at org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils.toPCollection(BeamSqlRelUtils.java:36)
 at org.apache.beam.sdk.extensions.sql.example.MyKafkaExample.main(MyKafkaExample.java:76)

 

This happens because in org.apache.beam.sdk.extensions.sql.meta.provider.kafka.BeamKafkaTable, 
configUpdates is never initialized and the method updateConsumerProperties 
is never called.
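A minimal sketch of one way to avoid the NPE (illustrative only, not the actual patch): initialize the config-update map eagerly so the consumer configuration handed to KafkaIO is never null. The class and method names mirror BeamKafkaTable but are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch, not the actual Beam fix: the config-update map is
// initialized at construction time instead of being left null, so a reader
// built before updateConsumerProperties() is ever called still gets a
// valid (empty) map.
public class KafkaTableConfigSketch {
  // Never null, even if no updates are applied.
  private final Map<String, Object> configUpdates = new HashMap<>();

  public void updateConsumerProperties(Map<String, Object> updates) {
    configUpdates.putAll(updates);
  }

  // What buildIOReader would pass to KafkaIO's withConsumerConfigUpdates.
  public Map<String, Object> consumerConfig() {
    return configUpdates;
  }
}
```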



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-7347) beam_Performance failed with benchmark flag config error

2019-06-07 Thread Mark Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu closed BEAM-7347.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> beam_Performance failed with benchmark flag config error
> 
>
> Key: BEAM-7347
> URL: https://issues.apache.org/jira/browse/BEAM-7347
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mark Liu
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [All|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/] 
> performance benchmarks are affected.
> Error log from [latest beam_PerformanceTests_TextIOIT 
> run|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_TextIOIT/2008/console]:
> {code}
> 00:00:24.372 2019-05-17 00:21:24,724 5d6e9583 MainThread beam_integration_benchmark(1/1) ERROR Error during benchmark beam_integration_benchmark
> 00:00:24.372 Traceback (most recent call last):
> 00:00:24.372   File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 752, in RunBenchmark
> 00:00:24.372 DoProvisionPhase(spec, detailed_timer)
> 00:00:24.372   File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 538, in DoProvisionPhase
> 00:00:24.372 spec.ConstructDpbService()
> 00:00:24.372   File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/benchmark_spec.py", line 209, in ConstructDpbService
> 00:00:24.372 self.dpb_service = dpb_service_class(self.config.dpb_service)
> 00:00:24.372   File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/providers/gcp/gcp_dpb_dataflow.py", line 53, in __init__
> 00:00:24.372 super(GcpDpbDataflow, self).__init__(dpb_service_spec)
> 00:00:24.372   File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/dpb_service.py", line 127, in __init__
> 00:00:24.372 'The flag dpb_service_zone must be provided, for provisioning.')
> 00:00:24.372 InvalidFlagConfigurationError: The flag dpb_service_zone must be provided, for provisioning.
> {code}
> It seems a recent change in 
> [PerfkitBenchmarker|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker]
>  breaks our 
> [beam_integration_benchmark|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py].
>  However, we may be able to make a quick fix on the Beam side.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7407) Create a Wordcount-on-Flink Python 3 test suite.

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7407?focusedWorklogId=256211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256211
 ]

ASF GitHub Bot logged work on BEAM-7407:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:57
Start Date: 07/Jun/19 20:57
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #8745: [BEAM-7407] Adds 
portable wordcount test for Python 3 with Flink runner.
URL: https://github.com/apache/beam/pull/8745#issuecomment-500036255
 
 
   Thanks @tvalentyn 
Added my comments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256211)
Time Spent: 3h  (was: 2h 50m)

> Create a Wordcount-on-Flink Python 3 test suite.
> 
>
> Key: BEAM-7407
> URL: https://issues.apache.org/jira/browse/BEAM-7407
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7407) Create a Wordcount-on-Flink Python 3 test suite.

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7407?focusedWorklogId=256210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256210
 ]

ASF GitHub Bot logged work on BEAM-7407:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:56
Start Date: 07/Jun/19 20:56
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #8745: [BEAM-7407] 
Adds portable wordcount test for Python 3 with Flink runner.
URL: https://github.com/apache/beam/pull/8745#discussion_r291751460
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -77,11 +77,14 @@ task preCommitPy2() {
   dependsOn "lint"
 }
 
-task portablePreCommit() {
+addPortableWordCountTask('BatchPy2', false)
 
 Review comment:
  We can certainly remove portableWordCountExample, but we would like to keep portableWordCount.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256210)
Time Spent: 2h 50m  (was: 2h 40m)

> Create a Wordcount-on-Flink Python 3 test suite.
> 
>
> Key: BEAM-7407
> URL: https://issues.apache.org/jira/browse/BEAM-7407
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7407) Create a Wordcount-on-Flink Python 3 test suite.

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7407?focusedWorklogId=256208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256208
 ]

ASF GitHub Bot logged work on BEAM-7407:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:50
Start Date: 07/Jun/19 20:50
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #8745: [BEAM-7407] 
Adds portable wordcount test for Python 3 with Flink runner.
URL: https://github.com/apache/beam/pull/8745#discussion_r291750009
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1815,6 +1815,58 @@ class BeamModulePlugin implements Plugin<Project> {
   }
 }
   }
+
+  project.ext.addPortableWordCountTask = { String nameSuffix, boolean isStreaming ->
 
 Review comment:
   I have a similar feeling, but I think in Groovy land implicit arguments are more 
popular because we don't have to change the method signature to add or remove 
arguments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256208)
Time Spent: 2h 40m  (was: 2.5h)

> Create a Wordcount-on-Flink Python 3 test suite.
> 
>
> Key: BEAM-7407
> URL: https://issues.apache.org/jira/browse/BEAM-7407
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7510?focusedWorklogId=256206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256206
 ]

ASF GitHub Bot logged work on BEAM-7510:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:42
Start Date: 07/Jun/19 20:42
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8792: [BEAM-7510] 
Skip flaky test test_write_to_different_file_types
URL: https://github.com/apache/beam/pull/8792
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256206)
Time Spent: 40m  (was: 0.5h)

> test_write_to_different_file_types is flaky
> ---
>
> Key: BEAM-7510
> URL: https://issues.apache.org/jira/browse/BEAM-7510
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> 10:54:51 
> ==
> 10:54:51 ERROR: test_write_to_different_file_types 
> (apache_beam.io.fileio_test.WriteFilesTest)
> 10:54:51 
> --
> 10:54:51 Traceback (most recent call last):
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/io/fileio_test.py",
>  line 420, in test_write_to_different_file_types
> 10:54:51 label='verifyApache')
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 426, in __exit__
> 10:54:51 self.run().wait_until_finish()
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 10:54:51 else test_runner_api))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 10:54:51 self._options).run(False)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 10:54:51 return self.runner.run_pipeline(self, self._options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 10:54:51 return runner.run_pipeline(pipeline, options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 10:54:51 default_environment=self._default_environment))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 10:54:51 return self.run_stages(*self.create_stages(pipeline_proto))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 10:54:51 stage_context.safe_coders)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 533, in run_stage
> 10:54:51 data_input, data_output)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1239, in process_bundle
> 10:54:51 result_future = 
> self._controller.control_handler.push(process_bundle)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  

[jira] [Work logged] (BEAM-6673) BigQueryIO.Read should automatically produce schemas

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6673?focusedWorklogId=256203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256203
 ]

ASF GitHub Bot logged work on BEAM-6673:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:41
Start Date: 07/Jun/19 20:41
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #8620: [BEAM-6673] Add 
schema support to BigQuery reads
URL: https://github.com/apache/beam/pull/8620#issuecomment-500031482
 
 
   lgtm
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256203)
Time Spent: 2h  (was: 1h 50m)

> BigQueryIO.Read should automatically produce schemas
> 
>
> Key: BEAM-6673
> URL: https://issues.apache.org/jira/browse/BEAM-6673
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The output PCollections should contain 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6673) BigQueryIO.Read should automatically produce schemas

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6673?focusedWorklogId=256204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256204
 ]

ASF GitHub Bot logged work on BEAM-6673:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:41
Start Date: 07/Jun/19 20:41
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #8620: [BEAM-6673] 
Add schema support to BigQuery reads
URL: https://github.com/apache/beam/pull/8620
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256204)
Time Spent: 2h 10m  (was: 2h)

> BigQueryIO.Read should automatically produce schemas
> 
>
> Key: BEAM-6673
> URL: https://issues.apache.org/jira/browse/BEAM-6673
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The output PCollections should contain 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6673) BigQueryIO.Read should automatically produce schemas

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6673?focusedWorklogId=256197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256197
 ]

ASF GitHub Bot logged work on BEAM-6673:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:28
Start Date: 07/Jun/19 20:28
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #8620: [BEAM-6673] 
Add schema support to BigQuery reads
URL: https://github.com/apache/beam/pull/8620#discussion_r291743465
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOReadTest.java
 ##
 @@ -404,6 +407,65 @@ public void processElement(ProcessContext c) throws 
Exception {
 p.run();
   }
 
+  @Test
+  public void testReadTableWithSchema() throws IOException, 
InterruptedException {
+// setup
+Table someTable = new Table();
+someTable.setSchema(
+new TableSchema()
+.setFields(
+ImmutableList.of(
+new TableFieldSchema().setName("name").setType("STRING"),
+new TableFieldSchema().setName("number").setType("INTEGER"))));
 
 Review comment:
   Delaying to a separate PR sounds fine
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256197)
Time Spent: 1h 50m  (was: 1h 40m)

> BigQueryIO.Read should automatically produce schemas
> 
>
> Key: BEAM-6673
> URL: https://issues.apache.org/jira/browse/BEAM-6673
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The output PCollections should contain 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6674) The JdbcIO source should produce schemas

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6674?focusedWorklogId=256196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256196
 ]

ASF GitHub Bot logged work on BEAM-6674:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:26
Start Date: 07/Jun/19 20:26
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #8725: [BEAM-6674] Add 
schema support to JdbcIO read
URL: https://github.com/apache/beam/pull/8725#issuecomment-500027187
 
 
   What you suggested already works via schema inference. If you have a
   PCollection<T> and T has a registered schema, then Beam will infer it
   (assuming that T is not erased for some reason). However, converting the
   Jdbc class to T requires knowing the actual type in the IO.
   
   On Fri, Jun 7, 2019 at 4:42 AM Charith Ellawala 
   wrote:
   
   > *@charithe* commented on this pull request.
   > --
   >
   > In sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java
   > :
   >
   > > @@ -188,6 +191,15 @@
   >  .build();
   >}
   >
   > +  /** Read Beam {@link Row}s from a JDBC data source. */
   > +  @Experimental(Experimental.Kind.SCHEMAS)
   > +  public static ReadRows readRows() {
   > +return new AutoValue_JdbcIO_ReadRows.Builder()
   > +.setFetchSize(DEFAULT_FETCH_SIZE)
   > +.setOutputParallelization(true)
   > +.build();
   > +  }
   >
    > So I assumed that it would be easy to obtain the schema for T and
    > implicitly attach it to the PCollection. However, this appears to be
    > difficult to do reliably unless the user explicitly provides a reference to
    > Class<T> as you correctly suggested.
    >
    > Given that the user has to make a conscious effort to enable schema
    > support by providing a reference to Class<T>, wouldn't it make sense to
    > introduce a generic utility function such as 
PCollections.withSchema(PCollection<T>,
    > Class<T>) that would work for any kind of PCollection and not just the
    > output of JDBC IO?
   >
   > —
   > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub, or mute the thread.
   >
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256196)
Time Spent: 2h 40m  (was: 2.5h)

> The JdbcIO source should produce schemas
> 
>
> Key: BEAM-6674
> URL: https://issues.apache.org/jira/browse/BEAM-6674
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3607) Move checkNotNull arg checks to a new checkArgumentNotNull

2019-06-07 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858964#comment-16858964
 ] 

Kenneth Knowles commented on BEAM-3607:
---

Probably. I don't think there's any good reason to use {{checkNotNull}} as 
Guava provides it.
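A sketch of what a checkArgumentNotNull helper could look like (an assumption about the eventual API, not a committed Beam implementation): the same null check as Guava's checkNotNull, but surfacing IllegalArgumentException so the failure reads as a caller error rather than an internal one.

```java
// Hypothetical sketch of checkArgumentNotNull: behaves like Guava's
// checkNotNull, but throws IllegalArgumentException (a "400 error") instead
// of NullPointerException (a "500 error"), so the bug report lands on the
// caller's usage rather than on the library internals.
public class ArgChecks {
  public static <T> T checkArgumentNotNull(T value, String argName) {
    if (value == null) {
      throw new IllegalArgumentException(argName + " must not be null");
    }
    return value;
  }
}
```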

> Move checkNotNull arg checks to a new checkArgumentNotNull
> --
>
> Key: BEAM-3607
> URL: https://issues.apache.org/jira/browse/BEAM-3607
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-core, sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: beginner, newbie, starter
>
> The simple fact is that {{checkNotNull}} throws NPE which to users looks like 
> a core dump sort of failure. It throws a "500 error" when we usually intend a 
> "400 error", so to speak, so the bugs get filed on the wrong components, or 
> users don't know they passed the wrong thing, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7510?focusedWorklogId=256193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256193
 ]

ASF GitHub Bot logged work on BEAM-7510:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:13
Start Date: 07/Jun/19 20:13
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8792: [BEAM-7510] Skip 
flaky test test_write_to_different_file_types
URL: https://github.com/apache/beam/pull/8792#issuecomment-500023391
 
 
   Thanks @pabloem. Are the other tests added in this file stable? Perhaps you 
could run them a large number of times to make sure they are not flaky?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256193)
Time Spent: 0.5h  (was: 20m)

> test_write_to_different_file_types is flaky
> ---
>
> Key: BEAM-7510
> URL: https://issues.apache.org/jira/browse/BEAM-7510
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {noformat}
> 10:54:51 
> ==
> 10:54:51 ERROR: test_write_to_different_file_types 
> (apache_beam.io.fileio_test.WriteFilesTest)
> 10:54:51 
> --
> 10:54:51 Traceback (most recent call last):
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/io/fileio_test.py",
>  line 420, in test_write_to_different_file_types
> 10:54:51 label='verifyApache')
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 426, in __exit__
> 10:54:51 self.run().wait_until_finish()
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 10:54:51 else test_runner_api))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 10:54:51 self._options).run(False)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 10:54:51 return self.runner.run_pipeline(self, self._options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 10:54:51 return runner.run_pipeline(pipeline, options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 10:54:51 default_environment=self._default_environment))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 10:54:51 return self.run_stages(*self.create_stages(pipeline_proto))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 10:54:51 stage_context.safe_coders)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 533, in run_stage
> 10:54:51 data_input, data_output)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1239, in process_bundle
> 10:54:51 result_future = 
> self._controller.control_handler.push(process_bundle)
> 10:54:51   File 

[jira] [Work logged] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7510?focusedWorklogId=256191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256191
 ]

ASF GitHub Bot logged work on BEAM-7510:


Author: ASF GitHub Bot
Created on: 07/Jun/19 20:07
Start Date: 07/Jun/19 20:07
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8792: [BEAM-7510] Skip 
flaky test test_write_to_different_file_types
URL: https://github.com/apache/beam/pull/8792#issuecomment-500021673
 
 
   r: @tvalentyn 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256191)
Time Spent: 20m  (was: 10m)

> test_write_to_different_file_types is flaky
> ---
>
> Key: BEAM-7510
> URL: https://issues.apache.org/jira/browse/BEAM-7510
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {noformat}
> 10:54:51 
> ==
> 10:54:51 ERROR: test_write_to_different_file_types 
> (apache_beam.io.fileio_test.WriteFilesTest)
> 10:54:51 
> --
> 10:54:51 Traceback (most recent call last):
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/io/fileio_test.py",
>  line 420, in test_write_to_different_file_types
> 10:54:51 label='verifyApache')
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 426, in __exit__
> 10:54:51 self.run().wait_until_finish()
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 10:54:51 else test_runner_api))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 10:54:51 self._options).run(False)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 10:54:51 return self.runner.run_pipeline(self, self._options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 10:54:51 return runner.run_pipeline(pipeline, options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 10:54:51 default_environment=self._default_environment))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 10:54:51 return self.run_stages(*self.create_stages(pipeline_proto))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 10:54:51 stage_context.safe_coders)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 533, in run_stage
> 10:54:51 data_input, data_output)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1239, in process_bundle
> 10:54:51 result_future = 
> self._controller.control_handler.push(process_bundle)
> 10:54:51   File 
> 

[jira] [Commented] (BEAM-6766) Sort Merge Bucket Join support in Beam

2019-06-07 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858949#comment-16858949
 ] 

Kenneth Knowles commented on BEAM-6766:
---

Wow, nice.

Is there any possibility of breaking off chunks and merging them early or in 
pieces for review? Things tend to move very quickly for <=500 LOC changes, and 
may still be feasible for ~1000 lines. Beyond that, getting a reviewer to do a 
timely and effective review is pretty hard.

> Sort Merge Bucket Join support in Beam
> --
>
> Key: BEAM-6766
> URL: https://issues.apache.org/jira/browse/BEAM-6766
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas, sdk-java-join-library
>Reporter: Claire McGinty
>Assignee: Claire McGinty
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hi! Spotify has been internally prototyping and testing an implementation of 
> the sort merge join using Beam primitives and we're interested in 
> contributing it open-source – probably to Beam's extensions package in its 
> own `smb` module or as part of the joins module?
> We've tested this with Avro files using Avro's GenericDatumWriter/Reader 
> directly (although this could theoretically be expanded to other 
> serialization formats). We'd add two transforms*, an SMB write and an SMB 
> join. 
> SMB write would take in one PCollection and a # of buckets and:
> 1) Apply a partitioning function to the input to assign each record to one 
> bucket. (the user code would have to statically specify a # of buckets... 
> hard to see a way to do this dynamically.)
> 2) Group by that bucket ID and within each bucket perform an in-memory sort 
> on join key. If the grouped records are too large to fit in memory, fall back 
> to an external sort (although if this happens, user should probably increase 
> bucket size so every group fits in memory).
> 3) Directly write the contents of bucket to a sequentially named file.
> 4) Write a metadata file to the same output path with info about hash 
> algorithm/# buckets.
> SMB join would take in the input paths for 2 or more Sources, all of which 
> are written in a bucketed and partitioned way, and:
> 1) Verify that the metadata files have compatible bucket # and hash algorithm.
> 2) Expand the input paths to enumerate the `ResourceIds` of every file in the 
> paths. Group all inputs with the same bucket ID.
> 3) Within each group, open a file reader on all `ResourceIds`. Sequentially 
> read files one record at a time, outputting tuples of all record pairs with 
> matching join key.
>  * These could be implemented either directly as `PTransforms` with the 
> writer being a `DoFn` but I semantically do like the idea of extending 
> `FileBasedSource`/`Sink` with abstract classes like 
> `SortedBucketSink`/`SortedBucketSource`... if we represent the elements in a 
> sink as KV pairs of >>, so that the # 
> of elements in the PCollection == # of buckets == # of output files, we could 
> just implement something like `SortedBucketSink` extending `FileBasedSink` 
> with a dynamic file naming function. I'd like to be able to take advantage of 
> the existing write/read implementation logic in the `io` package as much as 
> possible although I guess some of those are package private. 
> –
> From our internal testing, we've seen some substantial performance 
> improvements using the right bucket size--not only by avoiding a shuffle 
> during the join step, but also in storage costs, since we're getting better 
> compression in Avro by storing sorted records.
> Please let us know what you think/any concerns we can address! Our 
> implementation isn't quite production-ready yet, but we'd like to start a 
> discussion about it early.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6766) Sort Merge Bucket Join support in Beam

2019-06-07 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-6766:
-

Assignee: Claire McGinty

> Sort Merge Bucket Join support in Beam
> --
>
> Key: BEAM-6766
> URL: https://issues.apache.org/jira/browse/BEAM-6766
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas, sdk-java-join-library
>Reporter: Claire McGinty
>Assignee: Claire McGinty
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7510?focusedWorklogId=256184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256184
 ]

ASF GitHub Bot logged work on BEAM-7510:


Author: ASF GitHub Bot
Created on: 07/Jun/19 19:51
Start Date: 07/Jun/19 19:51
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8792: [BEAM-7510] 
Skip flaky test test_write_to_different_file_types
URL: https://github.com/apache/beam/pull/8792
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)

[jira] [Work logged] (BEAM-6904) Test all Coder structuralValue implementations

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6904?focusedWorklogId=256182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256182
 ]

ASF GitHub Bot logged work on BEAM-6904:


Author: ASF GitHub Bot
Created on: 07/Jun/19 19:48
Start Date: 07/Jun/19 19:48
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8208: [BEAM-6904] Add 
tests for structuralValue implementation in coders
URL: https://github.com/apache/beam/pull/8208#issuecomment-500016028
 
 
   Indeed, we need to roll forward #8258 for this one.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256182)
Time Spent: 3h 50m  (was: 3h 40m)

> Test all Coder structuralValue implementations
> --
>
> Key: BEAM-6904
> URL: https://issues.apache.org/jira/browse/BEAM-6904
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Here is a test helper that checks that structuralValue is consistent with 
> equals: 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java#L200
> And here is one that tests it another way: 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java#L226
> With the deprecation of consistentWithEquals and the implementation of all the 
> structuralValue methods, we should add these tests to every coder.
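
The property under test — structuralValue agreeing with equals — can be illustrated with a toy coder. This is a hypothetical sketch, not a real Beam class or the actual CoderProperties helper:

```python
class BigEndianIntCoder:
    """Toy coder for non-negative ints; its structural value is the encoded
    bytes, mirroring coders whose structuralValue wraps the serialized form."""

    def encode(self, value):
        return value.to_bytes(8, "big")

    def decode(self, data):
        return int.from_bytes(data, "big")

    def structural_value(self, value):
        return self.encode(value)


def assert_structural_value_consistent_with_equals(coder, a, b):
    # The property: equal values must yield equal structural values. The
    # converse also holds here because this toy encoding is injective.
    assert (a == b) == (coder.structural_value(a) == coder.structural_value(b))


coder = BigEndianIntCoder()
for a, b in [(7, 7), (7, 8), (0, 0), (0, 1)]:
    assert_structural_value_consistent_with_equals(coder, a, b)
```

A coder that violated this (say, one whose structural value included a timestamp) would make GroupByKey-style comparisons disagree with equality, which is exactly what such a test catches.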





[jira] [Work logged] (BEAM-6904) Test all Coder structuralValue implementations

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6904?focusedWorklogId=256181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256181
 ]

ASF GitHub Bot logged work on BEAM-6904:


Author: ASF GitHub Bot
Created on: 07/Jun/19 19:45
Start Date: 07/Jun/19 19:45
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8208: [BEAM-6904] Add 
tests for structuralValue implementation in coders
URL: https://github.com/apache/beam/pull/8208#issuecomment-500015374
 
 
   Ah, and the conflicts are caused by a prior commit being reverted because of 
issues with Dataflow pipeline update.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256181)
Time Spent: 3h 40m  (was: 3.5h)

> Test all Coder structuralValue implementations
> --
>
> Key: BEAM-6904
> URL: https://issues.apache.org/jira/browse/BEAM-6904
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>





[jira] [Commented] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858937#comment-16858937
 ] 

Valentyn Tymofieiev commented on BEAM-7510:
---

Assigning to [~pabloem] who recently added the test.

> test_write_to_different_file_types is flaky
> ---
>
> Key: BEAM-7510
> URL: https://issues.apache.org/jira/browse/BEAM-7510
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>
> {noformat}
> 10:54:51 
> ==
> 10:54:51 ERROR: test_write_to_different_file_types 
> (apache_beam.io.fileio_test.WriteFilesTest)
> 10:54:51 
> --
> 10:54:51 Traceback (most recent call last):
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/io/fileio_test.py",
>  line 420, in test_write_to_different_file_types
> 10:54:51 label='verifyApache')
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 426, in __exit__
> 10:54:51 self.run().wait_until_finish()
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 10:54:51 else test_runner_api))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 10:54:51 self._options).run(False)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 10:54:51 return self.runner.run_pipeline(self, self._options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 10:54:51 return runner.run_pipeline(pipeline, options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 10:54:51 default_environment=self._default_environment))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 10:54:51 return self.run_stages(*self.create_stages(pipeline_proto))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 10:54:51 stage_context.safe_coders)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 533, in run_stage
> 10:54:51 data_input, data_output)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1239, in process_bundle
> 10:54:51 result_future = 
> self._controller.control_handler.push(process_bundle)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 854, in push
> 10:54:51 response = self.worker.do_instruction(request)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction
> 10:54:51 request.instruction_id)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle
> 10:54:51 bundle_processor.process_bundle(instruction_id))
> 10:54:51   File 
> 

[jira] [Commented] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread Pablo Estrada (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858938#comment-16858938
 ] 

Pablo Estrada commented on BEAM-7510:
-

Sorry about that. I'll skip it and work on a fix.
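
Skipping a flaky test in the Python SDK is typically a one-line `unittest` decorator. A minimal sketch follows; the class and method names mirror the failing test but the body is illustrative, not the actual Beam code:

```python
import unittest


class WriteFilesTestSketch(unittest.TestCase):

    @unittest.skip("Flaky; tracked in BEAM-7510.")
    def test_write_to_different_file_types(self):
        # This body never executes while the skip decorator is in place.
        self.fail("should not run")


# Running the suite records the test as skipped rather than failed.
result = unittest.TestResult()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WriteFilesTestSketch)
suite.run(result)
```

`unittest.skip` keeps the test visible in reports (with its reason) instead of silently deleting it, which makes it easier to re-enable once the fix lands.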

> test_write_to_different_file_types is flaky
> ---
>
> Key: BEAM-7510
> URL: https://issues.apache.org/jira/browse/BEAM-7510
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>

[jira] [Work logged] (BEAM-6904) Test all Coder structuralValue implementations

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6904?focusedWorklogId=256179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256179
 ]

ASF GitHub Bot logged work on BEAM-6904:


Author: ASF GitHub Bot
Created on: 07/Jun/19 19:42
Start Date: 07/Jun/19 19:42
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8208: [BEAM-6904] Add 
tests for structuralValue implementation in coders
URL: https://github.com/apache/beam/pull/8208#issuecomment-500014568
 
 
   These tests are definitely useful. I want to merge it. I will take a minute 
and see if I can resolve the conflicts. Since it is caused by my delay, please 
allow me and I will push back to this branch if I have perms.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256179)
Time Spent: 3.5h  (was: 3h 20m)

> Test all Coder structuralValue implementations
> --
>
> Key: BEAM-6904
> URL: https://issues.apache.org/jira/browse/BEAM-6904
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>





[jira] [Updated] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-7510:
--
Description: 

{noformat}
10:54:51 ==
10:54:51 ERROR: test_write_to_different_file_types 
(apache_beam.io.fileio_test.WriteFilesTest)
10:54:51 --
10:54:51 Traceback (most recent call last):
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/io/fileio_test.py",
 line 420, in test_write_to_different_file_types
10:54:51 label='verifyApache')
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
 line 426, in __exit__
10:54:51 self.run().wait_until_finish()
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
 line 107, in run
10:54:51 else test_runner_api))
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
 line 406, in run
10:54:51 self._options).run(False)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
 line 419, in run
10:54:51 return self.runner.run_pipeline(self, self._options)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 128, in run_pipeline
10:54:51 return runner.run_pipeline(pipeline, options)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 289, in run_pipeline
10:54:51 default_environment=self._default_environment))
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 293, in run_via_runner_api
10:54:51 return self.run_stages(*self.create_stages(pipeline_proto))
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 369, in run_stages
10:54:51 stage_context.safe_coders)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 533, in run_stage
10:54:51 data_input, data_output)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1239, in process_bundle
10:54:51 result_future = 
self._controller.control_handler.push(process_bundle)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 854, in push
10:54:51 response = self.worker.do_instruction(request)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 342, in do_instruction
10:54:51 request.instruction_id)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 368, in process_bundle
10:54:51 bundle_processor.process_bundle(instruction_id))
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 589, in process_bundle
10:54:51 ].process_encoded(data.data)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 143, in process_encoded
10:54:51 self.output(decoded_value)
10:54:51   File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/operations.py",
 line 256, in output

[jira] [Commented] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858936#comment-16858936
 ] 

Valentyn Tymofieiev commented on BEAM-7510:
---

More logs available in 
[https://builds.apache.org/job/beam_PreCommit_Python_Commit/6788/consoleFull]

> test_write_to_different_file_types is flaky
> ---
>
> Key: BEAM-7510
> URL: https://issues.apache.org/jira/browse/BEAM-7510
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: Major
>
> {noformat}
> 10:54:51 
> ==
> 10:54:51 ERROR: test_write_to_different_file_types 
> (apache_beam.io.fileio_test.WriteFilesTest)
> 10:54:51 
> --
> 10:54:51 Traceback (most recent call last):
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/io/fileio_test.py",
>  line 420, in test_write_to_different_file_types
> 10:54:51 label='verifyApache')
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 426, in __exit__
> 10:54:51 self.run().wait_until_finish()
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 10:54:51 else test_runner_api))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 10:54:51 self._options).run(False)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 10:54:51 return self.runner.run_pipeline(self, self._options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 10:54:51 return runner.run_pipeline(pipeline, options)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 10:54:51 default_environment=self._default_environment))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 10:54:51 return self.run_stages(*self.create_stages(pipeline_proto))
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 10:54:51 stage_context.safe_coders)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 533, in run_stage
> 10:54:51 data_input, data_output)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1239, in process_bundle
> 10:54:51 result_future = 
> self._controller.control_handler.push(process_bundle)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 854, in push
> 10:54:51 response = self.worker.do_instruction(request)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction
> 10:54:51 request.instruction_id)
> 10:54:51   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle
> 10:54:51 bundle_processor.process_bundle(instruction_id))
> 10:54:51   File 
> 

[jira] [Created] (BEAM-7510) test_write_to_different_file_types is flaky

2019-06-07 Thread Valentyn Tymofieiev (JIRA)
Valentyn Tymofieiev created BEAM-7510:
-

 Summary: test_write_to_different_file_types is flaky
 Key: BEAM-7510
 URL: https://issues.apache.org/jira/browse/BEAM-7510
 Project: Beam
  Issue Type: Bug
  Components: io-python-gcp
Reporter: Valentyn Tymofieiev
Assignee: Pablo Estrada


10:54:51 ======================================================================
10:54:51 ERROR: test_write_to_different_file_types (apache_beam.io.fileio_test.WriteFilesTest)
10:54:51 ----------------------------------------------------------------------
10:54:51 Traceback (most recent call last):
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/io/fileio_test.py", line 420, in test_write_to_different_file_types
10:54:51     label='verifyApache')
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", line 426, in __exit__
10:54:51     self.run().wait_until_finish()
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py", line 107, in run
10:54:51     else test_runner_api))
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", line 406, in run
10:54:51     self._options).run(False)
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py", line 419, in run
10:54:51     return self.runner.run_pipeline(self, self._options)
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py", line 128, in run_pipeline
10:54:51     return runner.run_pipeline(pipeline, options)
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 289, in run_pipeline
10:54:51     default_environment=self._default_environment))
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 293, in run_via_runner_api
10:54:51     return self.run_stages(*self.create_stages(pipeline_proto))
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 369, in run_stages
10:54:51     stage_context.safe_coders)
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 533, in run_stage
10:54:51     data_input, data_output)
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1239, in process_bundle
10:54:51     result_future = self._controller.control_handler.push(process_bundle)
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 854, in push
10:54:51     response = self.worker.do_instruction(request)
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 342, in do_instruction
10:54:51     request.instruction_id)
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 368, in process_bundle
10:54:51     bundle_processor.process_bundle(instruction_id))
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 589, in process_bundle
10:54:51     ].process_encoded(data.data)
10:54:51   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 143, in process_encoded

[jira] [Work logged] (BEAM-6904) Test all Coder structuralValue implementations

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6904?focusedWorklogId=256177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256177
 ]

ASF GitHub Bot logged work on BEAM-6904:


Author: ASF GitHub Bot
Created on: 07/Jun/19 19:41
Start Date: 07/Jun/19 19:41
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8208: [BEAM-6904] Add 
tests for structuralValue implementation in coders
URL: https://github.com/apache/beam/pull/8208#issuecomment-500014241
 
 
   Ah, my mistake. I forgot to actually "request" my own review so it was not 
showing in my review queue.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256177)
Time Spent: 3h 20m  (was: 3h 10m)

> Test all Coder structuralValue implementations
> --
>
> Key: BEAM-6904
> URL: https://issues.apache.org/jira/browse/BEAM-6904
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Alexander Savchenko
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Here is a test helper that checks that structuralValue is consistent with 
> equals: 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java#L200
> And here is one that tests it another way: 
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java#L226
> With the deprecation of consistentWithEquals and implementing all the 
> structuralValue methods, we should add these tests to every coder.
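The Java helpers linked above check a simple property: whenever two values compare equal, their structural values must compare equal too. A minimal sketch of that property in Python follows; the `ToyBytesCoder` class and `structural_value`/`assert_...` names are illustrative only, not Beam's actual Python API.

```python
# Sketch of the "structuralValue consistent with equals" property that
# CoderProperties checks for Java coders. The coder class and method
# names below are made up for illustration; they are not Beam APIs.

class ToyBytesCoder:
    def encode(self, value):
        return bytes(value)

    def structural_value(self, value):
        # For a coder whose value type lacks a usable equals, the
        # structural value acts as an equality proxy; here the encoded
        # form serves that role.
        return self.encode(value)

def assert_structural_value_consistent_with_equals(coder, a, b):
    # The property under test: a == b must imply equal structural values.
    if a == b:
        assert coder.structural_value(a) == coder.structural_value(b), (
            'structural values differ for equal examples %r and %r' % (a, b))

coder = ToyBytesCoder()
assert_structural_value_consistent_with_equals(coder, b'beam', b'beam')
assert_structural_value_consistent_with_equals(coder, b'beam', b'other')
print('property holds')
```

Running the check over many example pairs, as the Java helper does, is what turns this property into a per-coder test.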



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2888) Runner Comparison / Capability Matrix revamp

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2888?focusedWorklogId=256175=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256175
 ]

ASF GitHub Bot logged work on BEAM-2888:


Author: ASF GitHub Bot
Created on: 07/Jun/19 19:38
Start Date: 07/Jun/19 19:38
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #8576: 
[BEAM-2888] Add the not-yet-fully-designed drain and checkpoint to runner 
comparison
URL: https://github.com/apache/beam/pull/8576
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 256175)
Time Spent: 2h 20m  (was: 2h 10m)

> Runner Comparison / Capability Matrix revamp
> 
>
> Key: BEAM-2888
> URL: https://issues.apache.org/jira/browse/BEAM-2888
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>Priority: Major
>  Labels: gsod, gsod2019
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Discussion: 
> https://lists.apache.org/thread.html/8aff7d70c254356f2dae3109fb605e0b60763602225a877d3dadf8b7@%3Cdev.beam.apache.org%3E
> Summarizing that discussion, we have a lot of issues/wishes. Some can be 
> addressed as one-off and some need a unified reorganization of the runner 
> comparison.
> Basic corrections:
>  - Remove rows that are impossible not to support (ParDo)
>  - Remove rows where "support" doesn't really make sense (Composite 
> transforms)
>  - Deduplicate rows that are actually the same model feature (all non-merging 
> windowing / all merging windowing)
>  - Clearly separate rows that represent optimizations (Combine)
>  - Correct rows in the wrong place (Timers are actually a "what...?" row)
>  - Separate or remove rows that have not been designed ([Meta]Data driven 
> triggers, retractions)
>  - Rename rows with names that appear nowhere else (Timestamp control, which 
> is called a TimestampCombiner in Java)
>  - Switch to a more distinct color scheme for full/partial support (currently 
> just solid/faded colors)
>  - Switch to something clearer than "~" for partial support, versus ✘ and ✓ 
> for none and full.
>  - Correct Gearpump support for merging windows (see BEAM-2759)
>  - Correct Spark support for non-merging and merging windows (see BEAM-2499)
> Minor rewrites:
>  - Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
>  - Make sections as users see them, like "ParDo" / "side Inputs" not "What?" 
> / "side inputs"
>  - Add rows for non-model things, like portability framework support, metrics 
> backends, etc
> Bigger rewrites:
>  - Add versioning to the comparison, as in BEAM-166
>  - Find a way to fit in a plain English summary of runner's support in Beam. 
> It should come first, as it is what new users need before getting to details.
>  - Find a way to describe production readiness of runners and/or testimonials 
> of who is using it in production.
>  - Have a place to compare non-model differences between runners
> Changes requiring engineering efforts:
>  - Gather and add quantitative runner metrics, perhaps Nexmark results for 
> mid-level, smaller benchmarks for measuring aspects of specific features, and 
> larger end-to-end benchmarks to get an idea how it might actually perform on 
> a use case
>  - Tighter coupling of the matrix portion of the comparison with tags on 
> ValidatesRunner tests
> If you care to address some aspect of this, please reach out and/or just file 
> a subtask and address it.





[jira] [Updated] (BEAM-7499) ReifyTest.test_window fails in DirectRunner due to 'assign_context.window should not be None.'

2019-06-07 Thread Tanay Tummalapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanay Tummalapalli updated BEAM-7499:
-
Component/s: test-failures

> ReifyTest.test_window fails in DirectRunner due to 'assign_context.window 
> should not be None.'
> --
>
> Key: BEAM-7499
> URL: https://issues.apache.org/jira/browse/BEAM-7499
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, test-failures
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  
> [PR 8717|https://github.com/apache/beam/pull/8717] added 
> ReifyWindow.test_window which fails on the DirectRunner.
> {code:java}
> ERROR:root:Exception at bundle 
> , 
> due to an exception.
>  Traceback (most recent call last):
>  File "apache_beam/runners/direct/executor.py", line 343, in call
>  finish_state)
>  File "apache_beam/runners/direct/executor.py", line 380, in attempt_call
>  evaluator.process_element(value)
>  File "apache_beam/runners/direct/transform_evaluator.py", line 636, in 
> process_element
>  self.runner.process(element)
>  File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>  def process(self, windowed_value):
>  File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
>  self._reraise_augmented(exn)
>  File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>  raise_with_traceback(new_exn)
>  File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>  return self.do_fn_invoker.invoke_process(windowed_value)
>  File "apache_beam/runners/common.py", line 453, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>  output_processor.process_outputs(
>  File "apache_beam/runners/common.py", line 915, in 
> apache_beam.runners.common._OutputProcessor.process_outputs
>  self.window_fn.assign(assign_context))
>  File "apache_beam/transforms/util.py", line 557, in assign
>  'assign_context.window should not be None. '
> ValueError: assign_context.window should not be None. This might be due to a 
> DoFn returning a TimestampedValue. [while running 'add_timestamps2']
> Traceback (most recent call last):
>  File "apache_beam/transforms/util_test.py", line 501, in test_window
>  assert_that(reified_pc, equal_to(expected), reify_windows=True)
>  File "apache_beam/pipeline.py", line 426, in __exit__
>  self.run().wait_until_finish()
>  File "apache_beam/testing/test_pipeline.py", line 109, in run
>  state = result.wait_until_finish()
>  File "apache_beam/runners/direct/direct_runner.py", line 430, in 
> wait_until_finish
>  self._executor.await_completion()
>  File "apache_beam/runners/direct/executor.py", line 400, in await_completion
>  self._executor.await_completion()
>  File "apache_beam/runners/direct/executor.py", line 446, in await_completion
>  raise_(t, v, tb)
>  File "apache_beam/runners/direct/executor.py", line 343, in call
>  finish_state)
>  File "apache_beam/runners/direct/executor.py", line 380, in attempt_call
>  evaluator.process_element(value)
>  File "apache_beam/runners/direct/transform_evaluator.py", line 636, in 
> process_element
>  self.runner.process(element)
>  File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>  def process(self, windowed_value):
>  File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
>  self._reraise_augmented(exn)
>  File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>  raise_with_traceback(new_exn)
>  File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>  return self.do_fn_invoker.invoke_process(windowed_value)
>  File "apache_beam/runners/common.py", line 454, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>  windowed_value, self.process_method(windowed_value.value))
>  File "apache_beam/transforms/core.py", line 1292, in 
>  wrapper = lambda x: [fn(x)]
>  File "apache_beam/testing/util.py", line 129, in _equal
>  'Failed assert: %r == %r' % (sorted_expected, sorted_actual))
> BeamAssertException: Failed assert: [TestWindowedValue(value=('a', 100, 
> GlobalWindow), timestamp=100, windows=[GlobalWindow]), 
> TestWindowedValue(value=('b', 200, GlobalWindow), timestamp=200, 
> windows=[GlobalWindow]), TestWindowedValue(value=('c', 300, GlobalWindow), 
> timestamp=300, windows=[GlobalWindow])] == [TestWindowedValue(value=(('a', 
> 100.0, (GlobalWindow,), PaneInfo(first: True, last: True, timing: 3, index: 
> 0, nonspeculative_index: 0)), 

[jira] [Comment Edited] (BEAM-7499) ReifyTest.test_window fails in DirectRunner due to 'assign_context.window should not be None.'

2019-06-07 Thread Tanay Tummalapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858916#comment-16858916
 ] 

Tanay Tummalapalli edited comment on BEAM-7499 at 6/7/19 7:06 PM:
--

Hi [~lcwik],
Thanks for fixing the test failure. I found the following:

A. The last *BeamAssertException* is because the actual TestWindowedValue in 
the assert contains PaneInfo along with the window in a tuple. 
This is not the case with the current code in master[1]. I tested the specific 
tests and they work.  I'm not able to reproduce this.
Were any changes made related to ParDo evaluation or PaneInfo, or any PR that 
made use of/modified Reify? 


B. The *ValueError* is caused by `_IdentityWindowFn` that is used in 
`ReshufflePerKey`. 
Its docstring states that it will raise an exception when used after DoFns 
that return TimestampedValue elements. This seems to be caused by a change to 
the test - `test_no_window_context_fails`. This test is expected to raise a 
ValueError as in the master branch[2]. I am yet to investigate why 
_IdentityWindowFn raises an Exception in this case.

I need two more pieces to solve this mystery:
*  Was this caused by a PR?
*  Are the two tracebacks (ValueError, BeamAssertException) related? They seem 
to be caused by two different tests.

[1] 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L758
[2] 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util_test.py#L267



was (Author: ttanay):
Hi [~lcwik],
Thanks for fixing the test failure. I found the following:

A. The last *Failed Assert* is because the actual TestWindowedValue in the 
assert contains PaneInfo along with the window in a tuple. 
This is not the case with the current code in master[1]. I tested the specific 
tests and they work.  I'm not able to reproduce this.
Were any changes made related to ParDo evaluation or PaneInfo, or any PR that 
made use of/modified Reify? 


B. The ValueError is caused by `_IdentityWindowFn` that is used in 
`ReshufflePerKey`. 
Its docstring states that it will raise an exception when used after DoFns 
that return TimestampedValue elements. This seems to be caused by a change to 
the test - `test_no_window_context_fails`. This test is expected to raise a 
ValueError as in the master branch[2]. I am yet to investigate why 
_IdentityWindowFn raises an Exception in this case.

I need two more pieces to solve this mystery:
*  Was this caused by a PR?
*  Are the two tracebacks (ValueError, BeamAssertException) related? They seem 
to be caused by two different tests.

[1] 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L758
[2] 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util_test.py#L267


> ReifyTest.test_window fails in DirectRunner due to 'assign_context.window 
> should not be None.'
> --
>
> Key: BEAM-7499
> URL: https://issues.apache.org/jira/browse/BEAM-7499
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  
> [PR 8717|https://github.com/apache/beam/pull/8717] added 
> ReifyWindow.test_window which fails on the DirectRunner.
> {code:java}
> ERROR:root:Exception at bundle 
> , 
> due to an exception.
>  Traceback (most recent call last):
>  File "apache_beam/runners/direct/executor.py", line 343, in call
>  finish_state)
>  File "apache_beam/runners/direct/executor.py", line 380, in attempt_call
>  evaluator.process_element(value)
>  File "apache_beam/runners/direct/transform_evaluator.py", line 636, in 
> process_element
>  self.runner.process(element)
>  File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>  def process(self, windowed_value):
>  File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
>  self._reraise_augmented(exn)
>  File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>  raise_with_traceback(new_exn)
>  File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>  return self.do_fn_invoker.invoke_process(windowed_value)
>  File "apache_beam/runners/common.py", line 453, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>  output_processor.process_outputs(
>  File "apache_beam/runners/common.py", line 915, in 
> apache_beam.runners.common._OutputProcessor.process_outputs
>  self.window_fn.assign(assign_context))
>  File "apache_beam/transforms/util.py", line 557, in assign
>  'assign_context.window should 

[jira] [Commented] (BEAM-7499) ReifyTest.test_window fails in DirectRunner due to 'assign_context.window should not be None.'

2019-06-07 Thread Tanay Tummalapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858916#comment-16858916
 ] 

Tanay Tummalapalli commented on BEAM-7499:
--

Hi [~lcwik],
Thanks for fixing the test failure. I found the following:

A. The last *Failed Assert* is because the actual TestWindowedValue in the 
assert contains PaneInfo along with the window in a tuple. 
This is not the case with the current code in master[1]. I tested the specific 
tests and they work.  I'm not able to reproduce this.
Were any changes made related to ParDo evaluation or PaneInfo, or any PR that 
made use of/modified Reify? 


B. The ValueError is caused by `_IdentityWindowFn` that is used in 
`ReshufflePerKey`. 
Its docstring states that it will raise an exception when used after DoFns 
that return TimestampedValue elements. This seems to be caused by a change to 
the test - `test_no_window_context_fails`. This test is expected to raise a 
ValueError as in the master branch[2]. I am yet to investigate why 
_IdentityWindowFn raises an Exception in this case.

I need two more pieces to solve this mystery:
*  Was this caused by a PR?
*  Are the two tracebacks (ValueError, BeamAssertException) related? They seem 
to be caused by two different tests.

[1] 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L758
[2] 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util_test.py#L267
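
The failure mode described in point B can be sketched without any Beam dependency: an identity WindowFn can only echo back a window the element already carries, so an element whose window was lost (e.g. after a DoFn returned a bare TimestampedValue) arrives with `assign_context.window == None`. The classes below mirror Beam's names but are simplified stand-ins, not the real implementations.

```python
# Illustrative, dependency-free sketch of why an identity WindowFn
# (as used inside ReshufflePerKey) raises: it has no window of its
# own to assign, so a missing context.window is a hard error.

class AssignContext:
    def __init__(self, timestamp, element=None, window=None):
        self.timestamp = timestamp
        self.element = element
        self.window = window

class IdentityWindowFn:
    def assign(self, context):
        if context.window is None:
            # This mirrors the ValueError text seen in the traceback.
            raise ValueError(
                'assign_context.window should not be None. '
                'This might be due to a DoFn returning a TimestampedValue.')
        # Identity: keep the element in the window it already occupies.
        return [context.window]

fn = IdentityWindowFn()
# Normal case: the existing window is passed through unchanged.
assert fn.assign(AssignContext(100, window='GlobalWindow')) == ['GlobalWindow']
# Failure case: no window attached, as after a bare TimestampedValue.
try:
    fn.assign(AssignContext(100, window=None))
except ValueError as e:
    print('raised:', e)
```

This is why `test_no_window_context_fails` expects a ValueError: the test deliberately feeds the reshuffle an element without a window.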


> ReifyTest.test_window fails in DirectRunner due to 'assign_context.window 
> should not be None.'
> --
>
> Key: BEAM-7499
> URL: https://issues.apache.org/jira/browse/BEAM-7499
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  
> [PR 8717|https://github.com/apache/beam/pull/8717] added 
> ReifyWindow.test_window which fails on the DirectRunner.
> {code:java}
> ERROR:root:Exception at bundle 
> , 
> due to an exception.
>  Traceback (most recent call last):
>  File "apache_beam/runners/direct/executor.py", line 343, in call
>  finish_state)
>  File "apache_beam/runners/direct/executor.py", line 380, in attempt_call
>  evaluator.process_element(value)
>  File "apache_beam/runners/direct/transform_evaluator.py", line 636, in 
> process_element
>  self.runner.process(element)
>  File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>  def process(self, windowed_value):
>  File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
>  self._reraise_augmented(exn)
>  File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>  raise_with_traceback(new_exn)
>  File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>  return self.do_fn_invoker.invoke_process(windowed_value)
>  File "apache_beam/runners/common.py", line 453, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>  output_processor.process_outputs(
>  File "apache_beam/runners/common.py", line 915, in 
> apache_beam.runners.common._OutputProcessor.process_outputs
>  self.window_fn.assign(assign_context))
>  File "apache_beam/transforms/util.py", line 557, in assign
>  'assign_context.window should not be None. '
> ValueError: assign_context.window should not be None. This might be due to a 
> DoFn returning a TimestampedValue. [while running 'add_timestamps2']
> Traceback (most recent call last):
>  File "apache_beam/transforms/util_test.py", line 501, in test_window
>  assert_that(reified_pc, equal_to(expected), reify_windows=True)
>  File "apache_beam/pipeline.py", line 426, in __exit__
>  self.run().wait_until_finish()
>  File "apache_beam/testing/test_pipeline.py", line 109, in run
>  state = result.wait_until_finish()
>  File "apache_beam/runners/direct/direct_runner.py", line 430, in 
> wait_until_finish
>  self._executor.await_completion()
>  File "apache_beam/runners/direct/executor.py", line 400, in await_completion
>  self._executor.await_completion()
>  File "apache_beam/runners/direct/executor.py", line 446, in await_completion
>  raise_(t, v, tb)
>  File "apache_beam/runners/direct/executor.py", line 343, in call
>  finish_state)
>  File "apache_beam/runners/direct/executor.py", line 380, in attempt_call
>  evaluator.process_element(value)
>  File "apache_beam/runners/direct/transform_evaluator.py", line 636, in 
> process_element
>  self.runner.process(element)
>  File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>  def process(self, windowed_value):
>  File 

[jira] [Resolved] (BEAM-7449) Add common pipeline patterns to docs

2019-06-07 Thread Cyrus Maden (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Maden resolved BEAM-7449.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Add common pipeline patterns to docs
> 
>
> Key: BEAM-7449
> URL: https://issues.apache.org/jira/browse/BEAM-7449
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Labels: documentation, pipeline-patterns
> Fix For: Not applicable
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm currently adding a section called "Common pipeline patterns" to the 
> docs. It'll walk through some real-world use cases with code 
> samples/pseudocode.
> This issue is for the first set of pipeline patterns. It also creates the 
> pipeline-patterns label; use this label to track or contribute new pipeline 
> patterns.





[jira] [Work logged] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6821?focusedWorklogId=256133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256133
 ]

ASF GitHub Bot logged work on BEAM-6821:


Author: ASF GitHub Bot
Created on: 07/Jun/19 18:22
Start Date: 07/Jun/19 18:22
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8054: [BEAM-6821] 
FileBasedSink improper paths
URL: https://github.com/apache/beam/pull/8054#issuecomment-499989207
 
 
   LGTM. Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256133)
Time Spent: 3h 10m  (was: 3h)

> FileBasedSink is not creating file paths according to target filesystem
> ---
>
> Key: BEAM-6821
> URL: https://issues.apache.org/jira/browse/BEAM-6821
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: Windows 10
>Reporter: Gregory Kovelman
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The file path generated in the _open_writer_ method does not match the target 
> filesystem, because os.path.join is used instead of FileSystems.join.
> apache_beam\io\filebasedsink.py extract:
>  
> {code:python}
> def _create_temp_dir(self, file_path_prefix):
>   base_path, last_component = FileSystems.split(file_path_prefix)
>   if not last_component:
>     # Trying to re-split the base_path to check if it's a root.
>     new_base_path, _ = FileSystems.split(base_path)
>     if base_path == new_base_path:
>       raise ValueError('Cannot create a temporary directory for root path '
>                        'prefix %s. Please specify a file path prefix with '
>                        'at least two components.' % file_path_prefix)
>   path_components = [base_path,
>                      'beam-temp-' + last_component + '-' + uuid.uuid1().hex]
>   return FileSystems.join(*path_components)
>
> @check_accessible(['file_path_prefix', 'file_name_suffix'])
> def open_writer(self, init_result, uid):
>   # A proper suffix is needed for AUTO compression detection.
>   # We also ensure there will be no collisions with uid and a
>   # (possibly unsharded) file_path_prefix and a (possibly empty)
>   # file_name_suffix.
>   file_path_prefix = self.file_path_prefix.get()
>   file_name_suffix = self.file_name_suffix.get()
>   suffix = (
>       '.' + os.path.basename(file_path_prefix) + file_name_suffix)
>   return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
> {code}
>  
>  
> This created incompatibilities between, for example, Windows and GCS.
> Expected: gs://bucket/beam-temp-result-uuid\\uid.result
> Actual: gs://bucket/beam-temp-result-uuid/uid.result
> Replacing os.path.join with FileSystems.join fixes the issue
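The root cause can be reproduced outside Beam. The sketch below uses a hypothetical helper (`filesystem_join` is an invented name, not Beam's API) to show why a filesystem-aware join matters: `os.path.join` follows local OS conventions, while remote URIs such as gs:// must always use '/'.

```python
import ntpath
import posixpath

def filesystem_join(base, *parts):
    # Hypothetical illustration of a FileSystems.join-style helper:
    # remote URIs (e.g. gs://) always use '/', independent of the local OS.
    if '://' in base:
        return posixpath.join(base, *parts)
    # ntpath stands in for os.path on a Windows worker.
    return ntpath.join(base, *parts)

# A Windows os.path.join would yield 'gs://bucket/beam-temp-result\\uid',
# which GCS does not treat as containing a directory separator.
assert filesystem_join('gs://bucket/beam-temp-result', 'uid') == \
    'gs://bucket/beam-temp-result/uid'
assert filesystem_join('C:\\tmp', 'uid') == 'C:\\tmp\\uid'
```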



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6821?focusedWorklogId=256135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256135
 ]

ASF GitHub Bot logged work on BEAM-6821:


Author: ASF GitHub Bot
Created on: 07/Jun/19 18:22
Start Date: 07/Jun/19 18:22
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8054: [BEAM-6821] 
FileBasedSink improper paths
URL: https://github.com/apache/beam/pull/8054#issuecomment-499989379
 
 
   Run Python PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256135)
Time Spent: 3.5h  (was: 3h 20m)

> FileBasedSink is not creating file paths according to target filesystem
> ---
>
> Key: BEAM-6821
> URL: https://issues.apache.org/jira/browse/BEAM-6821
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: Windows 10
>Reporter: Gregory Kovelman
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6821?focusedWorklogId=256134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256134
 ]

ASF GitHub Bot logged work on BEAM-6821:


Author: ASF GitHub Bot
Created on: 07/Jun/19 18:22
Start Date: 07/Jun/19 18:22
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8054: [BEAM-6821] 
FileBasedSink improper paths
URL: https://github.com/apache/beam/pull/8054#issuecomment-499989330
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256134)
Time Spent: 3h 20m  (was: 3h 10m)

> FileBasedSink is not creating file paths according to target filesystem
> ---
>
> Key: BEAM-6821
> URL: https://issues.apache.org/jira/browse/BEAM-6821
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: Windows 10
>Reporter: Gregory Kovelman
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7449) Add common pipeline patterns to docs

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7449?focusedWorklogId=256131=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256131
 ]

ASF GitHub Bot logged work on BEAM-7449:


Author: ASF GitHub Bot
Created on: 07/Jun/19 18:12
Start Date: 07/Jun/19 18:12
Worklog Time Spent: 10m 
  Work Description: melap commented on pull request #8732: [BEAM-7449] Add 
pages documenting common Beam pipeline patterns (with code samples)
URL: https://github.com/apache/beam/pull/8732
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256131)
Time Spent: 10m
Remaining Estimate: 0h

> Add common pipeline patterns to docs
> 
>
> Key: BEAM-7449
> URL: https://issues.apache.org/jira/browse/BEAM-7449
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Labels: documentation, pipeline-patterns
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm currently adding a section called "Common pipeline patterns" to the 
> docs. It'll walk through some real-world use cases with code 
> samples/pseudocode.
> This issue is for the first set of pipeline patterns. It also creates the 
> pipeline-patterns label; use this label to track or contribute new pipeline 
> patterns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6777) SDK Harness Resilience

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6777?focusedWorklogId=256128=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256128
 ]

ASF GitHub Bot logged work on BEAM-6777:


Author: ASF GitHub Bot
Created on: 07/Jun/19 18:07
Start Date: 07/Jun/19 18:07
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #8783: [BEAM-6777] Remove 
the unnecessary enable_health_checker flag
URL: https://github.com/apache/beam/pull/8783#issuecomment-499984405
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256128)
Time Spent: 4h 10m  (was: 4h)

> SDK Harness Resilience
> --
>
> Key: BEAM-6777
> URL: https://issues.apache.org/jira/browse/BEAM-6777
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> If the Python SDK Harness crashes in any way (user code exception, OOM, etc) 
> the job will hang and waste resources. The fix is to add a daemon in the SDK 
> Harness and Runner Harness to communicate with Dataflow to restart the VM 
> when stuckness is detected.
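The proposed daemon can be sketched as a heartbeat watchdog. This is an assumed design for illustration only, not Dataflow's actual implementation: a daemon thread fires a callback when the harness stops reporting progress within a deadline.

```python
import threading
import time

class Watchdog:
    """Fires on_stuck when no heartbeat arrives within timeout_s seconds."""

    def __init__(self, timeout_s, on_stuck):
        self._timeout_s = timeout_s
        self._on_stuck = on_stuck
        self._last_beat = time.monotonic()
        self._stop = threading.Event()

    def beat(self):
        # Called by the SDK harness while it is making progress.
        self._last_beat = time.monotonic()

    def _run(self):
        # Poll several times per timeout window; exit after firing once.
        while not self._stop.wait(self._timeout_s / 4.0):
            if time.monotonic() - self._last_beat > self._timeout_s:
                self._on_stuck()  # e.g. ask the runner to restart the VM
                return

    def start(self):
        threading.Thread(target=self._run, daemon=True).start()

    def stop(self):
        self._stop.set()
```

A harness would call `beat()` from its processing loop; a crashed or hung harness stops beating and the callback escalates to the runner.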



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7449) Add common pipeline patterns to docs

2019-06-07 Thread Cyrus Maden (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Maden updated BEAM-7449:
--
Description: 
I'm currently adding a section called "Common pipeline patterns" to the docs. 
It'll walk through some real-world use cases with code samples/pseudocode.

This issue is for the first set of pipeline patterns. It also creates the 
pipeline-patterns label; use this label to track or contribute new pipeline 
patterns.

  was:
I'm currently adding a section called "Common pipeline patterns" to the docs. 
It'll walk through some real-world use cases with code samples/pseudocode.

The purpose of this issue is to create a label that tracks updates to the Beam 
patterns, including new patterns contributors would like to add.


> Add common pipeline patterns to docs
> 
>
> Key: BEAM-7449
> URL: https://issues.apache.org/jira/browse/BEAM-7449
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Major
>  Labels: documentation, pipeline-patterns
>
> I'm currently adding a section called "Common pipeline patterns" to the 
> docs. It'll walk through some real-world use cases with code 
> samples/pseudocode.
> This issue is for the first set of pipeline patterns. It also creates the 
> pipeline-patterns label; use this label to track or contribute new pipeline 
> patterns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6821?focusedWorklogId=256122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256122
 ]

ASF GitHub Bot logged work on BEAM-6821:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:53
Start Date: 07/Jun/19 17:53
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8054: [BEAM-6821] 
FileBasedSink improper paths
URL: https://github.com/apache/beam/pull/8054#issuecomment-499979501
 
 
   Sorry, @gkovelman, looks like this PR fell off the radar - please ping the 
PR thread if you don't receive a response within a couple of days.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256122)
Time Spent: 3h  (was: 2h 50m)

> FileBasedSink is not creating file paths according to target filesystem
> ---
>
> Key: BEAM-6821
> URL: https://issues.apache.org/jira/browse/BEAM-6821
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: Windows 10
>Reporter: Gregory Kovelman
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6821?focusedWorklogId=256117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256117
 ]

ASF GitHub Bot logged work on BEAM-6821:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:51
Start Date: 07/Jun/19 17:51
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8054: [BEAM-6821] 
FileBasedSink improper paths
URL: https://github.com/apache/beam/pull/8054#issuecomment-499978686
 
 
   Ping @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256117)
Time Spent: 2h 50m  (was: 2h 40m)

> FileBasedSink is not creating file paths according to target filesystem
> ---
>
> Key: BEAM-6821
> URL: https://issues.apache.org/jira/browse/BEAM-6821
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: Windows 10
>Reporter: Gregory Kovelman
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7407) Create a Wordcount-on-Flink Python 3 test suite.

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7407?focusedWorklogId=256096=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256096
 ]

ASF GitHub Bot logged work on BEAM-7407:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:42
Start Date: 07/Jun/19 17:42
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8745: [BEAM-7407] 
Adds portable wordcount test for Python 3 with Flink runner.
URL: https://github.com/apache/beam/pull/8745#discussion_r291685515
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1815,6 +1815,58 @@ class BeamModulePlugin implements Plugin {
   }
 }
   }
+
+  project.ext.addPortableWordCountTask = { String nameSuffix, boolean 
isStreaming ->
 
 Review comment:
   Good to know, thanks. I find explicit way more reader-friendly. Unless there 
is a reason (long list of optional parameters, etc), I think explicit is better 
than implicit.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256096)
Time Spent: 1h 40m  (was: 1.5h)

> Create a Wordcount-on-Flink Python 3 test suite.
> 
>
> Key: BEAM-7407
> URL: https://issues.apache.org/jira/browse/BEAM-7407
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7407) Create a Wordcount-on-Flink Python 3 test suite.

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7407?focusedWorklogId=256101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256101
 ]

ASF GitHub Bot logged work on BEAM-7407:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:42
Start Date: 07/Jun/19 17:42
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8745: [BEAM-7407] 
Adds portable wordcount test for Python 3 with Flink runner.
URL: https://github.com/apache/beam/pull/8745#discussion_r291688385
 
 

 ##
 File path: sdks/python/build.gradle
 ##
 @@ -77,11 +77,14 @@ task preCommitPy2() {
   dependsOn "lint"
 }
 
-task portablePreCommit() {
+addPortableWordCountTask('BatchPy2', false)
 
 Review comment:
   We can; however, the portableWordCountExample definition would be more 
complicated. Do we need that task at all? How is it used? 
   I can remove it if you prefer, and remove one of the prefix parameters.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256101)
Time Spent: 2.5h  (was: 2h 20m)

> Create a Wordcount-on-Flink Python 3 test suite.
> 
>
> Key: BEAM-7407
> URL: https://issues.apache.org/jira/browse/BEAM-7407
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7407) Create a Wordcount-on-Flink Python 3 test suite.

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7407?focusedWorklogId=256100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256100
 ]

ASF GitHub Bot logged work on BEAM-7407:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:42
Start Date: 07/Jun/19 17:42
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8745: [BEAM-7407] 
Adds portable wordcount test for Python 3 with Flink runner.
URL: https://github.com/apache/beam/pull/8745#discussion_r291681257
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -87,7 +88,22 @@ def default_docker_image():
 if 'USER' in os.environ:
   # Perhaps also test if this was built?
   logging.info('Using latest locally built Python SDK docker image.')
-  return os.environ['USER'] + 
'-docker-apache.bintray.io/beam/python:latest'
+  if sys.version_info[0] == 2:
+version_suffix = ''
+  elif sys.version_info[0:2] == (3, 5):
+version_suffix = '3'
+  else:
+version_suffix = '3'
+# TODO(BEAM-7474): Use an image which has correct Python minor version.
+logging.warning('Make sure that locally built Python SDK docker image '
+'has Python %d.%d interpreter. See also: BEAM-7474.' % 
(
+sys.version_info[0], sys.version_info[1]))
+
+  return ('{user}-docker-apache.bintray.io/beam/python'
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256100)
Time Spent: 2h 20m  (was: 2h 10m)

> Create a Wordcount-on-Flink Python 3 test suite.
> 
>
> Key: BEAM-7407
> URL: https://issues.apache.org/jira/browse/BEAM-7407
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7407) Create a Wordcount-on-Flink Python 3 test suite.

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7407?focusedWorklogId=256098=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256098
 ]

ASF GitHub Bot logged work on BEAM-7407:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:42
Start Date: 07/Jun/19 17:42
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8745: [BEAM-7407] 
Adds portable wordcount test for Python 3 with Flink runner.
URL: https://github.com/apache/beam/pull/8745#discussion_r291681410
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1815,6 +1815,58 @@ class BeamModulePlugin implements Plugin {
   }
 }
   }
+
+  project.ext.addPortableWordCountTask = { String nameSuffix, boolean 
isStreaming ->
+
+project.task('portableWordCount' + nameSuffix) {
+  dependsOn 'installGcpTest'
+  dependsOn = ['installGcpTest']
 
 Review comment:
   good catch.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256098)
Time Spent: 2h  (was: 1h 50m)

> Create a Wordcount-on-Flink Python 3 test suite.
> 
>
> Key: BEAM-7407
> URL: https://issues.apache.org/jira/browse/BEAM-7407
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7407) Create a Wordcount-on-Flink Python 3 test suite.

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7407?focusedWorklogId=256099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256099
 ]

ASF GitHub Bot logged work on BEAM-7407:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:42
Start Date: 07/Jun/19 17:42
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8745: [BEAM-7407] 
Adds portable wordcount test for Python 3 with Flink runner.
URL: https://github.com/apache/beam/pull/8745#discussion_r291679018
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -87,7 +88,22 @@ def default_docker_image():
 if 'USER' in os.environ:
   # Perhaps also test if this was built?
   logging.info('Using latest locally built Python SDK docker image.')
-  return os.environ['USER'] + 
'-docker-apache.bintray.io/beam/python:latest'
+  if sys.version_info[0] == 2:
+version_suffix = ''
+  elif sys.version_info[0:2] == (3, 5):
+version_suffix = '3'
+  else:
+version_suffix = '3'
+# TODO(BEAM-7474): Use an image which has correct Python minor version.
+logging.warning('Make sure that locally built Python SDK docker image '
 
 Review comment:
   We don't have a release process for this image yet, so we build it locally 
when we run the test.
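The suffix selection in the diff under review can be summarized as follows. This is an illustrative sketch (the function name is invented), mirroring the diff's current behavior where, pending BEAM-7474, every Python 3 minor version shares the single '3' suffix and Python 2 images carry no suffix.

```python
import sys

def image_version_suffix(version_info=None):
    # Illustrative-only helper mirroring the reviewed snippet: Python 2
    # images have no suffix; all Python 3 minors currently map to '3'.
    version_info = version_info or sys.version_info
    if version_info[0] == 2:
        return ''
    return '3'

assert image_version_suffix((2, 7, 16)) == ''
assert image_version_suffix((3, 5, 3)) == '3'
assert image_version_suffix((3, 7, 3)) == '3'
```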
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256099)
Time Spent: 2h 10m  (was: 2h)

> Create a Wordcount-on-Flink Python 3 test suite.
> 
>
> Key: BEAM-7407
> URL: https://issues.apache.org/jira/browse/BEAM-7407
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7407) Create a Wordcount-on-Flink Python 3 test suite.

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7407?focusedWorklogId=256097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256097
 ]

ASF GitHub Bot logged work on BEAM-7407:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:42
Start Date: 07/Jun/19 17:42
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8745: [BEAM-7407] 
Adds portable wordcount test for Python 3 with Flink runner.
URL: https://github.com/apache/beam/pull/8745#discussion_r291686090
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1815,6 +1815,58 @@ class BeamModulePlugin implements Plugin {
   }
 }
   }
+
+  project.ext.addPortableWordCountTask = { String nameSuffix, boolean 
isStreaming ->
 
 Review comment:
   Yes, it is my concern as well, this is being discussed on the mailing list, 
let's continue the discussion on 
https://lists.apache.org/thread.html/7c48200678c570c6e42d4933f8ea4f9324d82f7ae720b591a93ffcda@%3Cdev.beam.apache.org%3E.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 256097)
Time Spent: 1h 50m  (was: 1h 40m)

> Create a Wordcount-on-Flink Python 3 test suite.
> 
>
> Key: BEAM-7407
> URL: https://issues.apache.org/jira/browse/BEAM-7407
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7509) [test] dependencies are used in testing/ directory and imported in non-test code

2019-06-07 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-7509:
---

 Summary: [test] dependencies are used in testing/ directory and 
imported in non-test code
 Key: BEAM-7509
 URL: https://issues.apache.org/jira/browse/BEAM-7509
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Pablo Estrada


This is causing breakages when running some non-test code for Beam (e.g. 
[https://github.com/apache/beam/pull/8778]).

 

This was surfaced internally at Google because testing VMs always install all 
testing dependencies for Beam.





[jira] [Work logged] (BEAM-6821) FileBasedSink is not creating file paths according to target filesystem

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6821?focusedWorklogId=256088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256088
 ]

ASF GitHub Bot logged work on BEAM-6821:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:23
Start Date: 07/Jun/19 17:23
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #8054: [BEAM-6821] 
FileBasedSink improper paths
URL: https://github.com/apache/beam/pull/8054#issuecomment-499969250
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 



Issue Time Tracking
---

Worklog Id: (was: 256088)
Time Spent: 2h 40m  (was: 2.5h)

> FileBasedSink is not creating file paths according to target filesystem
> ---
>
> Key: BEAM-6821
> URL: https://issues.apache.org/jira/browse/BEAM-6821
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: Windows 10
>Reporter: Gregory Kovelman
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The file path generated in the _open_writer_ method does not match the target 
> filesystem, because os.path.join is used instead of FileSystems.join.
> Extract from apache_beam/io/filebasedsink.py:
>  
> {code:python}
> def _create_temp_dir(self, file_path_prefix):
>   base_path, last_component = FileSystems.split(file_path_prefix)
>   if not last_component:
>     # Trying to re-split the base_path to check if it's a root.
>     new_base_path, _ = FileSystems.split(base_path)
>     if base_path == new_base_path:
>       raise ValueError('Cannot create a temporary directory for root path '
>                        'prefix %s. Please specify a file path prefix with '
>                        'at least two components.' % file_path_prefix)
>   path_components = [base_path,
>                      'beam-temp-' + last_component + '-' + uuid.uuid1().hex]
>   return FileSystems.join(*path_components)
>
> @check_accessible(['file_path_prefix', 'file_name_suffix'])
> def open_writer(self, init_result, uid):
>   # A proper suffix is needed for AUTO compression detection.
>   # We also ensure there will be no collisions with uid and a
>   # (possibly unsharded) file_path_prefix and a (possibly empty)
>   # file_name_suffix.
>   file_path_prefix = self.file_path_prefix.get()
>   file_name_suffix = self.file_name_suffix.get()
>   suffix = (
>       '.' + os.path.basename(file_path_prefix) + file_name_suffix)
>   return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
> {code}
>  
>  
> This creates incompatibilities between, for example, Windows and GCS.
> Expected: gs://bucket/beam-temp-result-uuid/uid.result
> Actual: gs://bucket/beam-temp-result-uuid\uid.result
> Replacing os.path.join with FileSystems.join fixes the issue.
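A minimal sketch of the separator mismatch described above, using the stdlib ntpath and posixpath modules to stand in for what os.path resolves to on Windows and on POSIX (the bucket and file names are illustrative):

```python
import ntpath
import posixpath

# On Windows, os.path.join is ntpath.join and inserts a backslash, which is
# wrong for a GCS path; FileSystems.join always uses '/' for gs:// paths.
base = 'gs://bucket/beam-temp-result-uuid'

windows_style = ntpath.join(base, 'uid.result')   # what the bug produces
posix_style = posixpath.join(base, 'uid.result')  # what GCS expects

print(windows_style)  # gs://bucket/beam-temp-result-uuid\uid.result
print(posix_style)    # gs://bucket/beam-temp-result-uuid/uid.result
```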





[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=256078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256078
 ]

ASF GitHub Bot logged work on BEAM-7428:


Author: ASF GitHub Bot
Created on: 07/Jun/19 17:11
Start Date: 07/Jun/19 17:11
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8741: [BEAM-7428] 
Output the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-499965273
 
 
   So I understand why the output timestamp is -inf. That is just how 
BoundedSource works; it always outputs everything at -inf so that it is safe to 
move forward to anywhere you want.
   
   So you can't run a bounded source and use those timestamps if the input 
element has a timestamp later than -inf. But you can just output at the same 
timestamp as the input if you want, I guess. I think this is an area of SDF 
that I talked about with Eugene and the conclusion was that we might need 
per-key watermarks or something crazy. I don't think it got solved.
 



Issue Time Tracking
---

Worklog Id: (was: 256078)
Time Spent: 3h  (was: 2h 50m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -
>
> Key: BEAM-7428
> URL: https://issues.apache.org/jira/browse/BEAM-7428
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse that tackles a 
> similar problem but does output the timestamps correctly.





[jira] [Updated] (BEAM-7474) Add SDK harness containers for Py 3.6, Py 3.7

2019-06-07 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-7474:
--
Status: Open  (was: Triage Needed)

> Add SDK harness containers for Py 3.6, Py 3.7
> -
>
> Key: BEAM-7474
> URL: https://issues.apache.org/jira/browse/BEAM-7474
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> Currently we can build a Py3-compatible container image with gradle by 
> running:
> ./gradlew  :sdks:python:container:py3:docker 
> This builds a docker container image like: 
> valentyn-docker-apache.bintray.io/beam/python3 
> The code for this is defined in: 
> https://github.com/apache/beam/blob/ae60a72b03f3a2b6b2a06667ec1868a7acc8e38f/sdks/python/container/py3/build.gradle#L48
> To support portable runners that use a container (e.g. Flink) on multiple 
> versions of Python 3,  we should make it possible to build Python 
> 3-compatible SDK harness containers bundled with any desired python version. 
> We could have several gradle projects:
>   :sdks:python:container:py35:docker
>   :sdks:python:container:py36:docker
>   :sdks:python:container:py37:docker
> and several Dockerfiles to support this:
>  
>   sdks/python/container/py35/Dockerfile
>   sdks/python/container/py36/Dockerfile
>   sdks/python/container/py37/Dockerfile
> The only difference right now would be the base image used in the FROM field 
> of the Dockerfile. 
> Alternatively, we could have one parameterized Dockerfile that starts with:
> {code}
> ARG BASE_IMAGE
> FROM $BASE_IMAGE
> ...
> {code}
> I think the latter approach may result in complications later if these 
> containers need to diverge down the road.
> cc'ing a few folks who may have some feedback on this: [~angoenka] [~mxm] 
> [~robertwb] [~Juta] [~frederik].
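The parameterized-Dockerfile idea above can be sketched as follows. The helper and the image tags here are illustrative only, not the actual Beam gradle wiring; it just shows how one Dockerfile taking `ARG BASE_IMAGE` would be built per Python version:

```shell
# Hypothetical helper: compose the docker build command that injects a
# per-version BASE_IMAGE into a single parameterized Dockerfile.
build_cmd() {
  local ver="$1"
  echo "docker build --build-arg BASE_IMAGE=python:${ver}-stretch -t beam/python${ver} ."
}

# One Dockerfile, three images:
for ver in 3.5 3.6 3.7; do
  build_cmd "$ver"
done
```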





[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=256062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256062
 ]

ASF GitHub Bot logged work on BEAM-7428:


Author: ASF GitHub Bot
Created on: 07/Jun/19 16:53
Start Date: 07/Jun/19 16:53
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8741: [BEAM-7428] 
Output the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-499959566
 
 
   Sorry for the delay. Taking another look to refresh myself.
 



Issue Time Tracking
---

Worklog Id: (was: 256062)
Time Spent: 2h 50m  (was: 2h 40m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -
>
> Key: BEAM-7428
> URL: https://issues.apache.org/jira/browse/BEAM-7428
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse that tackles a 
> similar problem but does output the timestamps correctly.





[jira] [Updated] (BEAM-7508) Expose the current key to the CombineFn

2019-06-07 Thread Luke Cwik (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-7508:

Description: 
One cannot currently access the key during combining. Having access to the key 
can be useful during various phases.

There is a less than ideal workaround where the user embeds the key within the 
value, turning a combiner over *V* into a combiner over *KV<K, V>*.

 

Originally asked in 
[https://stackoverflow.com/questions/56451796/how-can-i-access-the-key-in-subclass-of-combinerfn-when-combining-a-pcollection]

  was:
One cannot currently access the key during combining. Having access to the key 
can be useful during various phases.

There is a less than ideal workaround where the user embeds the key within the 
value, turning a combiner over *V* into a combiner over *KV<K, V>*.


> Expose the current key to the CombineFn
> ---
>
> Key: BEAM-7508
> URL: https://issues.apache.org/jira/browse/BEAM-7508
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-ideas
>Reporter: Luke Cwik
>Priority: Major
>
> One cannot currently access the key during combining. Having access to the 
> key can be useful during various phases.
> There is a less than ideal workaround where the user embeds the key within 
> the value, turning a combiner over *V* into a combiner over *KV<K, V>*.
>  
> Originally asked in 
> [https://stackoverflow.com/questions/56451796/how-can-i-access-the-key-in-subclass-of-combinerfn-when-combining-a-pcollection]
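The embed-the-key workaround can be sketched in plain Python, outside of Beam. The function and names below are illustrative only, not a Beam API; the point is that once each value carries its own key, the combining function can read it:

```python
from collections import defaultdict

def combine_with_key(pairs, combiner):
    """Workaround sketch: embed the key in the value so the combiner sees it."""
    # Step 1: turn each (k, v) into (k, (k, v)) so the value carries the key.
    rekeyed = [(k, (k, v)) for k, v in pairs]
    # Step 2: group by key and combine; the combiner now receives (key, value)
    # tuples and can use the key, which a plain value combiner cannot.
    groups = defaultdict(list)
    for k, kv in rekeyed:
        groups[k].append(kv)
    return {k: combiner(kvs) for k, kvs in groups.items()}

# Example combiner that uses the key: prefix the per-key sum with the key name.
result = combine_with_key(
    [("a", 1), ("a", 2), ("b", 5)],
    lambda kvs: f"{kvs[0][0]}={sum(v for _, v in kvs)}",
)
# result == {"a": "a=3", "b": "b=5"}
```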





[jira] [Created] (BEAM-7508) Expose the current key to the CombineFn

2019-06-07 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-7508:
---

 Summary: Expose the current key to the CombineFn
 Key: BEAM-7508
 URL: https://issues.apache.org/jira/browse/BEAM-7508
 Project: Beam
  Issue Type: Improvement
  Components: sdk-ideas
Reporter: Luke Cwik


One cannot currently access the key during combining. Having access to the key 
can be useful during various phases.

There is a less than ideal workaround where the user embeds the key within the 
value, turning a combiner over *V* into a combiner over *KV<K, V>*.





[jira] [Closed] (BEAM-7507) StreamingDataflowWorker attempts to decode non-utf8 binary data as utf8

2019-06-07 Thread Luke Cwik (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik closed BEAM-7507.
---
   Resolution: Fixed
Fix Version/s: 2.14.0

> StreamingDataflowWorker attempts to decode non-utf8 binary data as utf8
> ---
>
> Key: BEAM-7507
> URL: https://issues.apache.org/jira/browse/BEAM-7507
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Trivial
> Fix For: 2.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In StreamingDataflowWorker, when setting the workId for the logging MDC, the 
> code currently attempts to decode a binary blob as utf8. This will 
> (generally) cause a decoding error, and the Java UTF-8 decoder uses a 
> globally synchronized cache to generate these error results.
> A simple solution would be to use the protobuf TextFormat escaping to convert 
> the ByteString key into a string.
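The escaping idea can be illustrated with a small Python sketch. This is an illustrative reimplementation in the spirit of protobuf TextFormat byte escaping, not the Java `TextFormat` call the PR uses; it shows how arbitrary bytes become a printable string instead of triggering a decode error:

```python
def escape_bytes(data: bytes) -> str:
    """Escape arbitrary bytes into printable ASCII (TextFormat-style sketch)."""
    out = []
    for b in data:
        # Keep printable ASCII, except the quote and backslash characters.
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            out.append(chr(b))
        else:
            # Everything else becomes a three-digit octal escape.
            out.append('\\%03o' % b)
    return ''.join(out)

# Non-UTF-8 work id bytes become a safe log string instead of raising:
# escape_bytes(b'\xff\xfeid') == '\\377\\376id'
```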





[jira] [Work logged] (BEAM-7507) StreamingDataflowWorker attempts to decode non-utf8 binary data as utf8

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7507?focusedWorklogId=256026&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256026
 ]

ASF GitHub Bot logged work on BEAM-7507:


Author: ASF GitHub Bot
Created on: 07/Jun/19 16:19
Start Date: 07/Jun/19 16:19
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8789: [BEAM-7507] 
Use TextFormat to escape workIds to a string rather than utf8
URL: https://github.com/apache/beam/pull/8789
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 256026)
Time Spent: 20m  (was: 10m)

> StreamingDataflowWorker attempts to decode non-utf8 binary data as utf8
> ---
>
> Key: BEAM-7507
> URL: https://issues.apache.org/jira/browse/BEAM-7507
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Trivial
> Fix For: 2.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In StreamingDataflowWorker, when setting the workId for the logging MDC, the 
> code currently attempts to decode a binary blob as utf8. This will 
> (generally) cause a decoding error, and the Java UTF-8 decoder uses a 
> globally synchronized cache to generate these error results.
> A simple solution would be to use the protobuf TextFormat escaping to convert 
> the ByteString key into a string.





[jira] [Work logged] (BEAM-7368) Run Python GBK load tests on portable Flink runner

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7368?focusedWorklogId=256024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256024
 ]

ASF GitHub Bot logged work on BEAM-7368:


Author: ASF GitHub Bot
Created on: 07/Jun/19 16:16
Start Date: 07/Jun/19 16:16
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #8636: [BEAM-7368] 
Flink + Python + gbk load test
URL: https://github.com/apache/beam/pull/8636#discussion_r291661814
 
 

 ##
 File path: .test-infra/dataproc/create_flink_cluster.sh
 ##
 @@ -57,6 +59,10 @@ 
DOCKER_INIT="$GCS_BUCKET/$INIT_ACTIONS_FOLDER_NAME/docker.sh"
 
 # Flink properties
 FLINK_LOCAL_PORT=8081
+
+# By default each taskmanager has one slot - use that value to avoid sharing 
the SDK Harness between multiple tasks.
+FLINK_TASKMANAGER_SLOTS="${FLINK_TASKMANAGER_SLOTS:=1}"
 
 Review comment:
   suspected line I'm talking about: 
https://github.com/apache/beam/blob/5ee4cf4e4880782492ec26f2b454a6df9b25f1e2/.test-infra/dataproc/init-actions/flink.sh#L112
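The `${FLINK_TASKMANAGER_SLOTS:=1}` line in the diff above relies on the shell's assign-default parameter expansion. A minimal sketch of its behavior (variable names taken from the diff, the numbers are illustrative):

```shell
# ${VAR:=default} assigns the default only when VAR is unset or empty,
# and otherwise keeps the value the caller exported.
unset FLINK_TASKMANAGER_SLOTS
FLINK_TASKMANAGER_SLOTS="${FLINK_TASKMANAGER_SLOTS:=1}"
default_slots="$FLINK_TASKMANAGER_SLOTS"   # falls back to 1

FLINK_TASKMANAGER_SLOTS=4
FLINK_TASKMANAGER_SLOTS="${FLINK_TASKMANAGER_SLOTS:=1}"
caller_slots="$FLINK_TASKMANAGER_SLOTS"    # keeps the caller's 4

echo "default=$default_slots caller=$caller_slots"
```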
 



Issue Time Tracking
---

Worklog Id: (was: 256024)
Time Spent: 9h 50m  (was: 9h 40m)

> Run Python GBK load tests on portable Flink runner
> --
>
> Key: BEAM-7368
> URL: https://issues.apache.org/jira/browse/BEAM-7368
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7368) Run Python GBK load tests on portable Flink runner

2019-06-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7368?focusedWorklogId=256021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-256021
 ]

ASF GitHub Bot logged work on BEAM-7368:


Author: ASF GitHub Bot
Created on: 07/Jun/19 16:15
Start Date: 07/Jun/19 16:15
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #8636: [BEAM-7368] 
Flink + Python + gbk load test
URL: https://github.com/apache/beam/pull/8636#discussion_r291661503
 
 

 ##
 File path: .test-infra/dataproc/init-actions/flink.sh
 ##
 @@ -121,7 +123,17 @@ function configure_flink() {
 grep 'spark\.executor\.cores' /etc/spark/conf/spark-defaults.conf \
   | tail -n1 \
   | cut -d'=' -f2)
-  local flink_taskmanager_slots="$(($spark_executor_cores * 2))"
 
 Review comment:
   Not sure - this is taken from the original init action - I will investigate 
that too. 
 



Issue Time Tracking
---

Worklog Id: (was: 256021)
Time Spent: 9.5h  (was: 9h 20m)

> Run Python GBK load tests on portable Flink runner
> --
>
> Key: BEAM-7368
> URL: https://issues.apache.org/jira/browse/BEAM-7368
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>





