[jira] [Commented] (BEAM-6987) TypeHints Py3 Error: Typehints NativeTypesTest fails on Python 3.7+

2019-05-21 Thread niklas Hansson (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845541#comment-16845541
 ] 

niklas Hansson commented on BEAM-6987:
--

[~udim] as far as I can see, your PR solved this as well. Do you see any 
issues? Otherwise I'll close it :)

> TypeHints Py3 Error: Typehints NativeTypesTest fails on Python 3.7+
> ---
>
> Key: BEAM-6987
> URL: https://issues.apache.org/jira/browse/BEAM-6987
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: niklas Hansson
>Priority: Major
>
> The following tests are failing:
>  * test_bad_main_input 
> (apache_beam.typehints.typed_pipeline_test.NativeTypesTest)
>  * test_bad_main_output 
> (apache_beam.typehints.typed_pipeline_test.NativeTypesTest)
>  * test_good_main_input 
> (apache_beam.typehints.typed_pipeline_test.NativeTypesTest)
> With the following error:
> {noformat}
> Traceback (most recent call last):
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py",
>  line 137, in test_bad_main_output
>  [(5, 4), (3, 2)] | beam.Map(munge) | 'Again' >> beam.Map(munge)
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py",
>  line 510, in _ror_
>  result = p.apply(self, pvalueish, label)
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 514, in apply
>  transform.type_check_inputs(pvalueish)
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py",
>  line 760, in type_check_inputs
>  bindings.get(arg, typehints.Any), hint):
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typehints.py",
>  line 1131, in is_consistent_with
>  return base.consistent_with_check(sub)
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typehints.py",
>  line 135, in consistent_with_check
>  raise NotImplementedError
>  NotImplementedError{noformat}
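For context, a minimal sketch of the kind of typed pipeline this check guards (a sketch only; the function and type hints here are illustrative, not the exact test code). The declared output type of the first Map does not match the declared input type of the second, which should fail pipeline-construction type checking; on Python 3.7 the comparison inside is_consistent_with instead hit the abstract consistent_with_check and raised NotImplementedError, as in the traceback above.

{code:python}
import apache_beam as beam
from apache_beam import typehints

def munge(pair):
  a, b = pair
  return a + b

p = beam.Pipeline()
_ = (p
     | beam.Create([(5, 4), (3, 2)])
     # Declared output: int.
     | beam.Map(munge).with_input_types(typehints.Tuple[int, int])
                      .with_output_types(int)
     # Declared input: Tuple[int, int] -- inconsistent with the int above,
     # so apply() should raise a type-check error at construction time.
     | 'Again' >> beam.Map(munge).with_input_types(typehints.Tuple[int, int]))
{code}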



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-6985) TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+

2019-05-21 Thread niklas Hansson (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niklas Hansson closed BEAM-6985.

   Resolution: Fixed
Fix Version/s: 2.14.0

Solved by https://github.com/apache/beam/pull/8590

> TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+
> 
>
> Key: BEAM-6985
> URL: https://issues.apache.org/jira/browse/BEAM-6985
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: niklas Hansson
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> The following tests are failing:
> * test_convert_nested_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  
> * test_convert_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  
> * test_convert_to_beam_types 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
> With similar errors, where a `typing` type != the corresponding Beam type hint, e.g.:
> {noformat}
>  FAIL: test_convert_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  --
>  Traceback (most recent call last):
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/native_type_compatibility_test.py",
>  line 79, in test_convert_to_beam_type
>  beam_type, description)
>  AssertionError: typing.Dict[bytes, int] != Dict[bytes, int] : simple dict
> {noformat}
>  
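For reference, a minimal sketch of the conversion that the failing assertion exercises (a sketch only; the actual test uses a table of fixtures):

{code:python}
import typing

from apache_beam.typehints import typehints
from apache_beam.typehints.native_type_compatibility import convert_to_beam_type

# The test expects the native typing type to convert to the equivalent Beam
# type hint; on Python 3.7+ the comparison reported
# typing.Dict[bytes, int] != Dict[bytes, int] instead.
beam_type = convert_to_beam_type(typing.Dict[bytes, int])
assert beam_type == typehints.Dict[bytes, int], beam_type
{code}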



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6985) TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6985?focusedWorklogId=246585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246585
 ]

ASF GitHub Bot logged work on BEAM-6985:


Author: ASF GitHub Bot
Created on: 22/May/19 05:38
Start Date: 22/May/19 05:38
Worklog Time Spent: 10m 
  Work Description: NikeNano commented on issue #8453: [BEAM-6985] 
TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+ Updates
URL: https://github.com/apache/beam/pull/8453#issuecomment-494659396
 
 
   > @NikeNano should we close this PR since #8590 is merged?
   
   Yes :) 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246585)
Time Spent: 5h 50m  (was: 5h 40m)

> TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+
> 
>
> Key: BEAM-6985
> URL: https://issues.apache.org/jira/browse/BEAM-6985
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: niklas Hansson
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The following tests are failing:
> * test_convert_nested_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  
> * test_convert_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  
> * test_convert_to_beam_types 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
> With similar errors, where a `typing` type != the corresponding Beam type hint, e.g.:
> {noformat}
>  FAIL: test_convert_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  --
>  Traceback (most recent call last):
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/native_type_compatibility_test.py",
>  line 79, in test_convert_to_beam_type
>  beam_type, description)
>  AssertionError: typing.Dict[bytes, int] != Dict[bytes, int] : simple dict
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6985) TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6985?focusedWorklogId=246586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246586
 ]

ASF GitHub Bot logged work on BEAM-6985:


Author: ASF GitHub Bot
Created on: 22/May/19 05:38
Start Date: 22/May/19 05:38
Worklog Time Spent: 10m 
  Work Description: NikeNano commented on pull request #8453: [BEAM-6985] 
TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+ Updates
URL: https://github.com/apache/beam/pull/8453
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246586)
Time Spent: 6h  (was: 5h 50m)

> TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+
> 
>
> Key: BEAM-6985
> URL: https://issues.apache.org/jira/browse/BEAM-6985
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: niklas Hansson
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> The following tests are failing:
> * test_convert_nested_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  
> * test_convert_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  
> * test_convert_to_beam_types 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
> With similar errors, where a `typing` type != the corresponding Beam type hint, e.g.:
> {noformat}
>  FAIL: test_convert_to_beam_type 
> (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest)
>  --
>  Traceback (most recent call last):
>  File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/native_type_compatibility_test.py",
>  line 79, in test_convert_to_beam_type
>  beam_type, description)
>  AssertionError: typing.Dict[bytes, int] != Dict[bytes, int] : simple dict
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7365) apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive is very slow

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-7365:
--
Fix Version/s: (was: 2.13.0)

> apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive 
> is very slow
> -
>
> Key: BEAM-7365
> URL: https://issues.apache.org/jira/browse/BEAM-7365
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-python-avro
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {noformat}
> $ python setup.py test -s 
> apache_beam.io.avroio_test.TestFastAvro.test_dynamic_work_rebalancing_exhaustive
> test_dynamic_work_rebalancing_exhaustive 
> (apache_beam.io.avroio_test.TestFastAvro) ... WARNING:root:After 101 
> concurrent splitting trials at item #2, observed only failure, giving up on 
> this item
> WARNING:root:After 101 concurrent splitting trials at item #21, observed only 
> failure, giving up on this item
> WARNING:root:After 101 concurrent splitting trials at item #22, observed only 
> failure, giving up on this item
> WARNING:root:After 1014 total concurrent splitting trials, considered only 25 
> items, giving up.
> ok
> --
> Ran 1 test in 172.223s
>  
> {noformat}
> Compare this with 
> {noformat}
> $ python setup.py test -s 
> apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive
> test_dynamic_work_rebalancing_exhaustive 
> (apache_beam.io.avroio_test.TestAvro) ... ok
> --
> Ran 1 test in 0.623s
> OK
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7365) apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive is very slow

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-7365:
--
Priority: Major  (was: Blocker)

> apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive 
> is very slow
> -
>
> Key: BEAM-7365
> URL: https://issues.apache.org/jira/browse/BEAM-7365
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-python-avro
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {noformat}
> $ python setup.py test -s 
> apache_beam.io.avroio_test.TestFastAvro.test_dynamic_work_rebalancing_exhaustive
> test_dynamic_work_rebalancing_exhaustive 
> (apache_beam.io.avroio_test.TestFastAvro) ... WARNING:root:After 101 
> concurrent splitting trials at item #2, observed only failure, giving up on 
> this item
> WARNING:root:After 101 concurrent splitting trials at item #21, observed only 
> failure, giving up on this item
> WARNING:root:After 101 concurrent splitting trials at item #22, observed only 
> failure, giving up on this item
> WARNING:root:After 1014 total concurrent splitting trials, considered only 25 
> items, giving up.
> ok
> --
> Ran 1 test in 172.223s
>  
> {noformat}
> Compare this with 
> {noformat}
> $ python setup.py test -s 
> apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive
> test_dynamic_work_rebalancing_exhaustive 
> (apache_beam.io.avroio_test.TestAvro) ... ok
> --
> Ran 1 test in 0.623s
> OK
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7365) apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive is very slow

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845504#comment-16845504
 ] 

Valentyn Tymofieiev commented on BEAM-7365:
---

The problem is with the test. This is not a release blocker. 
https://github.com/apache/beam/pull/8646 is out to fix the root cause.

> apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive 
> is very slow
> -
>
> Key: BEAM-7365
> URL: https://issues.apache.org/jira/browse/BEAM-7365
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-python-avro
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {noformat}
> $ python setup.py test -s 
> apache_beam.io.avroio_test.TestFastAvro.test_dynamic_work_rebalancing_exhaustive
> test_dynamic_work_rebalancing_exhaustive 
> (apache_beam.io.avroio_test.TestFastAvro) ... WARNING:root:After 101 
> concurrent splitting trials at item #2, observed only failure, giving up on 
> this item
> WARNING:root:After 101 concurrent splitting trials at item #21, observed only 
> failure, giving up on this item
> WARNING:root:After 101 concurrent splitting trials at item #22, observed only 
> failure, giving up on this item
> WARNING:root:After 1014 total concurrent splitting trials, considered only 25 
> items, giving up.
> ok
> --
> Ran 1 test in 172.223s
>  
> {noformat}
> Compare this with 
> {noformat}
> $ python setup.py test -s 
> apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive
> test_dynamic_work_rebalancing_exhaustive 
> (apache_beam.io.avroio_test.TestAvro) ... ok
> --
> Ran 1 test in 0.623s
> OK
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7365) apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive is very slow

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7365?focusedWorklogId=246573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246573
 ]

ASF GitHub Bot logged work on BEAM-7365:


Author: ASF GitHub Bot
Created on: 22/May/19 04:31
Start Date: 22/May/19 04:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8646: [BEAM-7365] Reduces 
the volume of test data in fastavro branch of 
test_dynamic_work_rebalancing_exhaustive to match the volume of avro branch.
URL: https://github.com/apache/beam/pull/8646#issuecomment-494648129
 
 
   R: @robertwb, @chamikaramj 
   cc: @fredo838 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246573)
Time Spent: 20m  (was: 10m)

> apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive 
> is very slow
> -
>
> Key: BEAM-7365
> URL: https://issues.apache.org/jira/browse/BEAM-7365
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-python-avro
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {noformat}
> $ python setup.py test -s 
> apache_beam.io.avroio_test.TestFastAvro.test_dynamic_work_rebalancing_exhaustive
> test_dynamic_work_rebalancing_exhaustive 
> (apache_beam.io.avroio_test.TestFastAvro) ... WARNING:root:After 101 
> concurrent splitting trials at item #2, observed only failure, giving up on 
> this item
> WARNING:root:After 101 concurrent splitting trials at item #21, observed only 
> failure, giving up on this item
> WARNING:root:After 101 concurrent splitting trials at item #22, observed only 
> failure, giving up on this item
> WARNING:root:After 1014 total concurrent splitting trials, considered only 25 
> items, giving up.
> ok
> --
> Ran 1 test in 172.223s
>  
> {noformat}
> Compare this with 
> {noformat}
> $ python setup.py test -s 
> apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive
> test_dynamic_work_rebalancing_exhaustive 
> (apache_beam.io.avroio_test.TestAvro) ... ok
> --
> Ran 1 test in 0.623s
> OK
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7365) apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive is very slow

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7365?focusedWorklogId=246572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246572
 ]

ASF GitHub Bot logged work on BEAM-7365:


Author: ASF GitHub Bot
Created on: 22/May/19 04:30
Start Date: 22/May/19 04:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8646: [BEAM-7365] 
Reduces the volume of test data in fastavro branch of 
test_dynamic_work_rebalancing_exhaustive to match the volume of avro branch.
URL: https://github.com/apache/beam/pull/8646
 
 
   Context: the FastAvro flavor of the test_dynamic_work_rebalancing_exhaustive test 
currently uses a 40x bigger input than the Avro flavor of the same test, causing 
slowness during test execution.
   
   
   
   

[jira] [Commented] (BEAM-6813) Issues with state + timers in java Direct Runner

2019-05-21 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845499#comment-16845499
 ] 

Kenneth Knowles commented on BEAM-6813:
---

[~kedin] looked at something similar, so pinging here. I wonder if we can catch 
the issue.

> Issues with state + timers in java Direct Runner 
> -
>
> Key: BEAM-6813
> URL: https://issues.apache.org/jira/browse/BEAM-6813
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.11.0
>Reporter: Steve Niemitz
>Priority: Major
>
> I was experimenting with a stateful DoFn with timers, and ran into a weird 
> bug where a state cell I was writing to would come back as null when I read 
> it inside a timer callback.
> I've attached the code below [1] (please excuse the scala ;) ).
> After I dug into this a little bit, I found that the state's value was 
> present in the `underlying` table in CopyOnAccessMemoryStateTable [2], but 
> not set in the `stateTable` itself on the instance. [3]   Based on my very 
> rudimentary understanding of how this works in the direct runner, it seems 
> like commit() is not being called on the state table before the timer 
> fires?
>   
>  [1]
> {code:java}
> private final class AggregatorDoFn[K, V, Acc, Out](
>   combiner: CombineFn[V, Acc, Out],
>   keyCoder: Coder[K],
>   accumulatorCoder: Coder[Acc]
> ) extends DoFn[KV[K, V], KV[K, Out]] {
>   @StateId(KeyId)
>   private final val keySpec = StateSpecs.value(keyCoder)
>   @StateId(AggregationId)
>   private final val stateSpec = StateSpecs.combining(accumulatorCoder, 
> combiner)
>   @StateId("numElements")
>   private final val numElementsSpec = StateSpecs.combining(Sum.ofLongs())
>   @TimerId(FlushTimerId)
>   private final val flushTimerSpec = 
> TimerSpecs.timer(TimeDomain.PROCESSING_TIME)
>   @ProcessElement
>   def processElement(
> @StateId(KeyId) key: ValueState[K],
> @StateId(AggregationId) state: CombiningState[V, Acc, Out],
> @StateId("numElements") numElements: CombiningState[JLong, _, JLong],
> @TimerId(FlushTimerId) flushTimer: Timer,
> @Element element: KV[K, V],
> window: BoundedWindow
>   ): Unit = {
> key.write(element.getKey)
> state.add(element.getValue)
> numElements.add(1L)
> if (numElements.read() == 1) {
>   flushTimer
> .offset(Duration.standardSeconds(10))
> .setRelative()
> }
>   }
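>   // Note: when the flush timer below fires, the key and state written in
>   // processElement unexpectedly read back as null (see description above).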
>   @OnTimer(FlushTimerId)
>   def onFlushTimer(
> @StateId(KeyId) key: ValueState[K],
> @StateId(AggregationId) state: CombiningState[V, _, Out],
> @StateId("numElements") numElements: CombiningState[JLong, _, JLong],
> output: OutputReceiver[KV[K, Out]]
>   ): Unit = {
> if (numElements.read() > 0) {
>   val k = key.read()
>   output.output(
> KV.of(k, state.read())
>   )
> }
> numElements.clear()
>   }
> }{code}
> [2]
>  [https://imgur.com/a/xvPR5nd]
> [3]
>  [https://imgur.com/a/jznMdaQ]
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7322) PubSubIO watermark does not advance for very low volumes

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7322?focusedWorklogId=246560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246560
 ]

ASF GitHub Bot logged work on BEAM-7322:


Author: ASF GitHub Bot
Created on: 22/May/19 03:32
Start Date: 22/May/19 03:32
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #8598: [BEAM-7322] 
Add threshold to PubSub unbounded source
URL: https://github.com/apache/beam/pull/8598#discussion_r286301013
 
 

 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java

 @@ -973,6 +980,24 @@ public Instant getWatermark() {
       return new Instant(lastWatermarkMsSinceEpoch);
     }

+    /**
+     * In case of streams with low traffic, {@link MovingFunction} could never get enough samples in
+     * {@link PubsubUnboundedSource#SAMPLE_PERIOD} to move watermark. To prevent this situation, we
+     * need to check if watermark is stale (it was not updated during {@link
+     * PubsubUnboundedSource#UPDATE_THRESHOLD}) and force its update if it is.
+     *
+     * @param nowMsSinceEpoch - current timestamp
+     * @return should the watermark be updated
+     */
+    private boolean shouldUpdate(long nowMsSinceEpoch) {
+      boolean hasEnoughSamples =
+          minReadTimestampMsSinceEpoch.isSignificant()
+              || minUnreadTimestampMsSinceEpoch.isSignificant();
+      boolean isStale =
+          lastWatermarkMsSinceEpoch < (nowMsSinceEpoch - UPDATE_THRESHOLD.getMillis());
 
 Review comment:
   This check doesn't test whether the watermark hasn't been updated; it's 
actually testing whether the watermark is old. However, the watermark might be 
old because the data is old (maybe the pipeline was started on old data and is 
catching up). I think you actually want to store the current time whenever you 
allow the watermark to advance, and ensure that you never stop it for too long.
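
A small sketch of the suggested alternative (illustrative names only, not the actual PubsubUnboundedSource fields): track when the watermark was last allowed to advance, rather than how old the watermark value itself is.

{code:java}
/** Illustrative only; not the actual Beam implementation. */
class WatermarkHoldTracker {
  private final long updateThresholdMs;
  private long lastAdvanceMs;

  WatermarkHoldTracker(long updateThresholdMs, long nowMs) {
    this.updateThresholdMs = updateThresholdMs;
    this.lastAdvanceMs = nowMs;
  }

  /** Call whenever the watermark is allowed to advance. */
  void recordAdvance(long nowMs) {
    lastAdvanceMs = nowMs;
  }

  /** True if the watermark has been held back longer than the threshold. */
  boolean heldTooLong(long nowMs) {
    return nowMs - lastAdvanceMs > updateThresholdMs;
  }
}
{code}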
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246560)
Time Spent: 1h  (was: 50m)

> PubSubIO watermark does not advance for very low volumes
> 
>
> Key: BEAM-7322
> URL: https://issues.apache.org/jira/browse/BEAM-7322
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Tim Sell
>Priority: Minor
> Attachments: data.json
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I have identified an issue where the watermark does not advance when using 
> the beam PubSubIO when volumes are very low.
> I have created a mini example project to demonstrate the behaviour with a 
> python script for generating messages at different frequencies:
> https://github.com/tims/beam/tree/pubsub-watermark-example/pubsub-watermark 
> [note: this is in a directory of a Beam fork for corp hoop jumping 
> convenience on my end, it is not intended for merging].
> The behaviour is easily replicated if you apply a fixed window triggering 
> after the watermark passes the end of the window.
> {code}
> pipeline.apply(PubsubIO.readStrings().fromSubscription(subscription))
> .apply(ParDo.of(new ParseScoreEventFn()))
> 
> .apply(Window.into(FixedWindows.of(Duration.standardSeconds(60)))
> .triggering(AfterWatermark.pastEndOfWindow())
> .withAllowedLateness(Duration.standardSeconds(60))
> .discardingFiredPanes())
> .apply(MapElements.into(kvs(strings(), integers()))
> .via(scoreEvent -> KV.of(scoreEvent.getPlayer(), 
> scoreEvent.getScore(
> .apply(Count.perKey())
> .apply(ParDo.of(Log.of("counted per key")));
> {code}
> With this triggering, using both the flink local runner the direct runner, 
> panes will be fired after a long delay (minutes) for low frequencies of 
> messages in pubsub (seconds). The biggest issue is that it seems no panes 
> will ever be emitted if you just send a few events and stop. This is 
> particularly likely to trip up people new to Beam.
> If I change the triggering to have early firings I get exactly the emitted 
> panes that you would expect.
> {code}
> .apply(Window.into(FixedWindows.of(Duration.standardSeconds(60)))
> .triggering(AfterWatermark.pastEndOfWindow()
> .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
> .alignedTo(Duration.standardSeconds(60
> .withAllowedLateness(Duration.standardSeconds(60))
> .discardingFiredPanes())
> {code}
> I can 

[jira] [Work logged] (BEAM-7322) PubSubIO watermark does not advance for very low volumes

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7322?focusedWorklogId=246558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246558
 ]

ASF GitHub Bot logged work on BEAM-7322:


Author: ASF GitHub Bot
Created on: 22/May/19 03:22
Start Date: 22/May/19 03:22
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #8598: [BEAM-7322] Add 
threshold to PubSub unbounded source
URL: https://github.com/apache/beam/pull/8598#issuecomment-494637432
 
 
   @reuvenlax @slavachernyak are probably better than myself for this code 
review
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246558)
Time Spent: 50m  (was: 40m)

> PubSubIO watermark does not advance for very low volumes
> 
>
> Key: BEAM-7322
> URL: https://issues.apache.org/jira/browse/BEAM-7322
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Tim Sell
>Priority: Minor
> Attachments: data.json
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I have identified an issue where the watermark does not advance when using 
> the beam PubSubIO when volumes are very low.
> I have created a mini example project to demonstrate the behaviour with a 
> python script for generating messages at different frequencies:
> https://github.com/tims/beam/tree/pubsub-watermark-example/pubsub-watermark 
> [note: this is in a directory of a Beam fork for corp hoop jumping 
> convenience on my end, it is not intended for merging].
> The behaviour is easily replicated if you apply a fixed window triggering 
> after the watermark passes the end of the window.
> {code}
> pipeline.apply(PubsubIO.readStrings().fromSubscription(subscription))
> .apply(ParDo.of(new ParseScoreEventFn()))
> 
> .apply(Window.into(FixedWindows.of(Duration.standardSeconds(60)))
> .triggering(AfterWatermark.pastEndOfWindow())
> .withAllowedLateness(Duration.standardSeconds(60))
> .discardingFiredPanes())
> .apply(MapElements.into(kvs(strings(), integers()))
> .via(scoreEvent -> KV.of(scoreEvent.getPlayer(), 
> scoreEvent.getScore(
> .apply(Count.perKey())
> .apply(ParDo.of(Log.of("counted per key")));
> {code}
> With this triggering, using both the flink local runner the direct runner, 
> panes will be fired after a long delay (minutes) for low frequencies of 
> messages in pubsub (seconds). The biggest issue is that it seems no panes 
> will ever be emitted if you just send a few events and stop. This is 
> particularly likely to trip up people new to Beam.
> If I change the triggering to have early firings I get exactly the emitted 
> panes that you would expect.
> {code}
> .apply(Window.into(FixedWindows.of(Duration.standardSeconds(60)))
> .triggering(AfterWatermark.pastEndOfWindow()
> .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
> .alignedTo(Duration.standardSeconds(60
> .withAllowedLateness(Duration.standardSeconds(60))
> .discardingFiredPanes())
> {code}
> I can use any variation of early firing triggers and they work as expected.
> We believe that the watermark is not advancing when the volume is too low 
> because of the sampling that PubSubIO does to determine its watermark. It 
> just never has a large enough sample. 
> This problem occurs in the direct runner and flink runner, but not in the 
> dataflow runner (dataflow uses its own PubSubIO, because it has access to 
> internal details of pubsub and so doesn't need to do any sampling).
> For extra context from the user@ list:
> *Kenneth Knowles:*
> Thanks to your info, I think it is the configuration of MovingFunction [1] 
> that is the likely culprit, but I don't totally understand why. It is 
> configured like so:
>  - store 60 seconds of data
>  - update data every 5 seconds
>  - require at least 10 messages to be 'significant'
>  - require messages from at least 2 distinct 5 second update periods to 
> 'significant'
> I would expect a rate of 1 message per second to satisfy this. I may have 
> read something wrong.
> Have you filed an issue in Jira [2]?
> Kenn
> [1] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L508
> [2] https://issues.apache.org/jira/projects/BEAM/issues
> *Alexey Romanenko:*
> Not sure that this can be very helpful but I recall a similar issue with 

[jira] [Created] (BEAM-7385) Portable Spark: testHotKeyCombiningWithAccumulationMode fails

2019-05-21 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-7385:
-

 Summary: Portable Spark: testHotKeyCombiningWithAccumulationMode 
fails
 Key: BEAM-7385
 URL: https://issues.apache.org/jira/browse/BEAM-7385
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver


Expected: a collection containing <15> but: was empty

[https://github.com/apache/beam/blob/8403313ea7d63e49974629136c615e379ea874ce/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L761]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7376) upgrade tox version on jenkins jobs, fix google-cloud-datastore version range

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7376?focusedWorklogId=246496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246496
 ]

ASF GitHub Bot logged work on BEAM-7376:


Author: ASF GitHub Bot
Created on: 22/May/19 01:18
Start Date: 22/May/19 01:18
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #8631: [BEAM-7376] Update tox 
version used by gradle
URL: https://github.com/apache/beam/pull/8631#issuecomment-494615317
 
 
   It will be good to reference that JIRA as a TODO from this code, if that is 
possible.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246496)
Time Spent: 1.5h  (was: 1h 20m)

> upgrade tox version on jenkins jobs, fix google-cloud-datastore version range
> -
>
> Key: BEAM-7376
> URL: https://issues.apache.org/jira/browse/BEAM-7376
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Affects Versions: 2.13.0
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Blocker
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Minimum version: 3.4
> Otherwise these settings are silently ignored:
> https://github.com/apache/beam/blob/eddc83a33e74a606b0584eda75e4c2257e666032/sdks/python/tox.ini#L46-L52
> ref: https://tox.readthedocs.io/en/latest/config.html#conf-commands_pre
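
As an illustration (the env name and commands below are made up, not Beam's actual tox.ini), this is the kind of setting involved; tox older than 3.4 does not recognize `commands_pre` and silently skips it:

{noformat}
[testenv:py37]
# tox >= 3.4 runs these before `commands`; older tox ignores the key entirely.
commands_pre =
  python --version
  pip check
commands =
  pytest apache_beam
{noformat}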



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7354) Starcgen tool not working when no identifiers specified

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7354?focusedWorklogId=246497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246497
 ]

ASF GitHub Bot logged work on BEAM-7354:


Author: ASF GitHub Bot
Created on: 22/May/19 01:18
Start Date: 22/May/19 01:18
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #8611: [BEAM-7354] 
Starcgen fix when no identifiers specified.
URL: https://github.com/apache/beam/pull/8611
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246497)
Time Spent: 50m  (was: 40m)

> Starcgen tool not working when no identifiers specified
> ---
>
> Key: BEAM-7354
> URL: https://issues.apache.org/jira/browse/BEAM-7354
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Stumbled onto this bug; the starcgen tool is currently used only with identifiers 
> specified, so this was missed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-6429) apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest.test_multimap_side_input fails in Python 3.6

2019-05-21 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reassigned BEAM-6429:
---

Assignee: Udi Meiri

> apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest.test_multimap_side_input
>  fails in Python 3.6 
> 
>
> Key: BEAM-6429
> URL: https://issues.apache.org/jira/browse/BEAM-6429
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {noformat}
> ERROR: test_multimap_side_input 
> (apache_beam.runners.portability.fn_api_runner_test.FnApiRunnerTest)
> --
> Traceback (most recent call last):
>  File 
> "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py", 
> line 230, in test_multimap_side_input
>  equal_to([('a', [1, 3]), ('b', [2])]))
>  File "/beam/sdks/python/apache_beam/pipeline.py", line 425, in __exit__
>  self.run().wait_until_finish()
>  File "/beam/sdks/python/apache_beam/pipeline.py", line 405, in run
>  self._options).run(False)
>  File "/beam/sdks/python/apache_beam/pipeline.py", line 418, in run
>  return self.runner.run_pipeline(self, self._options)
>  File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
> line 265, in run_pipeline
>  default_environment=self._default_environment))
>  File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
> line 268, in run_via_runner_api
>  return self.run_stages(*self.create_stages(pipeline_proto))
>  File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
> line 355, in run_stages
>  safe_coders)
>  File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
> line 449, in run_stage
>  elements_by_window = _WindowGroupingBuffer(si, value_coder)
>  File "/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", 
> line 191, in __init__
>  self._key_coder = coder.wrapped_value_coder.key_coder()
>  File "/beam/sdks/python/apache_beam/coders/coders.py", line 177, in key_coder
>  raise ValueError('Not a KV coder: %s.' % self)
> ValueError: Not a KV coder: BytesCoder.{noformat}
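
For context, a minimal sketch of a multimap side input in the spirit of the failing test (a sketch, not the exact test code); the coder for the side input must be a KV coder, which is what the error above complains about:

{code:python}
import apache_beam as beam

# The side PCollection of (key, value) pairs is read as a mapping from each
# key to all of its values.
with beam.Pipeline() as p:
  main = p | 'main' >> beam.Create(['a', 'b'])
  side = p | 'side' >> beam.Create([('a', 1), ('b', 2), ('a', 3)])
  result = main | beam.Map(
      lambda k, d: (k, sorted(d[k])),
      beam.pvalue.AsMultiMap(side))
{code}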



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7364) Possible signed overflow for WindowedValue.__hash__

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7364?focusedWorklogId=246491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246491
 ]

ASF GitHub Bot logged work on BEAM-7364:


Author: ASF GitHub Bot
Created on: 22/May/19 01:07
Start Date: 22/May/19 01:07
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8593: [BEAM-7364] Avoid 
possible signed integer overflow in hash.
URL: https://github.com/apache/beam/pull/8593
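
For readers without the PR open, a rough illustration of the class of fix the issue and PR titles describe (assumed, not taken from the actual change): keep a combined hash within the signed 64-bit range so a Cython-compiled `__hash__` cannot overflow.

{code:python}
# Illustrative only; not the actual WindowedValue.__hash__ implementation.
def combined_hash(value, timestamp_micros):
  # Pure Python ints are unbounded, but a Cython-typed intermediate is a
  # signed C integer, so an unmasked sum/product can overflow.
  unmasked = hash(value) + 3 * timestamp_micros
  return unmasked & 0x7FFFFFFFFFFFFFFF  # stay within the signed 64-bit range
{code}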
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246491)
Time Spent: 50m  (was: 40m)

> Possible signed overflow for WindowedValue.__hash__
> ---
>
> Key: BEAM-7364
> URL: https://issues.apache.org/jira/browse/BEAM-7364
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6951) Beam Dependency Update Request: com.github.spotbugs:spotbugs-annotations

2019-05-21 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845385#comment-16845385
 ] 

Kenneth Knowles commented on BEAM-6951:
---

It doesn't seem to make sense to upgrade to a beta, especially one that is also a major version bump.

> Beam Dependency Update Request: com.github.spotbugs:spotbugs-annotations
> 
>
> Key: BEAM-6951
> URL: https://issues.apache.org/jira/browse/BEAM-6951
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
>
>  - 2019-04-01 12:15:05.460427 
> -
> Please consider upgrading the dependency 
> com.github.spotbugs:spotbugs-annotations. 
> The current version is 3.1.11. The latest version is 4.0.0-beta1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-08 12:15:37.305259 
> -
> Please consider upgrading the dependency 
> com.github.spotbugs:spotbugs-annotations. 
> The current version is 3.1.11. The latest version is 4.0.0-beta1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-15 12:35:52.817108 
> -
> Please consider upgrading the dependency 
> com.github.spotbugs:spotbugs-annotations. 
> The current version is 3.1.11. The latest version is 4.0.0-beta1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-22 12:13:25.261372 
> -
> Please consider upgrading the dependency 
> com.github.spotbugs:spotbugs-annotations. 
> The current version is 3.1.11. The latest version is 4.0.0-beta1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-05-20 16:39:18.034675 
> -
> Please consider upgrading the dependency 
> com.github.spotbugs:spotbugs-annotations. 
> The current version is 3.1.11. The latest version is 4.0.0-beta1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-05-20 16:54:09.180503 
> -
> Please consider upgrading the dependency 
> com.github.spotbugs:spotbugs-annotations. 
> The current version is 3.1.11. The latest version is 4.0.0-beta1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-05-20 17:37:40.326607 
> -
> Please consider upgrading the dependency 
> com.github.spotbugs:spotbugs-annotations. 
> The current version is 3.1.11. The latest version is 4.0.0-beta1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-3040) Python precommit timed out after 150 minutes

2019-05-21 Thread Ahmet Altay (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-3040.
---
Resolution: Won't Fix

> Python precommit timed out after 150 minutes
> 
>
> Key: BEAM-3040
> URL: https://issues.apache.org/jira/browse/BEAM-3040
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Ahmet Altay
>Priority: Major
> Fix For: Not applicable
>
>
> https://builds.apache.org/job/beam_PreCommit_Python_MavenInstall/143/consoleFull
> Within about 10 minutes it reaches this point:
> {code}
> ...
> 2017-10-10T03:33:33.591 [INFO] --- findbugs-maven-plugin:3.0.4:check 
> (default) @ beam-sdks-python ---
> 2017-10-10T03:33:33.702 [INFO] 
> 2017-10-10T03:33:33.702 [INFO] --- exec-maven-plugin:1.5.0:exec 
> (setuptools-test) @ beam-sdks-python ---
> {code}
> and the final output is like this:
> {code}
> ...
> 2017-10-10T03:33:33.702 [INFO] --- exec-maven-plugin:1.5.0:exec 
> (setuptools-test) @ beam-sdks-python ---
> docs create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/docs
> GLOB sdist-make: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/setup.py
> lint create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/lint
> py27 create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/py27
> py27cython create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/py27cython
> py27cython installdeps: nose==1.3.7, grpcio-tools==1.3.5, cython==0.25.2
> docs installdeps: nose==1.3.7, grpcio-tools==1.3.5, Sphinx==1.5.5, 
> sphinx_rtd_theme==0.2.4
> lint installdeps: nose==1.3.7, pycodestyle==2.3.1, pylint==1.7.1
> py27 installdeps: nose==1.3.7, grpcio-tools==1.3.5
> lint inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27 inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27cython inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27 runtests: PYTHONHASHSEED='2225684666'
> py27 runtests: commands[0] | python --version
> py27 runtests: commands[1] | - find apache_beam -type f -name *.pyc -delete
> py27 runtests: commands[2] | pip install -e .[test]
> lint runtests: PYTHONHASHSEED='2225684666'
> lint runtests: commands[0] | time pip install -e .[test]
> docs inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27 runtests: commands[3] | python 
> apache_beam/examples/complete/autocomplete_test.py
> lint runtests: commands[1] | time 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/run_pylint.sh
> py27 runtests: commands[4] | python setup.py test
> docs runtests: PYTHONHASHSEED='2225684666'
> docs runtests: commands[0] | time pip install -e .[test,gcp,docs]
> docs runtests: commands[1] | time 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/generate_pydoc.sh
> py27gcp create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/py27gcp
> py27gcp installdeps: nose==1.3.7
> py27cython runtests: PYTHONHASHSEED='2225684666'
> py27cython runtests: commands[0] | python --version
> py27cython runtests: commands[1] | - find apache_beam -type f -name *.pyc 
> -delete
> py27cython runtests: commands[2] | - find apache_beam -type f -name *.c 
> -delete
> py27cython runtests: commands[3] | - find apache_beam -type f -name *.so 
> -delete
> py27cython runtests: commands[4] | - find target/build -type f -name *.c 
> -delete
> py27cython runtests: commands[5] | - find target/build -type f -name *.so 
> -delete
> py27cython runtests: commands[6] | time pip install -e .[test]
> py27gcp inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27gcp runtests: PYTHONHASHSEED='2225684666'
> py27gcp runtests: commands[0] | pip install -e .[test,gcp]
> py27gcp runtests: commands[1] | python --version
> py27gcp runtests: commands[2] | - find apache_beam -type f -name *.pyc -delete
> py27gcp runtests: commands[3] | python 
> apache_beam/examples/complete/autocomplete_test.py
> py27gcp runtests: commands[4] | python setup.py test
> py27cython runtests: commands[7] | python 
> apache_beam/examples/complete/autocomplete_test.py
> py27cython runtests: 

[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=246483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246483
 ]

ASF GitHub Bot logged work on BEAM-6138:


Author: ASF GitHub Bot
Created on: 22/May/19 00:52
Start Date: 22/May/19 00:52
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8416: [BEAM-6138] Add the 
Sampled Byte Count counters to the Java SDK
URL: https://github.com/apache/beam/pull/8416#issuecomment-494611085
 
 
   Run JavaPortabilityApi PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246483)
Time Spent: 16h 10m  (was: 16h)

> Add User Metric Support to Java SDK
> ---
>
> Key: BEAM-6138
> URL: https://issues.apache.org/jira/browse/BEAM-6138
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
> Fix For: 3.0.0
>
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3040) Python precommit timed out after 150 minutes

2019-05-21 Thread Ahmet Altay (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845382#comment-16845382
 ] 

Ahmet Altay commented on BEAM-3040:
---

I believe this is obsolete now. Closing it. If a similar issue happens, please 
re-open or file a new issue with comments related to the new issue.

> Python precommit timed out after 150 minutes
> 
>
> Key: BEAM-3040
> URL: https://issues.apache.org/jira/browse/BEAM-3040
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Ahmet Altay
>Priority: Major
> Fix For: Not applicable
>
>
> https://builds.apache.org/job/beam_PreCommit_Python_MavenInstall/143/consoleFull
> Within about 10 minutes it reaches this point:
> {code}
> ...
> 2017-10-10T03:33:33.591 [INFO] --- findbugs-maven-plugin:3.0.4:check 
> (default) @ beam-sdks-python ---
> 2017-10-10T03:33:33.702 [INFO] 
> 2017-10-10T03:33:33.702 [INFO] --- exec-maven-plugin:1.5.0:exec 
> (setuptools-test) @ beam-sdks-python ---
> {code}
> and the final output is like this:
> {code}
> ...
> 2017-10-10T03:33:33.702 [INFO] --- exec-maven-plugin:1.5.0:exec 
> (setuptools-test) @ beam-sdks-python ---
> docs create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/docs
> GLOB sdist-make: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/setup.py
> lint create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/lint
> py27 create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/py27
> py27cython create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/py27cython
> py27cython installdeps: nose==1.3.7, grpcio-tools==1.3.5, cython==0.25.2
> docs installdeps: nose==1.3.7, grpcio-tools==1.3.5, Sphinx==1.5.5, 
> sphinx_rtd_theme==0.2.4
> lint installdeps: nose==1.3.7, pycodestyle==2.3.1, pylint==1.7.1
> py27 installdeps: nose==1.3.7, grpcio-tools==1.3.5
> lint inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27 inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27cython inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27 runtests: PYTHONHASHSEED='2225684666'
> py27 runtests: commands[0] | python --version
> py27 runtests: commands[1] | - find apache_beam -type f -name *.pyc -delete
> py27 runtests: commands[2] | pip install -e .[test]
> lint runtests: PYTHONHASHSEED='2225684666'
> lint runtests: commands[0] | time pip install -e .[test]
> docs inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27 runtests: commands[3] | python 
> apache_beam/examples/complete/autocomplete_test.py
> lint runtests: commands[1] | time 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/run_pylint.sh
> py27 runtests: commands[4] | python setup.py test
> docs runtests: PYTHONHASHSEED='2225684666'
> docs runtests: commands[0] | time pip install -e .[test,gcp,docs]
> docs runtests: commands[1] | time 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/generate_pydoc.sh
> py27gcp create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/py27gcp
> py27gcp installdeps: nose==1.3.7
> py27cython runtests: PYTHONHASHSEED='2225684666'
> py27cython runtests: commands[0] | python --version
> py27cython runtests: commands[1] | - find apache_beam -type f -name *.pyc 
> -delete
> py27cython runtests: commands[2] | - find apache_beam -type f -name *.c 
> -delete
> py27cython runtests: commands[3] | - find apache_beam -type f -name *.so 
> -delete
> py27cython runtests: commands[4] | - find target/build -type f -name *.c 
> -delete
> py27cython runtests: commands[5] | - find target/build -type f -name *.so 
> -delete
> py27cython runtests: commands[6] | time pip install -e .[test]
> py27gcp inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27gcp runtests: PYTHONHASHSEED='2225684666'
> py27gcp runtests: commands[0] | pip install -e .[test,gcp]
> py27gcp runtests: commands[1] | python --version
> py27gcp runtests: commands[2] | - find apache_beam -type f -name *.pyc -delete
> py27gcp runtests: commands[3] | python 
> apache_beam/examples/complete/autocomplete_test.py
> py27gcp runtests: 

[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=246480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246480
 ]

ASF GitHub Bot logged work on BEAM-6138:


Author: ASF GitHub Bot
Created on: 22/May/19 00:51
Start Date: 22/May/19 00:51
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8416: [BEAM-6138] Add the 
Sampled Byte Count counters to the Java SDK
URL: https://github.com/apache/beam/pull/8416#issuecomment-494610869
 
 
   it looks like javaportabilityapi may be broken
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246480)
Time Spent: 15h 40m  (was: 15.5h)

> Add User Metric Support to Java SDK
> ---
>
> Key: BEAM-6138
> URL: https://issues.apache.org/jira/browse/BEAM-6138
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
> Fix For: 3.0.0
>
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=246482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246482
 ]

ASF GitHub Bot logged work on BEAM-6138:


Author: ASF GitHub Bot
Created on: 22/May/19 00:51
Start Date: 22/May/19 00:51
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8416: [BEAM-6138] Add the 
Sampled Byte Count counters to the Java SDK
URL: https://github.com/apache/beam/pull/8416#issuecomment-494610925
 
 
   Run JavaPortabilityApi PreCommit
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246482)
Time Spent: 16h  (was: 15h 50m)

> Add User Metric Support to Java SDK
> ---
>
> Key: BEAM-6138
> URL: https://issues.apache.org/jira/browse/BEAM-6138
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
> Fix For: 3.0.0
>
>  Time Spent: 16h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6138) Add User Metric Support to Java SDK

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6138?focusedWorklogId=246481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246481
 ]

ASF GitHub Bot logged work on BEAM-6138:


Author: ASF GitHub Bot
Created on: 22/May/19 00:51
Start Date: 22/May/19 00:51
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8416: [BEAM-6138] Add the 
Sampled Byte Count counters to the Java SDK
URL: https://github.com/apache/beam/pull/8416#issuecomment-494610912
 
 
   unfortunately, it runs without --scan.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246481)
Time Spent: 15h 50m  (was: 15h 40m)

> Add User Metric Support to Java SDK
> ---
>
> Key: BEAM-6138
> URL: https://issues.apache.org/jira/browse/BEAM-6138
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Alex Amato
>Priority: Major
> Fix For: 3.0.0
>
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7246) Create a Spanner IO for Python

2019-05-21 Thread Ahmet Altay (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845380#comment-16845380
 ] 

Ahmet Altay commented on BEAM-7246:
---

cc: [~chamikara]

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7364) Possible signed overflow for WindowedValue.__hash__

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7364?focusedWorklogId=246472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246472
 ]

ASF GitHub Bot logged work on BEAM-7364:


Author: ASF GitHub Bot
Created on: 22/May/19 00:36
Start Date: 22/May/19 00:36
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8593: [BEAM-7364] Avoid 
possible signed integer overflow in hash.
URL: https://github.com/apache/beam/pull/8593#issuecomment-494608498
 
 
   Run Python_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246472)
Time Spent: 40m  (was: 0.5h)

> Possible signed overflow for WindowedValue.__hash__
> ---
>
> Key: BEAM-7364
> URL: https://issues.apache.org/jira/browse/BEAM-7364
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4543) Remove dependency on googledatastore in favor of google-cloud-datastore.

2019-05-21 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845370#comment-16845370
 ] 

Udi Meiri commented on BEAM-4543:
-

I fixed the dependency warning in https://github.com/apache/beam/pull/8631


> Remove dependency on googledatastore in favor of google-cloud-datastore.
> 
>
> Key: BEAM-4543
> URL: https://issues.apache.org/jira/browse/BEAM-4543
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Minor
> Fix For: 2.13.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> apache-beam[gcp] package depends [1] on googledatastore package [2]. We 
> should replace this dependency with google-cloud-datastore [3] which is 
> officially supported, has better release cadence and also has Python 3 
> support.
> [1] 
> https://github.com/apache/beam/blob/fad655462f8fadfdfaab0b7a09cab538f076f94e/sdks/python/setup.py#L126
> [2] [https://pypi.org/project/googledatastore/]
> [3] [https://pypi.org/project/google-cloud-datastore/]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-7183) Python 3.6 IT tests: The Dataflow job appears to be stuck

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-7183.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Python 3.6 IT tests: The Dataflow job appears to be stuck
> -
>
> Key: BEAM-7183
> URL: https://issues.apache.org/jira/browse/BEAM-7183
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Juta Staes
>Priority: Minor
> Fix For: Not applicable
>
>
> Several test fail in the 
> beam-sdks-python-test-suites-dataflow-py36:postCommitIT with the following 
> error
> {code:java}
>  19:13:05 
> ==
> 19:13:05 ERROR: test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)
> 19:13:05 
> --
> 19:13:05 Traceback (most recent call last):
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 51, in test_wordcount_fnapi_it
> 19:13:05 self._run_wordcount_it(wordcount.run, experiment='beam_fn_api')
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 75, in _run_wordcount_it
> 19:13:05 
> run_wordcount(test_pipeline.get_full_options_as_args(**extra_opts))
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/examples/wordcount.py",
>  line 114, in run
> 19:13:05 result = p.run()
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 19:13:05 self._options).run(False)
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 19:13:05 return self.runner.run_pipeline(self, self._options)
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 64, in run_pipeline
> 19:13:05 self.result.wait_until_finish(duration=wait_duration)
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1240, in wait_until_finish
> 19:13:05 (self.state, getattr(self._runner, 'last_error_msg', None)), 
> self)
> 19:13:05 
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:
> 19:13:05 Workflow failed. Causes: The Dataflow job appears to be stuck 
> because no worker activity has been seen in the last 1h. You can get help 
> with Cloud Dataflow at https://cloud.google.com/dataflow/support.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-7183) Python 3.6 IT tests: The Dataflow job appears to be stuck

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-7183:
-

Assignee: Valentyn Tymofieiev

> Python 3.6 IT tests: The Dataflow job appears to be stuck
> -
>
> Key: BEAM-7183
> URL: https://issues.apache.org/jira/browse/BEAM-7183
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Juta Staes
>Assignee: Valentyn Tymofieiev
>Priority: Minor
> Fix For: Not applicable
>
>
> Several test fail in the 
> beam-sdks-python-test-suites-dataflow-py36:postCommitIT with the following 
> error
> {code:java}
>  19:13:05 
> ==
> 19:13:05 ERROR: test_wordcount_fnapi_it 
> (apache_beam.examples.wordcount_it_test.WordCountIT)
> 19:13:05 
> --
> 19:13:05 Traceback (most recent call last):
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 51, in test_wordcount_fnapi_it
> 19:13:05 self._run_wordcount_it(wordcount.run, experiment='beam_fn_api')
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
>  line 75, in _run_wordcount_it
> 19:13:05 
> run_wordcount(test_pipeline.get_full_options_as_args(**extra_opts))
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/examples/wordcount.py",
>  line 114, in run
> 19:13:05 result = p.run()
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 19:13:05 self._options).run(False)
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 19:13:05 return self.runner.run_pipeline(self, self._options)
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 64, in run_pipeline
> 19:13:05 self.result.wait_until_finish(duration=wait_duration)
> 19:13:05   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1240, in wait_until_finish
> 19:13:05 (self.state, getattr(self._runner, 'last_error_msg', None)), 
> self)
> 19:13:05 
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:
> 19:13:05 Workflow failed. Causes: The Dataflow job appears to be stuck 
> because no worker activity has been seen in the last 1h. You can get help 
> with Cloud Dataflow at https://cloud.google.com/dataflow/support.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-7181) Python 3.6 IT tests: PubSub Expected 2 messages. Got 0 messages.

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-7181.
---
   Resolution: Fixed
Fix Version/s: Not applicable

> Python 3.6 IT tests: PubSub Expected 2 messages. Got 0 messages.
> 
>
> Key: BEAM-7181
> URL: https://issues.apache.org/jira/browse/BEAM-7181
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Juta Staes
>Assignee: Valentyn Tymofieiev
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Several test fail in the 
> beam-sdks-python-test-suites-dataflow-py36:postCommitIT with the following 
> error
> {code:java}
> 19:13:05 
> ==
>  19:13:05 FAIL: test_streaming_data_only 
> (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
>  19:13:05 
> --
>  19:13:05 Traceback (most recent call last):
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 175, in test_streaming_data_only
>  19:13:05 self._test_streaming(with_attributes=False)
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 171, in _test_streaming
>  19:13:05 timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py",
>  line 91, in run_pipeline
>  19:13:05 result = p.run()
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
>  19:13:05 return self.runner.run_pipeline(self, self._options)
>  19:13:05 File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 68, in run_pipeline
>  19:13:05 hc_assert_that(self.result, pickler.loads(on_success_matcher))
>  19:13:05 AssertionError: 
>  19:13:05 Expected: (Test pipeline expected terminated in state: RUNNING and 
> Expected 2 messages.)
>  19:13:05 but: Expected 2 messages. Got 0 messages. Diffs (item, count):
>  19:13:05 Expected but not in actual: dict_items([('data001-seen', 1), 
> ('data002-seen', 1)])
>  19:13:05 Unexpected: dict_items([]){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-7182) Python 3.6 IT tests: Table was not found in location US

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-7182.
---
   Resolution: Fixed
 Assignee: Valentyn Tymofieiev
Fix Version/s: Not applicable

> Python 3.6 IT tests: Table was not found in location US
> ---
>
> Key: BEAM-7182
> URL: https://issues.apache.org/jira/browse/BEAM-7182
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Juta Staes
>Assignee: Valentyn Tymofieiev
>Priority: Minor
> Fix For: Not applicable
>
>
> Several test fail in the 
> beam-sdks-python-test-suites-dataflow-py36:postCommitIT with the following 
> error
> {code:java}
>  19:13:04 
> ==
> 19:13:04 ERROR: test_leader_board_it 
> (apache_beam.examples.complete.game.leader_board_it_test.LeaderBoardIT)
> 19:13:04 
> --
> 19:13:04 Traceback (most recent call last):
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/examples/complete/game/leader_board_it_test.py",
>  line 152, in test_leader_board_it
> 19:13:04 self.test_pipeline.get_full_options_as_args(**extra_opts))
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/examples/complete/game/leader_board.py",
>  line 348, in run
> 19:13:04 }, options.view_as(GoogleCloudOptions).project))
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 426, in __exit__
> 19:13:04 self.run().wait_until_finish()
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 19:13:04 self._options).run(False)
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 19:13:04 return self.runner.run_pipeline(self, self._options)
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 68, in run_pipeline
> 19:13:04 hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1709362673/lib/python3.6/site-packages/hamcrest/core/assert_that.py",
>  line 43, in assert_that
> 19:13:04 _assert_match(actual=arg1, matcher=arg2, reason=arg3)
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1709362673/lib/python3.6/site-packages/hamcrest/core/assert_that.py",
>  line 49, in _assert_match
> 19:13:04 if not matcher.matches(actual):
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1709362673/lib/python3.6/site-packages/hamcrest/core/core/allof.py",
>  line 16, in matches
> 19:13:04 if not matcher.matches(item):
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1709362673/lib/python3.6/site-packages/hamcrest/core/base_matcher.py",
>  line 28, in matches
> 19:13:04 match_result = self._matches(item)
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py",
>  line 81, in _matches
> 19:13:04 response = self._query_with_retry(bigquery_client)
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/utils/retry.py",
>  line 208, in wrapper
> 19:13:04 raise_with_traceback(exn, exn_traceback)
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/build/gradleenv/-1709362673/lib/python3.6/site-packages/future/utils/__init__.py",
>  line 419, in raise_with_traceback
> 19:13:04 raise exc.with_traceback(traceback)
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/utils/retry.py",
>  line 195, in wrapper
> 19:13:04 return fun(*args, **kwargs)
> 19:13:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/tests/bigquery_matcher.py",
>  line 98, in _query_with_retry
> 19:13:04 return [row.values() for row in query_job]
> 19:13:04   File 
> 

[jira] [Work logged] (BEAM-7383) Add flag enabling vet runner verification for Universal and Direct runners.

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7383?focusedWorklogId=246464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246464
 ]

ASF GitHub Bot logged work on BEAM-7383:


Author: ASF GitHub Bot
Created on: 21/May/19 23:57
Start Date: 21/May/19 23:57
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #8644: [BEAM-7383] 
Adding strict flag to runners to validate with vet runner
URL: https://github.com/apache/beam/pull/8644
 
 
   Creates a pipeline option called beam_strict that's supported by the
   direct and universal runner. Uses the vet runner to perform the
   verification for strict mode.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
 | --- | --- | ---
   
   Pre-Commit Tests Status (on master branch)
   

[jira] [Work logged] (BEAM-7383) Add flag enabling vet runner verification for Universal and Direct runners.

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7383?focusedWorklogId=246465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246465
 ]

ASF GitHub Bot logged work on BEAM-7383:


Author: ASF GitHub Bot
Created on: 21/May/19 23:57
Start Date: 21/May/19 23:57
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #8644: [BEAM-7383] Adding 
strict flag to runners to validate with vet runner
URL: https://github.com/apache/beam/pull/8644#issuecomment-494601728
 
 
   R: @lostluck 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246465)
Time Spent: 20m  (was: 10m)

> Add flag enabling vet runner verification for Universal and Direct runners.
> ---
>
> Key: BEAM-7383
> URL: https://issues.apache.org/jira/browse/BEAM-7383
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> With the Vet Runner added, add the ability to use it to verify the user's 
> pipeline while using the direct or universal runner by enabling some kind of 
> flag or option.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6769) BigQuery IO does not support bytes in Python 3

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6769?focusedWorklogId=246463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246463
 ]

ASF GitHub Bot logged work on BEAM-6769:


Author: ASF GitHub Bot
Created on: 21/May/19 23:56
Start Date: 21/May/19 23:56
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8621: 
[BEAM-6769][BEAM-7327] add it test for writing and reading with bigqu…
URL: https://github.com/apache/beam/pull/8621#discussion_r286267863
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -654,12 +654,17 @@ def apply_WriteToBigQuery(self, transform, pcoll, 
options):
   return self.apply_PTransform(transform, pcoll, options)
 else:
   from apache_beam.io.gcp.bigquery_tools import 
parse_table_schema_from_json
+  if transform.schema == beam.io.gcp.bigquery.SCHEMA_AUTODETECT \
+  or transform.schema is None:
+schema = transform.schema
 
 Review comment:
   actually, I've added a comment about this here: 
https://issues.apache.org/jira/browse/BEAM-7382
   
   If we have autodetection while using the BigQuerySink, we should error out, 
as it is not supported.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246463)
Time Spent: 15h 10m  (was: 15h)

> BigQuery IO does not support bytes in Python 3
> --
>
> Key: BEAM-6769
> URL: https://issues.apache.org/jira/browse/BEAM-6769
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Juta Staes
>Assignee: Juta Staes
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> In Python 2 you could write bytes data to BigQuery. This is tested in
>  
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py#L186]
> Python 3 does not support
> {noformat}
> json.dumps({'test': b'test'}){noformat}
> which is used to encode the data in
>  
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L959]
>  
> How should writing bytes to BigQuery be handled in Python 3?
>  * Forbid writing bytes into BigQuery on Python 3
>  * Guess the encoding (utf-8?)
>  * Pass the encoding to BigQuery
> cc: [~tvalentyn]
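
The `json.dumps` failure quoted above, together with the "guess the encoding" option, can be illustrated with a small sketch (the utf-8 and base64 choices below are only illustrations of the listed options, not a decision on the issue):

```python
import base64
import json

row = {'test': b'test'}

try:
    json.dumps(row)  # On Python 3 this raises TypeError: bytes is not JSON serializable.
except TypeError:
    # Option "guess the encoding": assume utf-8 and decode before serializing.
    as_text = {k: v.decode('utf-8') if isinstance(v, bytes) else v
               for k, v in row.items()}
    print(json.dumps(as_text))   # {"test": "test"}

    # Another common way to carry arbitrary bytes through JSON: base64-encode them first.
    as_b64 = {k: base64.b64encode(v).decode('ascii') if isinstance(v, bytes) else v
              for k, v in row.items()}
    print(json.dumps(as_b64))    # {"test": "dGVzdA=="}
```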



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7382) Bigquery IO: schema autodetection failing

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845362#comment-16845362
 ] 

Valentyn Tymofieiev commented on BEAM-7382:
---

If it's a known limitation in the native sink that we don't intend to support, 
we can error out.

> Bigquery IO: schema autodetection failing
> -
>
> Key: BEAM-7382
> URL: https://issues.apache.org/jira/browse/BEAM-7382
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Juta Staes
>Assignee: Pablo Estrada
>Priority: Major
>
> I am working on writing integration tests for BigQuery IO on the DataflowRunner.
> When testing the schema auto-detection I get:
> {code:java}
> ERROR: test_big_query_write_schema_autodetect 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)*12:41:01*
>  
> --*12:41:01*
>  Traceback (most recent call last):*12:41:01*   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 156, in test_big_query_write_schema_autodetect*12:41:01* 
> write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))*12:41:01*   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 426, in __exit__*12:41:01* self.run().wait_until_finish()*12:41:01* 
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run*12:41:01* return self.runner.run_pipeline(self, 
> self._options)*12:41:01*   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 64, in run_pipeline*12:41:01* 
> self.result.wait_until_finish(duration=wait_duration)*12:41:01*   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1322, in wait_until_finish*12:41:01* (self.state, 
> getattr(self._runner, 'last_error_msg', None)), self)*12:41:01* 
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:*12:41:01* Workflow failed. 
> Causes: S01:create/Read+write/WriteToBigQuery/NativeWrite failed., BigQuery 
> import job "dataflow_job_18059625072014532771-B" failed., BigQuery job 
> "dataflow_job_18059625072014532771-B" in project "apache-beam-testing" 
> finished with error(s): errorResult: No schema specified on job or table., 
> error: No schema specified on job or table.
> {code}
> test code:
> {code:java}
> input_data = [
> {'number': 1, 'str': 'abc'},
> {'number': 2, 'str': 'def'},
> ]
> with beam.Pipeline(argv=args) as p:
>   (p | 'create' >> beam.Create(input_data)
>| 'write' >> beam.io.WriteToBigQuery(
>output_table,
>schema=beam.io.gcp.bigquery.SCHEMA_AUTODETECT,
>create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> {code}
> Is there something wrong with my test or is this a bug?
> link to pr: [https://github.com/apache/beam/pull/8621]
> cc: [~tvalentyn] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=246460&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246460
 ]

ASF GitHub Bot logged work on BEAM-6693:


Author: ASF GitHub Bot
Created on: 21/May/19 23:47
Start Date: 21/May/19 23:47
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #8535: [BEAM-6693] 
ApproximateUnique transform for Python SDK
URL: https://github.com/apache/beam/pull/8535#issuecomment-494599647
 
 
   Added some comments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246460)
Time Spent: 9.5h  (was: 9h 20m)

> ApproximateUnique transform for Python SDK
> --
>
> Key: BEAM-6693
> URL: https://issues.apache.org/jira/browse/BEAM-6693
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Hannah Jiang
>Priority: Minor
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Add a PTransform for estimating the number of distinct elements in a 
> PCollection and the number of distinct values associated with each key in a 
> PCollection KVs.
> it should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java
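
For reference, a rough usage sketch of the Python transform under review (the import path follows the PR's file `apache_beam/transforms/stats.py`; names may change before merge):

```python
import apache_beam as beam
# Assumed import path, based on the file added in the PR under review.
from apache_beam.transforms.stats import ApproximateUniqueGlobally

with beam.Pipeline() as p:
    _ = (p
         | beam.Create(['a', 'b', 'a', 'c', 'b'])
         | ApproximateUniqueGlobally(size=16)   # alternatively: error=0.05
         | beam.Map(print))
```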



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=246459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246459
 ]

ASF GitHub Bot logged work on BEAM-6693:


Author: ASF GitHub Bot
Created on: 21/May/19 23:46
Start Date: 21/May/19 23:46
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #8535: [BEAM-6693] 
ApproximateUnique transform for Python SDK
URL: https://github.com/apache/beam/pull/8535#discussion_r286265782
 
 

 ##
 File path: sdks/python/apache_beam/transforms/stats.py
 ##
 @@ -0,0 +1,215 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""This module has all statistic related transforms."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+import heapq
+import math
+import sys
+from builtins import round
+
+import mmh3
+
+from apache_beam import coders
+from apache_beam.transforms.core import *
+from apache_beam.transforms.ptransform import PTransform
+
+__all__ = [
+'ApproximateUniqueGlobally',
+'ApproximateUniquePerKey',
+]
+
+
+class ApproximateUniqueGlobally(PTransform):
 
 Review comment:
   We can add type annotations to the `PTransform`s here using 
`@with_input_types` and `@with_output_types`.
   
   See 
https://github.com/apache/beam/blob/a71f305402efe050c9dcf5ef305141a66efb2953/sdks/python/apache_beam/transforms/core.py#L1788
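
   For illustration, a minimal sketch of what the suggested annotations could look like on one of the new transforms (the element and output types below are assumptions for the example, not taken from the PR):

```python
import apache_beam as beam
from apache_beam import typehints

@typehints.with_input_types(typehints.Any)
@typehints.with_output_types(int)
class ApproximateUniqueGlobally(beam.PTransform):
    """Sketch only: the decorators declare input/output types for pipeline type checks."""

    def expand(self, pcoll):
        # Stand-in combiner for the sketch; the real PR combines with
        # ApproximateUniqueCombineFn.
        return pcoll | beam.combiners.Count.Globally()
```

   With the hints attached, pipeline construction can type-check the transform's inputs and outputs the same way it does for the decorated transforms in `core.py`.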
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246459)
Time Spent: 9h 20m  (was: 9h 10m)

> ApproximateUnique transform for Python SDK
> --
>
> Key: BEAM-6693
> URL: https://issues.apache.org/jira/browse/BEAM-6693
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Hannah Jiang
>Priority: Minor
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Add a PTransform for estimating the number of distinct elements in a 
> PCollection and the number of distinct values associated with each key in a 
> PCollection KVs.
> it should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6693) ApproximateUnique transform for Python SDK

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6693?focusedWorklogId=246458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246458
 ]

ASF GitHub Bot logged work on BEAM-6693:


Author: ASF GitHub Bot
Created on: 21/May/19 23:40
Start Date: 21/May/19 23:40
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #8535: [BEAM-6693] 
ApproximateUnique transform for Python SDK
URL: https://github.com/apache/beam/pull/8535#discussion_r286264728
 
 

 ##
 File path: sdks/python/apache_beam/transforms/stats.py
 ##
 @@ -0,0 +1,215 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""This module has all statistic related transforms."""
+
+from __future__ import absolute_import
+from __future__ import division
+
+import heapq
+import math
+import sys
+from builtins import round
+
+import mmh3
+
+from apache_beam import coders
+from apache_beam.transforms.core import *
+from apache_beam.transforms.ptransform import PTransform
+
+__all__ = [
+'ApproximateUniqueGlobally',
+'ApproximateUniquePerKey',
+]
+
+
+class ApproximateUniqueGlobally(PTransform):
+  """
+  Hashes input elements and uses those to extrapolate the size of the entire
+  set of hash values by assuming the rest of the hash values are as densely
+  distributed as the sample space.
+
+  Args:
+**kwargs: Accepts a single named argument "size" or "error".
+size: an int not smaller than 16, which we would use to estimate
+  number of unique values.
+error: max estimation error, which is a float between 0.01
+  and 0.50. If error is given, size will be calculated from error with
+  _get_sample_size_from_est_error function.
+  """
+
+  _NO_VALUE_ERR_MSG = 'Either size or error should be set. Received {}.'
+  _MULTI_VALUE_ERR_MSG = 'Either size or error should be set. ' \
+ 'Received {size = %s, error = %s}.'
+  _INPUT_SIZE_ERR_MSG = 'ApproximateUnique needs a size >= 16 for an error ' \
+'<= 0.50. In general, the estimation error is about ' \
+'2 / sqrt(sample_size). Received {size = %s}.'
+  _INPUT_ERROR_ERR_MSG = 'ApproximateUnique needs an estimation error ' \
+ 'between 0.01 and 0.50. Received {error = %s}.'
+
+  def __init__(self, size=None, error=None):
+
+if None not in (size, error):
+  raise ValueError(self._MULTI_VALUE_ERR_MSG % (size, error))
+elif size is None and error is None:
+  raise ValueError(self._NO_VALUE_ERR_MSG)
+elif size is not None:
+  if not isinstance(size, int) or size < 16:
+raise ValueError(self._INPUT_SIZE_ERR_MSG % (size))
+  else:
+self._sample_size = size
+self._max_est_err = None
+else:
+  if error < 0.01 or error > 0.5:
+raise ValueError(self._INPUT_ERROR_ERR_MSG % (error))
+  else:
+self._sample_size = self._get_sample_size_from_est_error(error)
+self._max_est_err = error
+
+  def expand(self, pcoll):
+coder = coders.registry.get_coder(pcoll)
+return pcoll \
+   | 'CountGlobalUniqueValues' \
+   >> (CombineGlobally(ApproximateUniqueCombineFn(self._sample_size,
+  coder)))
+
+  @staticmethod
+  def _get_sample_size_from_est_error(est_err):
+"""
+:return: sample size
+
+Calculate sample size from estimation error
+"""
+# math.ceil in python 2.7 returns float, while it returns int in python 3.
+return int(math.ceil(4.0 / math.pow(est_err, 2.0)))
+
+
+class ApproximateUniquePerKey(ApproximateUniqueGlobally):
 
 Review comment:
   Sharing code by letting `ApproximateUniquePerKey` extend 
`ApproximateUniqueGlobally` seems a bit weird to me. How about we create a 
wrapper class `ApproximateUnique` and make `PerKey` and `Globally` two inner 
classes, as we usually do (see 
https://github.com/apache/beam/blob/a71f305402efe050c9dcf5ef305141a66efb2953/sdks/python/apache_beam/transforms/combiners.py#L112)?
   
   We can abstract the input checking logic into a static method and call it 
from both constructors.
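
   A minimal sketch of the suggested layout, assuming a shared static validator (names and details here are illustrative, not the final API):

```python
import math

import apache_beam as beam


class ApproximateUnique(object):
    """Sketch of the wrapper layout: Globally and PerKey as inner transforms."""

    @staticmethod
    def parse_input_params(size=None, error=None):
        # Shared input checking, callable from both constructors (full checks elided).
        if (size is None) == (error is None):
            raise ValueError('Specify exactly one of size or error.')
        return size if size is not None else int(math.ceil(4.0 / error ** 2))

    class Globally(beam.PTransform):
        def __init__(self, size=None, error=None):
            self._sample_size = ApproximateUnique.parse_input_params(size, error)

        def expand(self, pcoll):
            # Stand-in combiner; the real transform uses ApproximateUniqueCombineFn.
            return pcoll | beam.combiners.Count.Globally()

    class PerKey(beam.PTransform):
        def __init__(self, size=None, error=None):
            self._sample_size = ApproximateUnique.parse_input_params(size, error)

        def expand(self, pcoll):
            return pcoll | beam.combiners.Count.PerKey()
```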
 

[jira] [Commented] (BEAM-6445) Improve Release Process

2019-05-21 Thread Sam Rohde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845350#comment-16845350
 ] 

Sam Rohde commented on BEAM-6445:
-

Thanks for the ping

> Improve Release Process
> ---
>
> Key: BEAM-6445
> URL: https://issues.apache.org/jira/browse/BEAM-6445
> Project: Beam
>  Issue Type: Improvement
>  Components: project-management
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This JIRA tracks the improvement of the Beam release process as [discussed in 
> the dev 
> list|https://lists.apache.org/thread.html/d52ffbfca21eee953a230100520bd56d947a359c0029d5c291b736a7@%3Cdev.beam.apache.org%3E].
>  In summary, this change will hopefully increase the greenness of the build 
> by: increasing coverage, adding pre and post commits to release validation, 
> and adding a regular cadence to look at flaky and backlogged tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-6445) Improve Release Process

2019-05-21 Thread Sam Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Rohde resolved BEAM-6445.
-
Resolution: Fixed

> Improve Release Process
> ---
>
> Key: BEAM-6445
> URL: https://issues.apache.org/jira/browse/BEAM-6445
> Project: Beam
>  Issue Type: Improvement
>  Components: project-management
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This JIRA tracks the improvement of the Beam release process as [discussed in 
> the dev 
> list|https://lists.apache.org/thread.html/d52ffbfca21eee953a230100520bd56d947a359c0029d5c291b736a7@%3Cdev.beam.apache.org%3E].
>  In summary, this change will hopefully increase the greenness of the build 
> by: increasing coverage, adding pre and post commits to release validation, 
> and adding a regular cadence to look at flaky and backlogged tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7382) Bigquery IO: schema autodetection failing

2019-05-21 Thread Pablo Estrada (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845349#comment-16845349
 ] 

Pablo Estrada commented on BEAM-7382:
-

[~Juta] - I see that this is using the native sink. The native sink does not 
support schema autodetection. What we can do in this case is simply error out 
if autodetection is requested on the native sink (Juta, you've already added 
logic to dataflow_runner.py to catch this; we'd just have to error out).

Thoughts? [~Juta][~tvalentyn]
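
A rough sketch of the error-out behavior being proposed; the helper name and the `using_native_sink` flag are assumptions for illustration, not the actual dataflow_runner.py patch:

```python
import apache_beam as beam

def _reject_autodetect_on_native_sink(transform, using_native_sink):
    # Sketch: the native BigQuery sink cannot auto-detect schemas, so fail fast
    # instead of letting the BigQuery load job fail with "No schema specified".
    if (using_native_sink
            and transform.schema == beam.io.gcp.bigquery.SCHEMA_AUTODETECT):
        raise ValueError(
            'Schema auto-detection is not supported for the native BigQuery '
            'sink; please provide an explicit schema.')
```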

> Bigquery IO: schema autodetection failing
> -
>
> Key: BEAM-7382
> URL: https://issues.apache.org/jira/browse/BEAM-7382
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Juta Staes
>Assignee: Pablo Estrada
>Priority: Major
>
> I am working on writing integration tests for BigQuery IO on the DataflowRunner.
> When testing the schema auto-detection I get:
> {code:java}
> ERROR: test_big_query_write_schema_autodetect 
> (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)*12:41:01*
>  
> --*12:41:01*
>  Traceback (most recent call last):*12:41:01*   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py",
>  line 156, in test_big_query_write_schema_autodetect*12:41:01* 
> write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))*12:41:01*   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 426, in __exit__*12:41:01* self.run().wait_until_finish()*12:41:01* 
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 419, in run*12:41:01* return self.runner.run_pipeline(self, 
> self._options)*12:41:01*   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
>  line 64, in run_pipeline*12:41:01* 
> self.result.wait_until_finish(duration=wait_duration)*12:41:01*   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1322, in wait_until_finish*12:41:01* (self.state, 
> getattr(self._runner, 'last_error_msg', None)), self)*12:41:01* 
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:*12:41:01* Workflow failed. 
> Causes: S01:create/Read+write/WriteToBigQuery/NativeWrite failed., BigQuery 
> import job "dataflow_job_18059625072014532771-B" failed., BigQuery job 
> "dataflow_job_18059625072014532771-B" in project "apache-beam-testing" 
> finished with error(s): errorResult: No schema specified on job or table., 
> error: No schema specified on job or table.
> {code}
> test code:
> {code:java}
> input_data = [
> {'number': 1, 'str': 'abc'},
> {'number': 2, 'str': 'def'},
> ]
> with beam.Pipeline(argv=args) as p:
>   (p | 'create' >> beam.Create(input_data)
>| 'write' >> beam.io.WriteToBigQuery(
>output_table,
>schema=beam.io.gcp.bigquery.SCHEMA_AUTODETECT,
>create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
>write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
> {code}
> Is there something wrong with my test or is this a bug?
> link to pr: [https://github.com/apache/beam/pull/8621]
> cc: [~tvalentyn] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6284) [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with result UNKNOWN on succeeded job and checks passed

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6284?focusedWorklogId=246447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246447
 ]

ASF GitHub Bot logged work on BEAM-6284:


Author: ASF GitHub Bot
Created on: 21/May/19 23:11
Start Date: 21/May/19 23:11
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #8629: [BEAM-6284] 
Improve error message on waitUntilFinish.
URL: https://github.com/apache/beam/pull/8629#discussion_r286259125
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
 ##
 @@ -261,96 +275,123 @@ State waitUntilFinish(
   MonitoringUtil monitor)
   throws IOException, InterruptedException {
 
-BackOff backoff;
-if (!duration.isLongerThan(Duration.ZERO)) {
-  backoff = 
BackOffAdapter.toGcpBackOff(MESSAGES_BACKOFF_FACTORY.backoff());
-} else {
-  backoff =
-  BackOffAdapter.toGcpBackOff(
-  
MESSAGES_BACKOFF_FACTORY.withMaxCumulativeBackoff(duration).backoff());
-}
+BackOff backoff = getBackoff(duration, MESSAGES_BACKOFF_FACTORY);
 
 // This function tracks the cumulative time from the *first request* to 
enforce the wall-clock
 // limit. Any backoff instance could, at best, track the the time since 
the first attempt at a
 // given request. Thus, we need to track the cumulative time ourselves.
 long startNanos = nanoClock.nanoTime();
 
-State state;
+State state = State.UNKNOWN;
+Exception exception;
 do {
-  // Get the state of the job before listing messages. This ensures we 
always fetch job
-  // messages after the job finishes to ensure we have all them.
-  state =
-  getStateWithRetries(
-  
BackOffAdapter.toGcpBackOff(STATUS_BACKOFF_FACTORY.withMaxRetries(0).backoff()),
-  sleeper);
-  boolean hasError = state == State.UNKNOWN;
-
-  if (messageHandler != null && !hasError) {
-// Process all the job messages that have accumulated so far.
-try {
-  List allMessages = monitor.getJobMessages(getJobId(), 
lastTimestamp);
-
-  if (!allMessages.isEmpty()) {
-lastTimestamp =
-fromCloudTime(allMessages.get(allMessages.size() - 
1).getTime()).getMillis();
-messageHandler.process(allMessages);
-  }
-} catch (GoogleJsonResponseException | SocketTimeoutException e) {
-  hasError = true;
-  LOG.warn("There were problems getting current job messages: {}.", 
e.getMessage());
-  LOG.debug("Exception information:", e);
-}
+  exception = null;
+  try {
+// Get the state of the job before listing messages. This ensures we 
always fetch job
+// messages after the job finishes to ensure we have all them.
+state =
+getStateWithRetries(
+
BackOffAdapter.toGcpBackOff(STATUS_BACKOFF_FACTORY.withMaxRetries(0).backoff()),
+sleeper);
+  } catch (IOException e) {
+exception = e;
+LOG.warn("Failed to get job state: {}", e.getMessage());
+LOG.debug("Failed to get job state: {}", e);
+continue;
   }
 
-  if (!hasError) {
-// We can stop if the job is done.
-if (state.isTerminal()) {
-  switch (state) {
-case DONE:
-case CANCELLED:
-  LOG.info("Job {} finished with status {}.", getJobId(), state);
-  break;
-case UPDATED:
-  LOG.info(
-  "Job {} has been updated and is running as the new job with 
id {}. "
-  + "To access the updated job on the Dataflow monitoring 
console, "
-  + "please navigate to {}",
-  getJobId(),
-  getReplacedByJob().getJobId(),
-  MonitoringUtil.getJobMonitoringPageURL(
-  getReplacedByJob().getProjectId(),
-  getRegion(),
-  getReplacedByJob().getJobId()));
-  break;
-default:
-  LOG.info("Job {} failed with status {}.", getJobId(), state);
-  }
-  return state;
-}
+  exception = processJobMessages(messageHandler, monitor);
+
+  if (exception != null) {
 
 Review comment:
   In this case the logic seems right. I would probably try to organize the 
body of the loop to emphasize the flow though, something along the lines of:
   
   ```
   Optional state = tryGetState(); 
   if (!state.isPresent() || !tryProcessJobMessages()) {
 continue;
   }
   
   if (state.get().isTerminal()) {
 return state.get();
   }
   
   resetAttemptsCount();
   ```
   
   Hope this makes sense
 

This is an 

[jira] [Work logged] (BEAM-6284) [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with result UNKNOWN on succeeded job and checks passed

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6284?focusedWorklogId=246444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246444
 ]

ASF GitHub Bot logged work on BEAM-6284:


Author: ASF GitHub Bot
Created on: 21/May/19 23:04
Start Date: 21/May/19 23:04
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #8629: [BEAM-6284] 
Improve error message on waitUntilFinish.
URL: https://github.com/apache/beam/pull/8629#discussion_r286257535
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java
 ##
 @@ -261,96 +275,123 @@ State waitUntilFinish(
   MonitoringUtil monitor)
   throws IOException, InterruptedException {
 
-BackOff backoff;
-if (!duration.isLongerThan(Duration.ZERO)) {
-  backoff = 
BackOffAdapter.toGcpBackOff(MESSAGES_BACKOFF_FACTORY.backoff());
-} else {
-  backoff =
-  BackOffAdapter.toGcpBackOff(
-  
MESSAGES_BACKOFF_FACTORY.withMaxCumulativeBackoff(duration).backoff());
-}
+BackOff backoff = getBackoff(duration, MESSAGES_BACKOFF_FACTORY);
 
 // This function tracks the cumulative time from the *first request* to 
enforce the wall-clock
 // limit. Any backoff instance could, at best, track the the time since 
the first attempt at a
 // given request. Thus, we need to track the cumulative time ourselves.
 long startNanos = nanoClock.nanoTime();
 
-State state;
+State state = State.UNKNOWN;
+Exception exception;
 do {
-  // Get the state of the job before listing messages. This ensures we 
always fetch job
-  // messages after the job finishes to ensure we have all them.
-  state =
-  getStateWithRetries(
-  
BackOffAdapter.toGcpBackOff(STATUS_BACKOFF_FACTORY.withMaxRetries(0).backoff()),
-  sleeper);
-  boolean hasError = state == State.UNKNOWN;
-
-  if (messageHandler != null && !hasError) {
-// Process all the job messages that have accumulated so far.
-try {
-  List allMessages = monitor.getJobMessages(getJobId(), 
lastTimestamp);
-
-  if (!allMessages.isEmpty()) {
-lastTimestamp =
-fromCloudTime(allMessages.get(allMessages.size() - 
1).getTime()).getMillis();
-messageHandler.process(allMessages);
-  }
-} catch (GoogleJsonResponseException | SocketTimeoutException e) {
-  hasError = true;
-  LOG.warn("There were problems getting current job messages: {}.", 
e.getMessage());
-  LOG.debug("Exception information:", e);
-}
+  exception = null;
+  try {
+// Get the state of the job before listing messages. This ensures we 
always fetch job
+// messages after the job finishes to ensure we have all them.
+state =
+getStateWithRetries(
+
BackOffAdapter.toGcpBackOff(STATUS_BACKOFF_FACTORY.withMaxRetries(0).backoff()),
+sleeper);
+  } catch (IOException e) {
+exception = e;
+LOG.warn("Failed to get job state: {}", e.getMessage());
+LOG.debug("Failed to get job state: {}", e);
+continue;
   }
 
-  if (!hasError) {
-// We can stop if the job is done.
-if (state.isTerminal()) {
-  switch (state) {
-case DONE:
-case CANCELLED:
-  LOG.info("Job {} finished with status {}.", getJobId(), state);
-  break;
-case UPDATED:
-  LOG.info(
-  "Job {} has been updated and is running as the new job with 
id {}. "
-  + "To access the updated job on the Dataflow monitoring 
console, "
-  + "please navigate to {}",
-  getJobId(),
-  getReplacedByJob().getJobId(),
-  MonitoringUtil.getJobMonitoringPageURL(
-  getReplacedByJob().getProjectId(),
-  getRegion(),
-  getReplacedByJob().getJobId()));
-  break;
-default:
-  LOG.info("Job {} failed with status {}.", getJobId(), state);
-  }
-  return state;
-}
+  exception = processJobMessages(messageHandler, monitor);
+
+  if (exception != null) {
 
 Review comment:
   Let me try to summarize the main flow to see if I understand it correctly:
   
   **Previous Flow**
   * get job state:
 * get non-`UNKNOWN` state -> **reset backoff** -> continue loop if not 
terminal;
* will timeout at max duration or get a terminal state; correct 
behavior;
 * get `IOException`, same as:
 * get `UNKNOWN` state -> continue loop unconditionally;
* does **not** reset backoff;
* can exceed number of allowed attempts fast, not waiting 

[jira] [Updated] (BEAM-123) Skip header row from csv

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-123:
--
Status: Open  (was: Triage Needed)

> Skip header row from csv 
> -
>
> Key: BEAM-123
> URL: https://issues.apache.org/jira/browse/BEAM-123
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Davin Pidoto
>Priority: Minor
>  Labels: newbie, starter
>
> Add functionality to skip header rows when reading from a csv file.
> http://stackoverflow.com/questions/28450554/skipping-header-rows-is-it-possible-with-cloud-dataflow
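
Until such an option exists, one possible workaround is to read the lines and drop the header with a Filter. A minimal sketch, assuming the header line is known up front (the path and header value below are made up):

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.values.PCollection;

public class SkipCsvHeader {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Assumed header line; in practice this would come from configuration.
    final String header = "id,name,score";
    PCollection<String> rows =
        p.apply("ReadCsv", TextIO.read().from("gs://my-bucket/input/*.csv"))
         .apply("DropHeader", Filter.by((String line) -> !line.equals(header)));
    p.run().waitUntilFinish();
  }
}
{code}

Note this drops any line equal to the header, wherever it appears, so it only helps when no data row can match the header exactly.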



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-233) Make Registering Avro Specific Records Easier

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía closed BEAM-233.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Make Registering Avro Specific Records Easier
> -
>
> Key: BEAM-233
> URL: https://issues.apache.org/jira/browse/BEAM-233
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Jesse Anderson
>Priority: Major
>  Labels: starter
> Fix For: Not applicable
>
>
> There should be a helper method to make it easier to register Avro specific 
> record classes. This will be the most common type that needs to be 
> registered. The code would look something like:
> {code:java}
> public class AvroHelper {
> public static void registerAvro(Pipeline p, Class<? extends SpecificRecordBase> clazz) {
> p.getCoderRegistry().registerCoder(clazz, new CoderFactory() {
> @Override
> public Coder<?> create(List<? extends Coder<?>> componentCoders) {
> return AvroCoder.of(clazz);
> }
> @Override
> public List<Object> getInstanceComponents(Object value) {
> return null;
> }
> });
> }
> }
> {code}
> With usage:
> {code:java}
> Pipeline p = Pipeline.create(options);
> 
> AvroHelper.registerAvro(p, LogEntry.class);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-221) ProtoIO

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-221:
--
Status: Open  (was: Triage Needed)

> ProtoIO
> ---
>
> Key: BEAM-221
> URL: https://issues.apache.org/jira/browse/BEAM-221
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>  Labels: newbie, starter
>
> Make it easy to read and write binary files of Protobuf objects. If there is 
> a standard open source format for this, use it.
> If not, roll our own and implement it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-352) Add DisplayData to HDFS Sources

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía closed BEAM-352.
-

> Add DisplayData to HDFS Sources
> ---
>
> Key: BEAM-352
> URL: https://issues.apache.org/jira/browse/BEAM-352
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Ben Chambers
>Priority: Minor
>  Labels: starter
> Fix For: Not applicable
>
>
> Any interesting parameters of the sources/sinks should be exposed as display 
> data. See any of the sources/sinks that already export this (BigQuery, 
> PubSub, etc.) for examples. Also look at the DisplayData builder and 
> HasDisplayData interface for how to wire these up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (BEAM-352) Add DisplayData to HDFS Sources

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-352.
---
   Resolution: Fixed
 Assignee: (was: Madhusudhan Reddy Vennapusa)
Fix Version/s: Not applicable

> Add DisplayData to HDFS Sources
> ---
>
> Key: BEAM-352
> URL: https://issues.apache.org/jira/browse/BEAM-352
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Ben Chambers
>Priority: Minor
>  Labels: starter
> Fix For: Not applicable
>
>
> Any interesting parameters of the sources/sinks should be exposed as display 
> data. See any of the sources/sinks that already export this (BigQuery, 
> PubSub, etc.) for examples. Also look at the DisplayData builder and 
> HasDisplayData interface for how to wire these up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-352) Add DisplayData to HDFS Sources

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-352:
--
Status: Open  (was: Triage Needed)

> Add DisplayData to HDFS Sources
> ---
>
> Key: BEAM-352
> URL: https://issues.apache.org/jira/browse/BEAM-352
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop
>Reporter: Ben Chambers
>Assignee: Madhusudhan Reddy Vennapusa
>Priority: Minor
>  Labels: starter
>
> Any interesting parameters of the sources/sinks should be exposed as display 
> data. See any of the sources/sinks that already export this (BigQuery, 
> PubSub, etc.) for examples. Also look at the DisplayData builder and 
> HasDisplayData interface for how to wire these up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-6522:
--
Summary: Dill fails to pickle avro.RecordSchema classes on Python 3.  
(was: Avro RecordSchema class is not picklable)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Frederik Bode
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/internal/pickler.py", 
> line 247, in loads
> return dill.loads(s)
> File 
> "/home/robbe/workspace/beam/sdks/python/.eggs/dill-0.2.9-py3.5.egg/dill/_dill.py",
>  line 317, in loads
> return load(file, ignore)
> File 
> "/home/robbe/workspace/beam/sdks/python/.eggs/dill-0.2.9-py3.5.egg/dill/_dill.py",
>  line 305, in load
> obj = pik.load()
> File 
> "/home/robbe/workspace/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/avro/schema.py",
>  line 173, in __setitem__
> % (key, value, self))
> Exception: Attempting to map key 'favorite_color' to value  object at 0x7f8f72d0d0b8> in ImmutableDict {}
> {code}
>  
> *apache_beam.io.avroio_test.TestAvro.test_split_points*
> *apache_beam.io.avroio_test.TestFastAvro.test_split_points*
> fail with:
>  
> 

[jira] [Commented] (BEAM-5537) Beam Dependency Update Request: google-cloud-bigquery

2019-05-21 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845328#comment-16845328
 ] 

Udi Meiri commented on BEAM-5537:
-

No, the latest version probably depends on a newer google-cloud-core: 
https://issues.apache.org/jira/browse/BEAM-5538
It's a more involved upgrade since all google-cloud-* dependencies might need 
to be updated at once.

> Beam Dependency Update Request: google-cloud-bigquery
> -
>
> Key: BEAM-5537
> URL: https://issues.apache.org/jira/browse/BEAM-5537
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
>  - 2018-10-01 19:15:02.343276 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 0.25.0. The latest version is 1.5.1 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:08:29.646271 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 0.25.0. The latest version is 1.6.0 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-15 12:09:25.995486 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 0.25.0. The latest version is 1.6.0 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-22 12:09:52.889923 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 0.25.0. The latest version is 1.6.0 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-22 12:07:44.834195 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 1.6.1. The latest version is 1.11.2 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-490) Swap to using CoGBK as grouping primitive instead of GBK

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-490:
--
Status: Open  (was: Triage Needed)

> Swap to using CoGBK as grouping primitive instead of GBK
> 
>
> Key: BEAM-490
> URL: https://issues.apache.org/jira/browse/BEAM-490
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Priority: Major
>  Labels: backwards-incompatible, portability
>
> The intent is for the semantics of both GBK and CoGBK to be
> unchanged, just swapping their status as primitives.
> CoGBK is a more powerful operator than GBK, allowing for two key benefits:
> 1) SDKs are simplified: transforming a CoGBK into a GBK is trivial while the 
> reverse is not.
> 2) It will be easier for runners to provide more efficient implementations of 
> CoGBK as they will be responsible for the logic which takes their own 
> internal grouping implementation and maps it onto a CoGBK.
> This requires the following modifications to the Beam code base:
> 1) Make GBK a composite transform in terms of CoGBK.
> 2) Move the CoGBK from contrib to runners-core as an adapter*. Runners that 
> more naturally support GBK can just use this and everything executes exactly 
> as before.
> *just like GroupByKeyViaGroupByKeyOnly and UnboundedReadFromBoundedSource
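
To illustrate point 1 above, a rough sketch (not existing Beam code; GbkViaCoGbk is an invented name) of GroupByKey written as a composite over CoGroupByKey using the current Java join library:

{code:java}
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;

// Sketch: GroupByKey expressed as a composite over CoGroupByKey with a single
// tagged input.
class GbkViaCoGbk<K, V>
    extends PTransform<PCollection<KV<K, V>>, PCollection<KV<K, Iterable<V>>>> {

  private final TupleTag<V> tag = new TupleTag<V>();

  @Override
  public PCollection<KV<K, Iterable<V>>> expand(PCollection<KV<K, V>> input) {
    return KeyedPCollectionTuple.of(tag, input)
        .apply(CoGroupByKey.<K>create())
        .apply("Unwrap", ParDo.of(
            new DoFn<KV<K, CoGbkResult>, KV<K, Iterable<V>>>() {
              @ProcessElement
              public void process(ProcessContext c) {
                // With a single tagged input, getAll(tag) is exactly the GBK grouping.
                c.output(KV.of(c.element().getKey(), c.element().getValue().getAll(tag)));
              }
            }));
  }
}
{code}

Under this arrangement a runner would only need to provide CoGBK; GBK simply unwraps the single tagged input.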



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-466) QuantileStateCoder should be a StandardCoder

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-466:
--
Status: Open  (was: Triage Needed)

> QuantileStateCoder should be a StandardCoder
> 
>
> Key: BEAM-466
> URL: https://issues.apache.org/jira/browse/BEAM-466
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Priority: Minor
>  Labels: backward-incompatible
>
> The issue is that the coder does not report component encodings which 
> prevents effective runner inspection of the components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-490) Swap to using CoGBK as grouping primitive instead of GBK

2019-05-21 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845327#comment-16845327
 ] 

Ismaël Mejía commented on BEAM-490:
---

I am curious why this never happened.
Should we close it?

> Swap to using CoGBK as grouping primitive instead of GBK
> 
>
> Key: BEAM-490
> URL: https://issues.apache.org/jira/browse/BEAM-490
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Luke Cwik
>Priority: Major
>  Labels: backwards-incompatible, portability
>
> The intent is for the semantics of both GBK and CoGBK to be
> unchanged, just swapping their status as primitives.
> CoGBK is a more powerful operator than GBK, allowing for two key benefits:
> 1) SDKs are simplified: transforming a CoGBK into a GBK is trivial while the 
> reverse is not.
> 2) It will be easier for runners to provide more efficient implementations of 
> CoGBK as they will be responsible for the logic which takes their own 
> internal grouping implementation and maps it onto a CoGBK.
> This requires the following modifications to the Beam code base:
> 1) Make GBK a composite transform in terms of CoGBK.
> 2) Move the CoGBK from contrib to runners-core as an adapter*. Runners that 
> more naturally support GBK can just use this and everything executes exactly 
> as before.
> *just like GroupByKeyViaGroupByKeyOnly and UnboundedReadFromBoundedSource



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-741) Values transform does not use the correct output coder when values is an Iterable

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-741:
--
Status: Open  (was: Triage Needed)

> Values transform does not use the correct output coder when values is an 
> Iterable
> 
>
> Key: BEAM-741
> URL: https://issues.apache.org/jira/browse/BEAM-741
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Andrew Martin
>Priority: Major
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (BEAM-5315) Finish Python 3 porting for io module

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev closed BEAM-5315.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5315) Finish Python 3 porting for io module

2019-05-21 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845324#comment-16845324
 ] 

Valentyn Tymofieiev commented on BEAM-5315:
---

Closing this issue; let's track any outstanding open items in separate issues 
associated with particular IOs.

> Finish Python 3 porting for io module
> -
>
> Key: BEAM-5315
> URL: https://issues.apache.org/jira/browse/BEAM-5315
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Juta Staes
>Priority: Major
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-793) JdbcIO can create a deadlock when parallelism is greater than 1

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-793:
--
Status: Open  (was: Triage Needed)

> JdbcIO can create a deadlock when parallelism is greater than 1
> ---
>
> Key: BEAM-793
> URL: https://issues.apache.org/jira/browse/BEAM-793
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-jdbc
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> With the following JdbcIO configuration, if the parallelism is greater than 
> 1, we can have a {{Deadlock found when trying to get lock; try restarting 
> transaction}}.
> {code}
> MysqlDataSource dbCfg = new MysqlDataSource();
> dbCfg.setDatabaseName("db");
> dbCfg.setUser("user");
> dbCfg.setPassword("pass");
> dbCfg.setServerName("localhost");
> dbCfg.setPortNumber(3306);
> p.apply(Create.of(data))
> .apply(JdbcIO.<Tuple5<Integer, Integer, ByteString, Long, Long>>write()
> 
> .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(dbCfg))
> .withStatement("INSERT INTO 
> smth(loc,event_type,hash,begin_date,end_date) VALUES(?, ?, ?, ?, ?) ON 
> DUPLICATE KEY UPDATE event_type=VALUES(event_type),end_date=VALUES(end_date)")
> .withPreparedStatementSetter(new 
> JdbcIO.PreparedStatementSetter<Tuple5<Integer, Integer, ByteString, Long, Long>>() {
> public void setParameters(Tuple5<Integer, Integer, ByteString, Long, Long> element, PreparedStatement statement)
> throws Exception {
> statement.setInt(1, element.f0);
> statement.setInt(2, element.f1);
> statement.setBytes(3, 
> element.f2.toByteArray());
> statement.setLong(4, element.f3);
> statement.setLong(5, element.f4);
> }
> }));
> {code}
> This can happen due to the {{autocommit}}. I'm going to investigate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-1189) Add guide for release verifiers in the release guide

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-1189:
---
Status: Open  (was: Triage Needed)

> Add guide for release verifiers in the release guide
> 
>
> Key: BEAM-1189
> URL: https://issues.apache.org/jira/browse/BEAM-1189
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kenneth Knowles
>Assignee: Griselda Cuevas Zambrano
>Priority: Major
>
> This came up during the 0.4.0-incubating release discussion.
> There is this checklist: 
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> And we could point to that but make more detailed Beam-specific instructions 
> on 
> http://beam.apache.org/contribute/release-guide/#vote-on-the-release-candidate
> And the template for the vote email should include a link to suggested 
> verification steps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-1129) Umbrella JIRA to fix/enable Findbugs/Spotbugs

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-1129:
---
Status: Open  (was: Triage Needed)

> Umbrella JIRA to fix/enable Findbugs/Spotbugs
> -
>
> Key: BEAM-1129
> URL: https://issues.apache.org/jira/browse/BEAM-1129
> Project: Beam
>  Issue Type: Bug
>  Components: project-management
>Reporter: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: findbugs, spotbugs
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-1319) PipelineOptions subclasses defined in the main session could be duplicated

2019-05-21 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845316#comment-16845316
 ] 

Ismaël Mejía commented on BEAM-1319:


Is this still an issue or can it be closed?

> PipelineOptions subclasses defined in the main session could be duplicated
> --
>
> Key: BEAM-1319
> URL: https://issues.apache.org/jira/browse/BEAM-1319
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Major
> Fix For: Not applicable
>
>
> Duplication is caused as a result of the save_main_session option.
> This also breaks argparse because same options will be defined multiple times.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-1296) Providing a small dataset for "Apache Beam Mobile Gaming Pipeline Examples"

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-1296:
---
Status: Open  (was: Triage Needed)

> Providing a small dataset for "Apache Beam Mobile Gaming Pipeline Examples"
> ---
>
> Key: BEAM-1296
> URL: https://issues.apache.org/jira/browse/BEAM-1296
> Project: Beam
>  Issue Type: Wish
>  Components: examples-java
>Reporter: Keiji Yoshida
>Priority: Trivial
>  Labels: newbie, starter
>
> The dataset "gs://apache-beam-samples/game/gaming_data*.csv" for the "Apache Beam 
> Mobile Gaming Pipeline Examples" is very large (about 12 GB) and takes a long 
> time to download. This might pose difficulties for Apache Beam beginners who 
> want to try the "Apache Beam Mobile Gaming Pipeline Examples" quickly.
> How about providing a small dataset (say, less than 1 GB) for these examples?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-1335) ValueState could use an initial/default value

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-1335:
---
Status: Open  (was: Triage Needed)

> ValueState could use an initial/default value
> -
>
> Key: BEAM-1335
> URL: https://issues.apache.org/jira/browse/BEAM-1335
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Mikhail Gryzykhin
>Priority: Minor
>  Labels: starter
>
> In writing example state code with {{ValueState}} there is almost always a 
> use of {{firstNonNull(state.read(), defaultValue)}}. It would be nice to bake 
> this into the declaration.
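
For concreteness, a sketch of the pattern described above inside a stateful DoFn; "RunningTotalFn" and the element types are illustrative only:

{code:java}
import static com.google.common.base.MoreObjects.firstNonNull;

import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Sketch only: every read needs an explicit default today; the request is to
// declare the default once, on the StateSpec.
class RunningTotalFn extends DoFn<KV<String, Integer>, KV<String, Integer>> {

  @StateId("total")
  private final StateSpec<ValueState<Integer>> totalSpec = StateSpecs.value();

  @ProcessElement
  public void process(ProcessContext c, @StateId("total") ValueState<Integer> total) {
    // The boilerplate the issue wants to remove:
    int current = firstNonNull(total.read(), 0);
    int updated = current + c.element().getValue();
    total.write(updated);
    c.output(KV.of(c.element().getKey(), updated));
  }
}
{code}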



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-1458) Checkpoint support in Beam

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-1458:
---
Status: Open  (was: Triage Needed)

> Checkpoint support in Beam
> --
>
> Key: BEAM-1458
> URL: https://issues.apache.org/jira/browse/BEAM-1458
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Affects Versions: 0.5.0
>Reporter: Rafal Wojdyla
>Priority: Major
>  Labels: features
>
> Beam could support checkpoints - similar to:
>  * flink's snapshots
>  * scalding's checkpoints
> Checkpoints should provide a simple mechanism to read and write intermediate 
> results. It would be useful for debugging one part of a long flow, when you 
> would otherwise have to run many steps to get to the one you care about.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6284) [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with result UNKNOWN on succeeded job and checks passed

2019-05-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6284?focusedWorklogId=246432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246432
 ]

ASF GitHub Bot logged work on BEAM-6284:


Author: ASF GitHub Bot
Created on: 21/May/19 22:44
Start Date: 21/May/19 22:44
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #8629: [BEAM-6284] Improve 
error message on waitUntilFinish.
URL: https://github.com/apache/beam/pull/8629#issuecomment-494586284
 
 
   UPD:
   Confirmed that State.UNKNOWN is not supposed to be terminal on API side.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 246432)
Time Spent: 10m
Remaining Estimate: 0h

> [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with 
> result UNKNOWN on succeeded job and checks passed
> --
>
> Key: BEAM-6284
> URL: https://issues.apache.org/jira/browse/BEAM-6284
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Labels: currently-failing
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * 
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/ViewTest/testWindowedSideInputFixedToGlobal/
> Initial investigation:
> According to the logs, all test-relevant checks have passed and it seems to be 
> a testing framework failure.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-1462) DirectRunner unnecessarily re-schedules tasks after exceptions

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-1462:
---
Status: Open  (was: Triage Needed)

> DirectRunner unnecessarily re-schedules tasks after exceptions
> -
>
> Key: BEAM-1462
> URL: https://issues.apache.org/jira/browse/BEAM-1462
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Jayalath
>Priority: Major
>  Labels: newbie, starter
>
> Seems like DirectRunner keeps scheduling tasks when exceptions occur when 
> reading BigQuery results (and possibly in other cases).
> I verified that rescheduling is not coming from BigQuery. AFAICT, a 
> _MonitorTask that gets added at the following location does not get removed 
> properly when an exception is thrown.
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/direct/executor.py#L361
> To reproduce:
> (1) Raise a 'ValueError' at the beginning of method 
> BigQueryWrapper.convert_row_to_dict at following location.
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py#L1061
> (2) Setup Python SDK and run bigquery_tornadoes with DirectRunner.
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output  
> --project 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-1582) ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-1582:
---
Status: Open  (was: Triage Needed)

> ResumeFromCheckpointStreamingTest flakes with what appears as a second firing.
> --
>
> Key: BEAM-1582
> URL: https://issues.apache.org/jira/browse/BEAM-1582
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Priority: Minor
>  Labels: flake
>
> See: 
> https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-spark/2788/testReport/junit/org.apache.beam.runners.spark.translation.streaming/ResumeFromCheckpointStreamingTest/testWithResume/
> After some digging in it appears that a second firing occurs (though only one 
> is expected) but it doesn't come from a stale state (state is empty before it 
> fires).
> Might be a retry happening for some reason, which is OK in terms of 
> fault-tolerance guarantees (at-least-once), but not so much in terms of flaky 
> tests. 
> I'm looking into this hoping to fix this ASAP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-1811) Extract common class for WithTimestamps.AddTimestampsDoFn and Create.TimestampedValues.ConvertTimestamps

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-1811:
---
Status: Open  (was: Triage Needed)

> Extract common class for WithTimestamps.AddTimestampsDoFn and 
> Create.TimestampedValues.ConvertTimestamps
> 
>
> Key: BEAM-1811
> URL: https://issues.apache.org/jira/browse/BEAM-1811
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Wesley Tanaka
>Priority: Minor
>  Labels: newbie, starter
>
> It seems like these APIs are predominantly duplicative of each other, and 
> it's hard to find one of them if you only know about the other.
> https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/WithTimestamps.java#L134
> https://github.com/apache/beam/blob/348d335883b14a9b143b65e4b3c62dc79f62d77e/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Create.java#L560
> What would make the most sense to me is if TimestampedValues were implemented 
> in terms of both Values and WithTimestamps.  I'm still learning about Beam 
> though -- would this approach cause some kind of performance problem?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2327) Name yyy.version properties using the artifactId instead of an arbitrary name in all pom.xml

2019-05-21 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845314#comment-16845314
 ] 

Ismaël Mejía commented on BEAM-2327:


Can this one be closed, or does it still make sense in the Gradle days?

> Name yyy.version properties using the artifactId instead of an arbitrary name 
> in all pom.xml
> 
>
> Key: BEAM-2327
> URL: https://issues.apache.org/jira/browse/BEAM-2327
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Luke Cwik
>Priority: Trivial
>  Labels: starter
>
> Currently we give arbitrary names to properties which store versions of 
> dependencies instead of standardizing on naming like:
> artifactId.version
> For many of our artifacts this makes sense. There are a few cases where the 
> artifactId should not be used because we are intending to provide a version 
> lock over a set of related packages such as:
> * google client libraries
> * slf4j



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2404) BigQueryIO reading stalls if no data is returned by query

2019-05-21 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845312#comment-16845312
 ] 

Ismaël Mejía commented on BEAM-2404:


Is this already fixed?

> BigQueryIO reading stalls if no data is returned by query
> -
>
> Key: BEAM-2404
> URL: https://issues.apache.org/jira/browse/BEAM-2404
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.0.0
>Reporter: Andre
>Assignee: Chamikara Jayalath
>Priority: Major
> Fix For: Not applicable
>
>
> When running a BigQueryIO query that doesn't return any rows (e.g. nothing 
> has changed in a delta job) the job seems to stall and nothing happens as no 
> temp files are being written which I think might be what it is waiting for. 
> Just adding one row to the source table will make the job run through 
> successfully.
> Code:
> {code:java}
> PCollection<TableRow> rows = p.apply("ReadFromBQ",
>  BigQueryIO.read()
>  .fromQuery("SELECT * FROM `myproject.dataset.table`")
>  .withoutResultFlattening().usingStandardSql());
> {code}
>   
> Log:
> {code:java}   
> Jun 02, 2017 9:00:36 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl 
> startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-query, 
> projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_batch-query
> Jun 02, 2017 9:03:11 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase executeExtract
> INFO: Starting BigQuery extract job: beam_job_batch-extract
> Jun 02, 2017 9:03:12 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl 
> startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-extract, 
> projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_batch-extract
> Jun 02, 2017 9:04:06 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase executeExtract
> INFO: BigQuery extract job completed: beam_job_batch-extract
> Jun 02, 2017 9:04:08 AM org.apache.beam.sdk.io.FileBasedSource 
> expandFilePattern
> INFO: Matched 1 files for pattern 
> gs://my-bucket/tmp/BigQueryExtractTemp/ff594d003c6440a1ad84b9e02858b5c6/.avro
> Jun 02, 2017 9:04:09 AM org.apache.beam.sdk.io.FileBasedSource 
> getEstimatedSizeBytes
> INFO: Filepattern 
> gs://my-bucket/tmp/BigQueryExtractTemp/ff594d003c6440a1ad84b9e02858b5c6/.avro
>  matched 1 files with total size 9750
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2404) BigQueryIO reading stalls if no data is returned by query

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2404:
---
Status: Open  (was: Triage Needed)

> BigQueryIO reading stalls if no data is returned by query
> -
>
> Key: BEAM-2404
> URL: https://issues.apache.org/jira/browse/BEAM-2404
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.0.0
>Reporter: Andre
>Assignee: Chamikara Jayalath
>Priority: Major
> Fix For: Not applicable
>
>
> When running a BigQueryIO query that doesn't return any rows (e.g. nothing 
> has changed in a delta job) the job seems to stall and nothing happens as no 
> temp files are being written which I think might be what it is waiting for. 
> Just adding one row to the source table will make the job run through 
> successfully.
> Code:
> {code:java}
> PCollection<TableRow> rows = p.apply("ReadFromBQ",
>  BigQueryIO.read()
>  .fromQuery("SELECT * FROM `myproject.dataset.table`")
>  .withoutResultFlattening().usingStandardSql());
> {code}
>   
> Log:
> {code:java}   
> Jun 02, 2017 9:00:36 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl 
> startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-query, 
> projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_batch-query
> Jun 02, 2017 9:03:11 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase executeExtract
> INFO: Starting BigQuery extract job: beam_job_batch-extract
> Jun 02, 2017 9:03:12 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl 
> startJob
> INFO: Started BigQuery job: {jobId=beam_job_batch-extract, 
> projectId=my-project}.
> bq show -j --format=prettyjson --project_id=my-project beam_job_batch-extract
> Jun 02, 2017 9:04:06 AM 
> org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase executeExtract
> INFO: BigQuery extract job completed: beam_job_batch-extract
> Jun 02, 2017 9:04:08 AM org.apache.beam.sdk.io.FileBasedSource 
> expandFilePattern
> INFO: Matched 1 files for pattern 
> gs://my-bucket/tmp/BigQueryExtractTemp/ff594d003c6440a1ad84b9e02858b5c6/.avro
> Jun 02, 2017 9:04:09 AM org.apache.beam.sdk.io.FileBasedSource 
> getEstimatedSizeBytes
> INFO: Filepattern 
> gs://my-bucket/tmp/BigQueryExtractTemp/ff594d003c6440a1ad84b9e02858b5c6/.avro
>  matched 1 files with total size 9750
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2327) Name yyy.version properties using the artifactId instead of an arbitrary name in all pom.xml

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2327:
---
Status: Open  (was: Triage Needed)

> Name yyy.version properties using the artifactId instead of an arbitrary name 
> in all pom.xml
> 
>
> Key: BEAM-2327
> URL: https://issues.apache.org/jira/browse/BEAM-2327
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Luke Cwik
>Priority: Trivial
>  Labels: starter
>
> Currently we give arbitrary names to properties which store versions of 
> dependencies instead of standardizing on naming like:
> artifactId.version
> For many of our artifacts this makes sense. There are a few cases where the 
> artifactId should not be used because we are intending to provide a version 
> lock over a set of related packages such as:
> * google client libraries
> * slf4j



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2530) Make Beam compatible with next Java LTS version (Java 11)

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2530:
---
Status: Open  (was: Triage Needed)

> Make Beam compatible with next Java LTS version (Java 11)
> -
>
> Key: BEAM-2530
> URL: https://issues.apache.org/jira/browse/BEAM-2530
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Ismaël Mejía
>Priority: Minor
>  Labels: java9
> Fix For: Not applicable
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The goal of this task is to validate that the Java SDK and the Java Direct 
> Runner (and its tests) work as intended on the next Java LTS version (Java 11 
> /18.9). For this we will base the compilation on the java.base profile and 
> include other core Java modules when needed.  
> *Notes:*
> - Ideally validation of the IOs/extensions will be included but if serious 
> issues are found they will be tracked independently.
> - The goal of using the Java Platform module system is out of the scope of 
> this work.
> - Support for other runners will be a tracked as a separate effort because 
> other runners depend strongly in the support of the native runner ecosystems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2758) ParDo should indicate what "features" are used in DisplayData

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2758:
---
Status: Open  (was: Triage Needed)

> ParDo should indicate what "features" are used in DisplayData
> -
>
> Key: BEAM-2758
> URL: https://issues.apache.org/jira/browse/BEAM-2758
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Ben Chambers
>Priority: Major
>  Labels: newbie
>
> ParDo now exposes numerous features, such as SplittableDoFn, State, Timers, 
> etc. It would be good if the specific features being used where readily 
> visible within the Display Data of the given Pardo.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2728) Extension for sketch-based statistics

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2728:
---
Status: Open  (was: Triage Needed)

> Extension for sketch-based statistics
> -
>
> Key: BEAM-2728
> URL: https://issues.apache.org/jira/browse/BEAM-2728
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-sketching
>Reporter: Arnaud Fournier
>Assignee: Arnaud Fournier
>Priority: Minor
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> Goal : Provide an extension library to compute approximate statistics on 
> streams.
> Interest : Probabilistic data structures can create an approximation (sketch) 
> of the current state of a stream without storing every element but rather 
> processing each observation quickly to summarize its current state and find 
> useful statistical insights.
> Implementation is here : 
> https://github.com/ArnaudFnr/beam/tree/sketching/sdks/java/extensions/sketching
> More info : 
> https://docs.google.com/document/d/1Xy6g5RPBYX_HadpIr_2WrUeusiwL0Jo2ACI5PEOP1kc/edit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2944) Update Beam capability matrix using Nexmark

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2944:
---
Status: Open  (was: Triage Needed)

> Update Beam capability matrix using Nexmark
> ---
>
> Key: BEAM-2944
> URL: https://issues.apache.org/jira/browse/BEAM-2944
> Project: Beam
>  Issue Type: Task
>  Components: examples-nexmark
>Reporter: Etienne Chauchot
>Priority: Major
>  Labels: nexmark
>
> Run Nexmark query set on all the runners in both batch and streaming modes to 
> update the capability matrix and provide metrics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2856) Update Nexmark Query 10 to use AvroIO

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2856:
---
Status: Open  (was: Triage Needed)

> Update Nexmark Query 10 to use AvroIO
> -
>
> Key: BEAM-2856
> URL: https://issues.apache.org/jira/browse/BEAM-2856
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: nexmark
>
> Nexmark's Query 10 tested writing to sharded files on Google Cloud Storage; it 
> used some Google-specific APIs and 'manually' ensured sharding. We could update 
> it to support the other filesystems and withSharding, or, failing that, 
> redefine the use case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2946) Nexmark: enhance unit tests of queries

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2946:
---
Status: Open  (was: Triage Needed)

> Nexmark: enhance unit tests of queries
> --
>
> Key: BEAM-2946
> URL: https://issues.apache.org/jira/browse/BEAM-2946
> Project: Beam
>  Issue Type: Test
>  Components: examples-nexmark
>Reporter: Etienne Chauchot
>Priority: Major
>  Labels: nexmark
>
> Queries 10, 11 and 12 have no unit tests for now. Need to add them



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-2945) Nexmark: Fix query9 window merging

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-2945:
---
Status: Open  (was: Triage Needed)

> Nexmark: Fix query9 window merging
> --
>
> Key: BEAM-2945
> URL: https://issues.apache.org/jira/browse/BEAM-2945
> Project: Beam
>  Issue Type: Bug
>  Components: examples-nexmark
>Reporter: Etienne Chauchot
>Priority: Major
>  Labels: nexmark
>
> Nexmark Query9 uses custom windows and merges them. It runs fine in spark 
> runner even if spark runner does not support custom window merging yet 
> (https://issues.apache.org/jira/browse/BEAM-2499). So fix the merge process 
> in Query9 to be closer to 
> https://github.com/apache/beam/blob/001285a88c9e12473ea31241146208a4b61fb0ef/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java#L591



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3038) Add support for Azure Data Lake Storage as an Apache Beam FileSystem

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3038:
---
Status: Open  (was: Triage Needed)

> Add support for Azure Data Lake Storage as an Apache Beam FileSystem
> ---
>
> Key: BEAM-3038
> URL: https://issues.apache.org/jira/browse/BEAM-3038
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas
>Reporter: Romit Girdhar
>Priority: Minor
>  Labels: features
>
> This is for providing direct integration with Azure Data Lake Store as an 
> Apache Beam File system.
> There is already support for Azure Data Lake for using it as HDFS: 
> https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3489) Expose the message id of received messages within PubsubMessage

2019-05-21 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845308#comment-16845308
 ] 

Luke Cwik commented on BEAM-3489:
-

No, the PR is still open; the code looks pretty good but is waiting on tests to be 
written.

> Expose the message id of received messages within PubsubMessage
> ---
>
> Key: BEAM-3489
> URL: https://issues.apache.org/jira/browse/BEAM-3489
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Luke Cwik
>Assignee: Thinh Ha
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This task is about passing forward the message id from the pubsub proto to 
> the java PubsubMessage.
> Add a message id field to PubsubMessage.
> Update the coder for PubsubMessage to encode the message id.
> Update the translation from the Pubsub proto message to the Dataflow message:
> https://github.com/apache/beam/blob/2e275264b21db45787833502e5e42907b05e28b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L976
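
Roughly, the shape being asked for; the field and accessor names below are illustrative only, not the actual PubsubMessage source:

{code:java}
import java.util.Map;

// Sketch of the requested change, not the real class.
public class PubsubMessageSketch {
  private final byte[] payload;
  private final Map<String, String> attributes;
  // New: record id assigned by Pub/Sub, copied from the proto when a message is received.
  private final String messageId;

  public PubsubMessageSketch(byte[] payload, Map<String, String> attributes, String messageId) {
    this.payload = payload;
    this.attributes = attributes;
    this.messageId = messageId;
  }

  public byte[] getPayload() { return payload; }
  public Map<String, String> getAttributeMap() { return attributes; }
  // Would also need to round-trip through the PubsubMessage coder.
  public String getMessageId() { return messageId; }
}
{code}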



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3040) Python precommit timed out after 150 minutes

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3040:
---
Status: Open  (was: Triage Needed)

> Python precommit timed out after 150 minutes
> 
>
> Key: BEAM-3040
> URL: https://issues.apache.org/jira/browse/BEAM-3040
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kenneth Knowles
>Assignee: Ahmet Altay
>Priority: Major
> Fix For: Not applicable
>
>
> https://builds.apache.org/job/beam_PreCommit_Python_MavenInstall/143/consoleFull
> Within about 10 minutes it reaches this point:
> {code}
> ...
> 2017-10-10T03:33:33.591 [INFO] --- findbugs-maven-plugin:3.0.4:check 
> (default) @ beam-sdks-python ---
> 2017-10-10T03:33:33.702 [INFO] 
> 2017-10-10T03:33:33.702 [INFO] --- exec-maven-plugin:1.5.0:exec 
> (setuptools-test) @ beam-sdks-python ---
> {code}
> and the final output is like this:
> {code}
> ...
> 2017-10-10T03:33:33.702 [INFO] --- exec-maven-plugin:1.5.0:exec 
> (setuptools-test) @ beam-sdks-python ---
> docs create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/docs
> GLOB sdist-make: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/setup.py
> lint create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/lint
> py27 create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/py27
> py27cython create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/py27cython
> py27cython installdeps: nose==1.3.7, grpcio-tools==1.3.5, cython==0.25.2
> docs installdeps: nose==1.3.7, grpcio-tools==1.3.5, Sphinx==1.5.5, 
> sphinx_rtd_theme==0.2.4
> lint installdeps: nose==1.3.7, pycodestyle==2.3.1, pylint==1.7.1
> py27 installdeps: nose==1.3.7, grpcio-tools==1.3.5
> lint inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27 inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27cython inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27 runtests: PYTHONHASHSEED='2225684666'
> py27 runtests: commands[0] | python --version
> py27 runtests: commands[1] | - find apache_beam -type f -name *.pyc -delete
> py27 runtests: commands[2] | pip install -e .[test]
> lint runtests: PYTHONHASHSEED='2225684666'
> lint runtests: commands[0] | time pip install -e .[test]
> docs inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27 runtests: commands[3] | python 
> apache_beam/examples/complete/autocomplete_test.py
> lint runtests: commands[1] | time 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/run_pylint.sh
> py27 runtests: commands[4] | python setup.py test
> docs runtests: PYTHONHASHSEED='2225684666'
> docs runtests: commands[0] | time pip install -e .[test,gcp,docs]
> docs runtests: commands[1] | time 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/generate_pydoc.sh
> py27gcp create: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/py27gcp
> py27gcp installdeps: nose==1.3.7
> py27cython runtests: PYTHONHASHSEED='2225684666'
> py27cython runtests: commands[0] | python --version
> py27cython runtests: commands[1] | - find apache_beam -type f -name *.pyc 
> -delete
> py27cython runtests: commands[2] | - find apache_beam -type f -name *.c 
> -delete
> py27cython runtests: commands[3] | - find apache_beam -type f -name *.so 
> -delete
> py27cython runtests: commands[4] | - find target/build -type f -name *.c 
> -delete
> py27cython runtests: commands[5] | - find target/build -type f -name *.so 
> -delete
> py27cython runtests: commands[6] | time pip install -e .[test]
> py27gcp inst: 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_MavenInstall/sdks/python/target/.tox/dist/apache-beam-2.3.0.dev.zip
> py27gcp runtests: PYTHONHASHSEED='2225684666'
> py27gcp runtests: commands[0] | pip install -e .[test,gcp]
> py27gcp runtests: commands[1] | python --version
> py27gcp runtests: commands[2] | - find apache_beam -type f -name *.pyc -delete
> py27gcp runtests: commands[3] | python 
> apache_beam/examples/complete/autocomplete_test.py
> py27gcp runtests: commands[4] | python setup.py test
> py27cython runtests: commands[7] | python 
> apache_beam/examples/complete/autocomplete_test.py
> py27cython 

[jira] [Updated] (BEAM-3096) generic api support for Graph Computation like GraphX on Spark

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3096:
---
Status: Open  (was: Triage Needed)

> generic api support for Graph Computation like GraphX on Spark
> --
>
> Key: BEAM-3096
> URL: https://issues.apache.org/jira/browse/BEAM-3096
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-ideas
>Reporter: rayeaster
>Priority: Major
>  Labels: features
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Is there any plan to add support for graph computation like GraphX on Spark?
> * graph representation in PCollection 
> * basic statistics like vertex/edge count
> * basic functions like vertex/edge-wise map-reduce tasks (i.e., count the outgoing 
> degree of a vertex)
> * basic functions like subgraph combine/join
> * ..



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3055) Retry downloading required test artifacts with a backoff when download fails.

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3055:
---
Status: Open  (was: Triage Needed)

> Retry downloading required test artifacts with a backoff when download fails.
> -
>
> Key: BEAM-3055
> URL: https://issues.apache.org/jira/browse/BEAM-3055
> Project: Beam
>  Issue Type: Improvement
>  Components: test-failures, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Jason Kuster
>Priority: Major
> Fix For: Not applicable
>
>
> When Maven fails to download a required artifact for a test, the test fails. 
> Is it possible to configure Maven to retry the download with a backoff, up to 
> N attempts?
> Example test failure:
> https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/15004/console
> 2017-10-11T19:01:21.382 [INFO] 
> 
> 2017-10-11T19:01:21.382 [INFO] BUILD FAILURE
> 2017-10-11T19:01:21.382 [INFO] 
> 
> 2017-10-11T19:01:21.383 [INFO] Total time: 55:20 min
> 2017-10-11T19:01:21.383 [INFO] Finished at: 2017-10-11T19:01:21+00:00
> 2017-10-11T19:01:23.807 [INFO] Final Memory: 261M/2068M
> 2017-10-11T19:01:23.807 [INFO] 
> 
> 2017-10-11T19:01:23.836 [ERROR] Failed to execute goal on project 
> beam-sdks-java-io-hcatalog: Could not resolve dependencies for project 
> org.apache.beam:beam-sdks-java-io-hcatalog:jar:2.2.0-SNAPSHOT: The following 
> artifacts could not be resolved: org.apache.hive:hive-metastore:jar:2.1.0, 
> javolution:javolution:jar:5.5.1: Could not transfer artifact 
> org.apache.hive:hive-metastore:jar:2.1.0 from/to central 
> (https://repo.maven.apache.org/maven2): GET request of: 
> org/apache/hive/hive-metastore/2.1.0/hive-metastore-2.1.0.jar from central 
> failed: Connection reset -> [Help 1].
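
For illustration only, here is a minimal Java sketch of the behaviour being 
requested (retry with exponential backoff, up to N attempts). It is not a 
Maven configuration; the class and method names are hypothetical.

{code:java}
import java.util.concurrent.Callable;

// Hypothetical helper illustrating "retry the download with a backoff up to
// N attempts". A sketch of the requested behaviour, not Maven configuration.
public class RetryWithBackoff {
  public static <T> T call(Callable<T> download, int maxAttempts, long initialBackoffMillis)
      throws Exception {
    long backoffMillis = initialBackoffMillis;
    Exception last = new IllegalArgumentException("maxAttempts must be >= 1");
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return download.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(backoffMillis);
          backoffMillis *= 2; // exponential backoff between attempts
        }
      }
    }
    throw last;
  }
}
{code}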



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3310) Push metrics to a backend in a runner-agnostic way

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3310:
---
Status: Open  (was: Triage Needed)

> Push metrics to a backend in a runner-agnostic way
> ---
>
> Key: BEAM-3310
> URL: https://issues.apache.org/jira/browse/BEAM-3310
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-extensions-metrics, sdk-java-core
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> The idea is to avoid relying on the runners to provide access to the metrics 
> (either at the end of the pipeline or while it runs), because runners do not 
> all have the same metrics capabilities (e.g. the Spark runner configures sinks 
> such as CSV, Graphite or in-memory sinks through the Spark engine 
> configuration). The goal is to push the metrics from the runner-independent 
> code so that, whatever runner is chosen, a user can get their metrics out of 
> Beam.
> Here is the link to the discussion thread on the dev ML: 
> https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
> And the design doc:
> https://s.apache.org/runner_independent_metrics_extraction
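
As a rough illustration of the proposal (not the design in the linked doc), 
here is a hypothetical sink interface that runner-independent code could call 
to push metrics, whichever runner executes the pipeline. All names below are 
made up for this sketch.

{code:java}
// Hypothetical interface for a runner-agnostic metrics push; the real design
// lives in the linked doc, this is only an illustrative sketch.
public interface MetricsPushSink {
  // Called periodically by runner-independent code with a snapshot of metrics.
  void writeMetrics(MetricsSnapshot snapshot) throws Exception;
}

// Minimal snapshot type used by the sketch above.
class MetricsSnapshot {
  private final java.util.Map<String, Long> counters;

  MetricsSnapshot(java.util.Map<String, Long> counters) {
    this.counters = counters;
  }

  java.util.Map<String, Long> getCounters() {
    return counters;
  }
}
{code}

A concrete backend (Graphite, an HTTP endpoint, an in-memory sink for tests, 
etc.) would then implement writeMetrics and be selected via pipeline options.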



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6284) [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with result UNKNOWN on succeeded job and checks passed

2019-05-21 Thread Mikhail Gryzykhin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845306#comment-16845306
 ] 

Mikhail Gryzykhin commented on BEAM-6284:
-

It seems the problem is within waitUntilFinish: it treats the UNKNOWN state as 
an error and exits after MAX_RETRIES.

I have a PR out that treats UNKNOWN as a non-terminal state, but I am still 
confirming whether UNKNOWN can ever be terminal. If it can, I'll look for an 
alternative way to handle the issue.
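
To make the intended change concrete, here is a hedged sketch of a polling 
loop in which UNKNOWN is treated as non-terminal; all names are hypothetical 
and the actual change is in the PR mentioned above.

{code:java}
// Illustrative only: treat UNKNOWN as "still running" rather than as an error,
// and only stop polling on a definitely-terminal state. Names are hypothetical.
enum JobState { RUNNING, UNKNOWN, DONE, FAILED, CANCELLED }

class WaitUntilFinishSketch {
  static boolean isTerminal(JobState state) {
    return state == JobState.DONE || state == JobState.FAILED || state == JobState.CANCELLED;
  }

  static JobState waitUntilFinish(java.util.Iterator<JobState> polls) {
    JobState state = JobState.UNKNOWN;
    while (polls.hasNext()) {
      state = polls.next();
      if (isTerminal(state)) {
        return state; // UNKNOWN no longer counts toward a retry/error budget
      }
    }
    return state;
  }
}
{code}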

> [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with 
> result UNKNOWN on succeeded job and checks passed
> --
>
> Key: BEAM-6284
> URL: https://issues.apache.org/jira/browse/BEAM-6284
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * 
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/ViewTest/testWindowedSideInputFixedToGlobal/
> Initial investigation:
> According to the logs, all test-relevant checks have passed, so this seems to 
> be a testing-framework failure.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3489) Expose the message id of received messages within PubsubMessage

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3489:
---
Status: Open  (was: Triage Needed)

> Expose the message id of received messages within PubsubMessage
> ---
>
> Key: BEAM-3489
> URL: https://issues.apache.org/jira/browse/BEAM-3489
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Luke Cwik
>Assignee: Thinh Ha
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This task is about passing forward the message id from the pubsub proto to 
> the java PubsubMessage.
> Add a message id field to PubsubMessage.
> Update the coder for PubsubMessage to encode the message id.
> Update the translation from the Pubsub proto message to the Dataflow message:
> https://github.com/apache/beam/blob/2e275264b21db45787833502e5e42907b05e28b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L976
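
For orientation, a minimal sketch of the first step above (carrying the 
message id alongside the payload and attributes). This is an illustration, 
not the real Beam PubsubMessage, and the coder change described above would 
still be needed.

{code:java}
import java.util.Map;

// Illustrative sketch only, not org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage:
// a value type that also carries the Pub/Sub-assigned message id.
class PubsubMessageSketch {
  private final byte[] payload;
  private final Map<String, String> attributes;
  private final String messageId; // newly exposed field

  PubsubMessageSketch(byte[] payload, Map<String, String> attributes, String messageId) {
    this.payload = payload;
    this.attributes = attributes;
    this.messageId = messageId;
  }

  byte[] getPayload() {
    return payload;
  }

  Map<String, String> getAttributeMap() {
    return attributes;
  }

  String getMessageId() {
    return messageId;
  }
}
{code}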



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3489) Expose the message id of received messages within PubsubMessage

2019-05-21 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845304#comment-16845304
 ] 

Ismaël Mejía commented on BEAM-3489:


Is this already fixed?

> Expose the message id of received messages within PubsubMessage
> ---
>
> Key: BEAM-3489
> URL: https://issues.apache.org/jira/browse/BEAM-3489
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Luke Cwik
>Assignee: Thinh Ha
>Priority: Minor
>  Labels: newbie, starter
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This task is about passing forward the message id from the pubsub proto to 
> the java PubsubMessage.
> Add a message id field to PubsubMessage.
> Update the coder for PubsubMessage to encode the message id.
> Update the translation from the Pubsub proto message to the Dataflow message:
> https://github.com/apache/beam/blob/2e275264b21db45787833502e5e42907b05e28b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L976



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3493) Prevent users from "implementing" PipelineOptions

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3493:
---
Status: Open  (was: Triage Needed)

> Prevent users from "implementing" PipelineOptions
> -
>
> Key: BEAM-3493
> URL: https://issues.apache.org/jira/browse/BEAM-3493
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Minor
>  Labels: newbie, starter
>
> I've seen a user implement {{PipelineOptions}}. This implies that it is 
> backwards-incompatible to add new options, which is of course not our intent. 
> We should at least document very loudly that it is not to be implemented, and 
> preferably have some automation that will fail on load if they have 
> implemented it. Ideas?
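
One possible automated check, sketched here as a suggestion only: options 
produced by PipelineOptionsFactory are dynamic proxies, so a load-time guard 
could reject hand-implemented options objects. Whether this covers every 
legitimate use is an open question; the names below are hypothetical.

{code:java}
import java.lang.reflect.Proxy;

// Hypothetical guard: fail fast when a PipelineOptions instance was
// hand-implemented instead of created through PipelineOptionsFactory
// (which returns dynamic proxies).
class PipelineOptionsGuard {
  static void checkNotHandImplemented(Object options) {
    if (!Proxy.isProxyClass(options.getClass())) {
      throw new IllegalArgumentException(
          "PipelineOptions must be created via PipelineOptionsFactory, not implemented directly: "
              + options.getClass().getName());
    }
  }
}
{code}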



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-7381) Connecting to Google Container Registry from Jenkins workers

2019-05-21 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845303#comment-16845303
 ] 

yifan zou edited comment on BEAM-7381 at 5/21/19 10:31 PM:
---

Great! I'll try to get the worker updated later today (to avoid impacting 
others if I screw up :)).

In the meantime, you can still use the beam17-jnlp worker for testing.


was (Author: yifanzou):
Great! I'll try to get the worker updated later today (to avoid impacting 
others if I screw up :)).

> Connecting to Google Container Registry from Jenkins workers
> 
>
> Key: BEAM-7381
> URL: https://issues.apache.org/jira/browse/BEAM-7381
> Project: Beam
>  Issue Type: Wish
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: yifan zou
>Priority: Major
>
> I'm working on [running Portable Python Load 
> tests|https://github.com/apache/beam/pull/8636] on our existing Flink 
> Dataproc infrastructure. To run the tests on the freshest version of SDK 
> harnesses and Job servers, I want to be able to push/pull those images 
> to/from Google Container Registry in the apache-beam-testing project. However, I 
> can't connect to the registry - I got the following error message while 
> pushing the images:
>   
> {code:java}
> unauthorized: You don't have the needed permissions to perform this 
> operation, and you may have invalid credentials. To authenticate your 
> request, follow the steps in: 
> https://cloud.google.com/container-registry/docs/advanced-authentication 
> {code}
>  
>  (see more here: 
> [https://builds.apache.org/job/beam_LoadTests_Python_GBK_Flink_Batch_PR/5/console]
>  )
>   
>  From what I know, the best way to deal with this is to install the 
> [standalone docker 
> credential|https://cloud.google.com/container-registry/docs/advanced-authentication#standalone_docker_credential_helper]
>  helper on the workers. It would then make it possible to authenticate every 
> time a Jenkins job needs to push/pull images. 
> I don't seem to have permissions to install this tool on the workers - if my 
> reasoning is correct, can we install this?
> I specifically mean running this:
> {code:java}
> gcloud components install docker-credential-gcr{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3604) MqttIOTest testReadNoClientId failure timeout

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3604:
---
Status: Open  (was: Triage Needed)

> MqttIOTest testReadNoClientId failure timeout
> -
>
> Key: BEAM-3604
> URL: https://issues.apache.org/jira/browse/BEAM-3604
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mqtt
>Reporter: Kenneth Knowles
>Assignee: Ismaël Mejía
>Priority: Critical
>  Labels: flake
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I've seen a few failures today. Here is one:
> [https://builds.apache.org/job/beam_PreCommit_Java_GradleBuild/1758/testReport/junit/org.apache.beam.sdk.io.mqtt/MqttIOTest/testReadNoClientId/]
> Filing all flakes as "Critical" priority so we can sickbay or fix.
> Since that build will get GC'd, here is the standard error. From that 
> perspective it looks like everything went as planned, but perhaps the test 
> has a race condition or something?
> {code}
> Feb 01, 2018 11:28:01 PM org.apache.beam.sdk.io.mqtt.MqttIOTest startBroker
> INFO: Finding free network port
> Feb 01, 2018 11:28:01 PM org.apache.beam.sdk.io.mqtt.MqttIOTest startBroker
> INFO: Starting ActiveMQ brokerService on 57986
> Feb 01, 2018 11:28:03 PM org.apache.activemq.broker.BrokerService 
> doStartPersistenceAdapter
> INFO: Using Persistence Adapter: MemoryPersistenceAdapter
> Feb 01, 2018 11:28:04 PM org.apache.activemq.broker.BrokerService 
> doStartBroker
> INFO: Apache ActiveMQ 5.13.1 (localhost, 
> ID:115.98.154.104.bc.googleusercontent.com-38646-1517527683931-0:1) is 
> starting
> Feb 01, 2018 11:28:04 PM 
> org.apache.activemq.transport.TransportServerThreadSupport doStart
> INFO: Listening for connections at: mqtt://localhost:57986
> Feb 01, 2018 11:28:04 PM org.apache.activemq.broker.TransportConnector start
> INFO: Connector mqtt://localhost:57986 started
> Feb 01, 2018 11:28:04 PM org.apache.activemq.broker.BrokerService 
> doStartBroker
> INFO: Apache ActiveMQ 5.13.1 (localhost, 
> ID:115.98.154.104.bc.googleusercontent.com-38646-1517527683931-0:1) started
> Feb 01, 2018 11:28:04 PM org.apache.activemq.broker.BrokerService 
> doStartBroker
> INFO: For help or more information please see: http://activemq.apache.org
> Feb 01, 2018 11:28:26 PM org.apache.activemq.broker.BrokerService stop
> INFO: Apache ActiveMQ 5.13.1 (localhost, 
> ID:115.98.154.104.bc.googleusercontent.com-38646-1517527683931-0:1) is 
> shutting down
> Feb 01, 2018 11:28:26 PM org.apache.activemq.broker.TransportConnector stop
> INFO: Connector mqtt://localhost:57986 stopped
> Feb 01, 2018 11:28:26 PM org.apache.activemq.broker.BrokerService stop
> INFO: Apache ActiveMQ 5.13.1 (localhost, 
> ID:115.98.154.104.bc.googleusercontent.com-38646-1517527683931-0:1) uptime 
> 24.039 seconds
> Feb 01, 2018 11:28:26 PM org.apache.activemq.broker.BrokerService stop
> INFO: Apache ActiveMQ 5.13.1 (localhost, 
> ID:115.98.154.104.bc.googleusercontent.com-38646-1517527683931-0:1) is 
> shutdown
> Feb 01, 2018 11:28:26 PM org.apache.beam.sdk.io.mqtt.MqttIOTest startBroker
> INFO: Finding free network port
> Feb 01, 2018 11:28:26 PM org.apache.beam.sdk.io.mqtt.MqttIOTest startBroker
> INFO: Starting ActiveMQ brokerService on 46799
> Feb 01, 2018 11:28:26 PM org.apache.activemq.broker.BrokerService 
> doStartPersistenceAdapter
> INFO: Using Persistence Adapter: MemoryPersistenceAdapter
> Feb 01, 2018 11:28:26 PM org.apache.activemq.broker.BrokerService 
> doStartBroker
> INFO: Apache ActiveMQ 5.13.1 (localhost, 
> ID:115.98.154.104.bc.googleusercontent.com-38646-1517527683931-0:2) is 
> starting
> Feb 01, 2018 11:28:26 PM 
> org.apache.activemq.transport.TransportServerThreadSupport doStart
> INFO: Listening for connections at: mqtt://localhost:46799
> Feb 01, 2018 11:28:26 PM org.apache.activemq.broker.TransportConnector start
> INFO: Connector mqtt://localhost:46799 started
> Feb 01, 2018 11:28:26 PM org.apache.activemq.broker.BrokerService 
> doStartBroker
> INFO: Apache ActiveMQ 5.13.1 (localhost, 
> ID:115.98.154.104.bc.googleusercontent.com-38646-1517527683931-0:2) started
> Feb 01, 2018 11:28:26 PM org.apache.activemq.broker.BrokerService 
> doStartBroker
> INFO: For help or more information please see: http://activemq.apache.org
> Feb 01, 2018 11:28:28 PM org.apache.beam.sdk.io.mqtt.MqttIOTest 
> lambda$testRead$1
> INFO: Waiting pipeline connected to the MQTT broker before sending messages 
> ...
> Feb 01, 2018 11:28:35 PM org.apache.activemq.broker.BrokerService stop
> INFO: Apache ActiveMQ 5.13.1 (localhost, 
> ID:115.98.154.104.bc.googleusercontent.com-38646-1517527683931-0:2) is 
> shutting down
> Feb 01, 2018 11:28:35 PM org.apache.activemq.broker.TransportConnector stop
> INFO: Connector mqtt://localhost:46799 

[jira] [Commented] (BEAM-7381) Connecting to Google Container Registry from Jenkins workers

2019-05-21 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845303#comment-16845303
 ] 

yifan zou commented on BEAM-7381:
-

Great! I'll try to get the worker updated later today (to avoid impacting 
others if I screw up :)).

> Connecting to Google Container Registry from Jenkins workers
> 
>
> Key: BEAM-7381
> URL: https://issues.apache.org/jira/browse/BEAM-7381
> Project: Beam
>  Issue Type: Wish
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: yifan zou
>Priority: Major
>
> I'm working on [running Portable Python Load 
> tests|https://github.com/apache/beam/pull/8636] on our existing Flink 
> Dataproc infrastructure. To run the tests on the freshest version of SDK 
> harnesses and Job servers, I want to be able to push/pull those images 
> to/from Google Container Registry in the apache-beam-testing project. However, I 
> can't connect to the registry - I got the following error message while 
> pushing the images:
>   
> {code:java}
> unauthorized: You don't have the needed permissions to perform this 
> operation, and you may have invalid credentials. To authenticate your 
> request, follow the steps in: 
> https://cloud.google.com/container-registry/docs/advanced-authentication 
> {code}
>  
>  (see more here: 
> [https://builds.apache.org/job/beam_LoadTests_Python_GBK_Flink_Batch_PR/5/console]
>  )
>   
>  From what I know, the best way to deal with this is to install the 
> [standalone docker 
> credential|https://cloud.google.com/container-registry/docs/advanced-authentication#standalone_docker_credential_helper]
>  helper on the workers. It would then make it possible to authenticate every 
> time a Jenkins job needs to push/pull images. 
> I don't seem to have permissions to install this tool on the workers - if my 
> reasoning is correct, can we install this?
> I specifically mean running this:
> {code:java}
> gcloud components install docker-credential-gcr{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3617) Restore proto round trip for Java DirectRunner (was: Performance degradation on the direct runner)

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3617:
---
Status: Open  (was: Triage Needed)

> Restore proto round trip for Java DirectRunner (was: Performance degradation 
> on the direct runner)
> --
>
> Key: BEAM-3617
> URL: https://issues.apache.org/jira/browse/BEAM-3617
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Jean-Baptiste Onofré
>Priority: Minor
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Running Nexmark queries with the direct runner between Beam 2.2.0 and 2.3.0 
> shows a performance degradation:
> {code}
> --
>   Query   Beam 2.2.0     Beam 2.3.0
>           Runtime(sec)   Runtime(sec)
> --
>   0000      6.4            10.6
>   0001      5.1            10.2
>   0002      3.0             5.8
>   0003      3.8             6.2
>   0004      0.9             1.4
>   0005      5.8            11.4
>   0006      0.8             1.4
>   0007    193.8          1249.1
>   0008      3.9             6.9
>   0009      0.9             1.3
>   0010      6.4             8.2
>   0011      5.0             9.4
>   0012      4.7             9.1
> {code}
> We can see that Query 7 in particular is more than 6 times slower.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3674) Port ElasticSearchIOTest off DoFnTester

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3674:
---
Status: Open  (was: Triage Needed)

> Port ElasticSearchIOTest off DoFnTester
> ---
>
> Key: BEAM-3674
> URL: https://issues.apache.org/jira/browse/BEAM-3674
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-elasticsearch
>Reporter: Kenneth Knowles
>Priority: Minor
>  Labels: beginner, newbie, starter
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3934) BoundedReader should be closed in JavaReadViaImpulse#ReadFromBoundedSourceFn

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3934:
---
Status: Open  (was: Triage Needed)

> BoundedReader should be closed in JavaReadViaImpulse#ReadFromBoundedSourceFn
> 
>
> Key: BEAM-3934
> URL: https://issues.apache.org/jira/browse/BEAM-3934
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ted Yu
>Priority: Minor
>  Labels: usability
>
> {code}
> public void readSoruce(ProcessContext ctxt) throws IOException {
>   BoundedSource.BoundedReader reader =
>   ctxt.element().createReader(ctxt.getPipelineOptions());
>   for (boolean more = reader.start(); more; more = reader.advance()) {
> ctxt.outputWithTimestamp(reader.getCurrent(), 
> reader.getCurrentTimestamp());
>   }
> }
> {code}
> The BoundedSource.BoundedReader instance should be closed before returning 
> from the method.
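
A hedged sketch of the suggested fix, mirroring the snippet above but closing 
the reader with try-with-resources. It assumes the reader is AutoCloseable, 
as it is in the Beam Source.Reader hierarchy, and is not a drop-in patch.

{code:java}
// Sketch only: ensure the reader is closed even if iteration throws.
public void readSource(ProcessContext ctxt) throws IOException {
  try (BoundedSource.BoundedReader reader =
      ctxt.element().createReader(ctxt.getPipelineOptions())) {
    for (boolean more = reader.start(); more; more = reader.advance()) {
      ctxt.outputWithTimestamp(reader.getCurrent(), reader.getCurrentTimestamp());
    }
  }
}
{code}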



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-3816) [nexmark] Something is slightly off with Query 6

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-3816:
---
Status: Open  (was: Triage Needed)

> [nexmark] Something is slightly off with Query 6
> 
>
> Key: BEAM-3816
> URL: https://issues.apache.org/jira/browse/BEAM-3816
> Project: Beam
>  Issue Type: Bug
>  Components: examples-nexmark
>Reporter: Andrew Pilloud
>Priority: Major
>  Labels: easyfix, newbie, nexmark, test
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> java.lang.AssertionError: Query6/Query6.Stamp/ParMultiDo(Anonymous).output: 
> wrong pipeline output Expected: <[\{"seller":1048,"price":83609648}, 
> \{"seller":1052,"price":61788353}, \{"seller":1086,"price":33744823}, 
> \{"seller":1078,"price":19876735}, \{"seller":1058,"price":50692833}, 
> \{"seller":1044,"price":6719489}, \{"seller":1096,"price":31287415}, 
> \{"seller":1095,"price":37004879}, \{"seller":1082,"price":22528654}, 
> \{"seller":1006,"price":57288736}, \{"seller":1051,"price":3967261}, 
> \{"seller":1084,"price":6394160}, \{"seller":1020,"price":3871757}, 
> \{"seller":1007,"price":185293}, \{"seller":1031,"price":11840889}, 
> \{"seller":1080,"price":26896442}, \{"seller":1030,"price":294928}, 
> \{"seller":1066,"price":26839191}, \{"seller":1000,"price":28257749}, 
> \{"seller":1055,"price":17087173}, \{"seller":1072,"price":45662210}, 
> \{"seller":1057,"price":4568399}, \{"seller":1025,"price":29008970}, 
> \{"seller":1064,"price":85810641}, \{"seller":1040,"price":99819658}, 
> \{"seller":1014,"price":11256690}, \{"seller":1098,"price":97259323}, 
> \{"seller":1011,"price":20447800}, \{"seller":1092,"price":77520938}, 
> \{"seller":1010,"price":53323687}, \{"seller":1060,"price":70032044}, 
> \{"seller":1062,"price":29076960}, \{"seller":1075,"price":19451464}, 
> \{"seller":1087,"price":27669185}, \{"seller":1009,"price":22951354}, 
> \{"seller":1065,"price":71875611}, \{"seller":1063,"price":87596779}, 
> \{"seller":1021,"price":62918185}, \{"seller":1034,"price":18472448}, 
> \{"seller":1028,"price":68556008}, \{"seller":1070,"price":92550447}]> but: 
> was <[\{"seller":1048,"price":83609648}, \{"seller":1052,"price":61788353}, 
> \{"seller":1086,"price":33744823}, \{"seller":1078,"price":19876735}, 
> \{"seller":1058,"price":50692833}, \{"seller":1044,"price":6719489}, 
> \{"seller":1096,"price":31287415}, \{"seller":1095,"price":37004879}, 
> \{"seller":1082,"price":22528654}, \{"seller":1006,"price":57288736}, 
> \{"seller":1051,"price":3967261}, \{"seller":1084,"price":6394160}, 
> \{"seller":1000,"price":34395558}, \{"seller":1020,"price":3871757}, 
> \{"seller":1007,"price":185293}, \{"seller":1031,"price":11840889}, 
> \{"seller":1080,"price":26896442}, \{"seller":1030,"price":294928}, 
> \{"seller":1066,"price":26839191}, \{"seller":1055,"price":17087173}, 
> \{"seller":1072,"price":45662210}, \{"seller":1057,"price":4568399}, 
> \{"seller":1025,"price":29008970}, \{"seller":1064,"price":85810641}, 
> \{"seller":1040,"price":99819658}, \{"seller":1014,"price":11256690}, 
> \{"seller":1098,"price":97259323}, \{"seller":1011,"price":20447800}, 
> \{"seller":1092,"price":77520938}, \{"seller":1010,"price":53323687}, 
> \{"seller":1060,"price":70032044}, \{"seller":1062,"price":29076960}, 
> \{"seller":1075,"price":19451464}, \{"seller":1087,"price":27669185}, 
> \{"seller":1009,"price":22951354}, \{"seller":1065,"price":71875611}, 
> \{"seller":1063,"price":87596779}, \{"seller":1021,"price":62918185}, 
> \{"seller":1034,"price":18472448}, \{"seller":1028,"price":68556008}, 
> \{"seller":1070,"price":92550447}]>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4572) Nexmark should serialize events in a language agnostic way

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-4572:
---
Status: Open  (was: Triage Needed)

> Nexmark should serialize events in a language agnostic way
> --
>
> Key: BEAM-4572
> URL: https://issues.apache.org/jira/browse/BEAM-4572
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-nexmark
>Reporter: Ismaël Mejía
>Priority: Minor
>  Labels: newbie, starter
>
> Nexmark encodes events using a CustomCoder by default. It would be nice to 
> support other, more standard serialization formats, e.g. Avro or JSON, so that 
> pipelines can be tested in multiple languages.
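
One possible direction, sketched under the assumption that the Nexmark Event 
model is compatible with Avro reflection; the wiring below is illustrative 
only and not a reviewed proposal.

{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.nexmark.model.Event;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Illustrative sketch: register a language-agnostic (Avro) coder for the
// Nexmark Event type instead of the default CustomCoder.
public class AvroEventCoderSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.getCoderRegistry().registerCoderForClass(Event.class, AvroCoder.of(Event.class));
    // ... build and run the Nexmark query pipeline with p ...
  }
}
{code}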



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-4479) Fixed document for Coder

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-4479:
---
Status: Open  (was: Triage Needed)

> Fixed document for Coder
> 
>
> Key: BEAM-4479
> URL: https://issues.apache.org/jira/browse/BEAM-4479
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Xin Wang
>Assignee: Xin Wang
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> {code:java}CoderRegistry.getDefaultCoder{code} was removed in release 2.0.0; 
> however, the documentation wasn't updated. This patch fixes that.
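
For readers who hit the stale docs, a hedged sketch of the lookup that 
replaced getDefaultCoder ({{CoderRegistry.getCoder}}); whether this matches 
the exact documentation text being fixed here is an assumption.

{code:java}
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderRegistry;
import org.apache.beam.sdk.values.TypeDescriptor;

// Sketch of the coder lookup after getDefaultCoder was removed.
public class CoderLookupSketch {
  public static Coder<String> stringCoder() throws Exception {
    CoderRegistry registry = CoderRegistry.createDefault();
    return registry.getCoder(TypeDescriptor.of(String.class));
  }
}
{code}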



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-4479) Fixed document for Coder

2019-05-21 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845299#comment-16845299
 ] 

Ismaël Mejía commented on BEAM-4479:


Is this one already fixed? Can we mark it as resolved or close it?

> Fixed document for Coder
> 
>
> Key: BEAM-4479
> URL: https://issues.apache.org/jira/browse/BEAM-4479
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Xin Wang
>Assignee: Xin Wang
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> {code:java}CoderRegistry.getDefaultCoder{code} was removed in release 2.0.0; 
> however, the documentation wasn't updated. This patch fixes that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5537) Beam Dependency Update Request: google-cloud-bigquery

2019-05-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-5537:
---
Status: Open  (was: Triage Needed)

> Beam Dependency Update Request: google-cloud-bigquery
> -
>
> Key: BEAM-5537
> URL: https://issues.apache.org/jira/browse/BEAM-5537
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.9.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
>  - 2018-10-01 19:15:02.343276 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 0.25.0. The latest version is 1.5.1 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:08:29.646271 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 0.25.0. The latest version is 1.6.0 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-15 12:09:25.995486 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 0.25.0. The latest version is 1.6.0 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-22 12:09:52.889923 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 0.25.0. The latest version is 1.6.0 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-22 12:07:44.834195 
> -
> Please consider upgrading the dependency google-cloud-bigquery. 
> The current version is 1.6.1. The latest version is 1.11.2 
> cc: [~markflyhigh], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

