[jira] [Work logged] (BEAM-8996) Auto-generate pipeline options documentation for FlinkRunner

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8996?focusedWorklogId=362081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362081
 ]

ASF GitHub Bot logged work on BEAM-8996:


Author: ASF GitHub Bot
Created on: 21/Dec/19 06:23
Start Date: 21/Dec/19 06:23
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #10434: [BEAM-8996] 
Improvements to the Flink runner page
URL: https://github.com/apache/beam/pull/10434#issuecomment-568156779
 
 
   Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 362081)
Time Spent: 3h 50m  (was: 3h 40m)

> Auto-generate pipeline options documentation for FlinkRunner
> 
>
> Key: BEAM-8996
> URL: https://issues.apache.org/jira/browse/BEAM-8996
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The documentation on the pipeline options on the [runner 
> page|https://beam.apache.org/documentation/runners/flink/] easily becomes 
> outdated. In order for them to stay up to date, we should auto-generate the 
> documentation from the {{FlinkPipelineOptions}} class. This should be done 
> for both Java and Python.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8996) Auto-generate pipeline options documentation for FlinkRunner

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8996?focusedWorklogId=362080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362080
 ]

ASF GitHub Bot logged work on BEAM-8996:


Author: ASF GitHub Bot
Created on: 21/Dec/19 06:22
Start Date: 21/Dec/19 06:22
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #10434: [BEAM-8996] 
Improvements to the Flink runner page
URL: https://github.com/apache/beam/pull/10434
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 362080)
Time Spent: 3h 40m  (was: 3.5h)

> Auto-generate pipeline options documentation for FlinkRunner
> 
>
> Key: BEAM-8996
> URL: https://issues.apache.org/jira/browse/BEAM-8996
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The documentation on the pipeline options on the [runner 
> page|https://beam.apache.org/documentation/runners/flink/] easily becomes 
> outdated. In order for them to stay up to date, we should auto-generate the 
> documentation from the {{FlinkPipelineOptions}} class. This should be done 
> for both Java and Python.





[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=362019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362019
 ]

ASF GitHub Bot logged work on BEAM-8676:


Author: ASF GitHub Bot
Created on: 21/Dec/19 03:58
Start Date: 21/Dec/19 03:58
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10438: [BEAM-8676] 
sdks/java: gax and grpc upgrades
URL: https://github.com/apache/beam/pull/10438#issuecomment-568149841
 
 
   R: @kennknowles 
 



Issue Time Tracking
---

Worklog Id: (was: 362019)
Time Spent: 3h 10m  (was: 3h)

> Beam Dependency Update Request: com.google.api:gax-grpc
> ---
>
> Key: BEAM-8676
> URL: https://issues.apache.org/jira/browse/BEAM-8676
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:38:32.410774 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:03:23.809273 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:08:16.165687 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:07:17.894174 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.51.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Commented] (BEAM-9005) Go SDK post-commit failures due to https://github.com/apache/beam/pull/10183

2019-12-20 Thread Chamikara Madhusanka Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001616#comment-17001616
 ] 

Chamikara Madhusanka Jayalath commented on BEAM-9005:
-

Regarding the Flink and Spark VR failures, this seems to be due to the 
environment ID not being set for some of the ParDo transforms in the generated 
runner API proto.

 

I set the environment ID here: 
[https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L272]

 

But I think there are other locations where the Go SDK generates ParDo 
transforms that do not go through this location during translation. Because of 
this, Spark/Flink fail, since some ParDos do not have an environment set.

 

[~lostluck] and [~danoliveira], any ideas? Is there any other location where the 
Go SDK generates ParDos? I suspect CoGBK, but I am not sure.
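
One way to narrow this down is to scan the generated pipeline for ParDo 
transforms that have no environment ID. The sketch below is only an 
illustration: it models transforms as plain Python dicts rather than the real 
runner API proto, and the "urn" and "environment_id" keys as well as the 
helper name are assumptions made for the example.

```python
# URN that identifies ParDo transforms in the Beam model.
PARDO_URN = "beam:transform:pardo:v1"


def pardos_missing_environment(transforms):
    """Return sorted IDs of ParDo transforms with an empty or absent environment ID.

    `transforms` is a hypothetical dict of transform ID -> dict with the keys
    "urn" and (optionally) "environment_id". It is a stand-in for the real
    pipeline proto, not its actual structure.
    """
    return sorted(
        transform_id
        for transform_id, transform in transforms.items()
        if transform.get("urn") == PARDO_URN
        and not transform.get("environment_id")
    )


example = {
    "e1": {"urn": PARDO_URN, "environment_id": "go"},
    "e2": {"urn": PARDO_URN, "environment_id": ""},
    "e3": {"urn": "beam:transform:group_by_key:v1"},
    "e4": {"urn": PARDO_URN},
}
```

Running the helper on the example data reports "e2" and "e4", the two ParDos 
that a runner would reject under the behaviour described above.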

> Go SDK post-commit  failures due to https://github.com/apache/beam/pull/10183
> -
>
> Key: BEAM-9005
> URL: https://issues.apache.org/jira/browse/BEAM-9005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Looking into this.
>  
> cc: [~bhulette] [~lostluck] [~danoliveira]





[jira] [Work logged] (BEAM-8951) Stop using nose in load tests

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8951?focusedWorklogId=362017=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362017
 ]

ASF GitHub Bot logged work on BEAM-8951:


Author: ASF GitHub Bot
Created on: 21/Dec/19 03:13
Start Date: 21/Dec/19 03:13
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10435: [BEAM-8951] Stop 
using nose in load tests
URL: https://github.com/apache/beam/pull/10435#issuecomment-568147273
 
 
   Didn't have time to take a look today and I am planning to be out next week. 
   @udim has converted several suites to pytest recently and may have some 
feedback here.
   
   With nose, I think we had to configure output collectors via xml files, see: 
https://github.com/apache/beam/blob/754b64b4a59f717d84032570acb8ed4cad87b227/sdks/python/scripts/run_integration_test.sh#L248
 . I have not yet learned how to change output collection with pytest. 
 



Issue Time Tracking
---

Worklog Id: (was: 362017)
Time Spent: 1h 40m  (was: 1.5h)

> Stop using nose in load tests
> -
>
> Key: BEAM-8951
> URL: https://issues.apache.org/jira/browse/BEAM-8951
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The community is considering moving away from nose to pytest: 
> https://issues.apache.org/jira/browse/BEAM-3713. We should change the way of 
> running Python load tests: instead of being subclasses of 
> `unittest.TestCase`, they could be plain Python scripts, just like wordcount 
> examples. This will bring one additional benefit: the _LOAD_TEST_ENABLED_ 
> guard will no longer be needed and can be safely removed.
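
A minimal sketch of what a load test written as a plain script (rather than a 
`unittest.TestCase` subclass) could look like. The `--num_records` flag, the 
`run` function and the checksum stand-in are hypothetical and only illustrate 
the shape; a real load test would build and run a Beam pipeline instead.

```python
import argparse


def run(argv=None):
    """Parse options, run the (stand-in) workload and return its metrics."""
    parser = argparse.ArgumentParser(description="Hypothetical load test script")
    parser.add_argument("--num_records", type=int, default=1000)
    # Unknown arguments are ignored here; in a real script they would be
    # forwarded to the pipeline as pipeline options.
    args, _ = parser.parse_known_args(argv)

    # Stand-in for building and running a pipeline over num_records elements.
    checksum = sum(range(args.num_records))
    return {"num_records": args.num_records, "checksum": checksum}


if __name__ == "__main__":
    print(run())
```

Because the script is only executed when invoked directly, no guard is needed 
to keep a test runner from picking it up.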





[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=362015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362015
 ]

ASF GitHub Bot logged work on BEAM-9010:


Author: ASF GitHub Bot
Created on: 21/Dec/19 03:00
Start Date: 21/Dec/19 03:00
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper 
TableRow size calculation via TableRowJsonCoder
URL: https://github.com/apache/beam/pull/10444#issuecomment-568146219
 
 
   R: @reuvenlax 
 



Issue Time Tracking
---

Worklog Id: (was: 362015)
Time Spent: 1h 50m  (was: 1h 40m)

> BigQuery TableRow's size is toString().length() ?
> -
>
> Key: BEAM-9010
> URL: https://issues.apache.org/jira/browse/BEAM-9010
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
> Attachments: TableRowJsonCoder_behavior_remains_same.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The following tests failed when I tried to upgrade google-http-client from 
> 1.28.0 to 1.34.0:
> {noformat}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
> {noformat}
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures
> [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
>  and 
> [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
>  rely on {{TableRow.toString().length()}} to calculate the size. Example:
> {code:java}
>   dataSize += row.toString().length();
>   if (dataSize >= maxRowBatchSize
>   || rows.size() >= maxRowsPerBatch
>   || i == rowsToPublish.size() - 1) {
> {code}
> However, with [google-http-client's 
> PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218],
>  the GenericData.toString output has changed since v1.29.0.
> In old google-http-client 1.28.0, an example row's toString returned:
> {noformat}
> {f=[{v=foo}, {v=1234}]}
> {noformat}
> In new google-http-client 1.29.0 and higher, the same row's toString returns:
> {noformat}
> GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, 
> GenericData{classInfo=[v], {v=1234}}]}}
> {noformat}
> h1. Question:
> Is it the right thing to rely on {{toString().length()}} in the BigQuery 
> classes?
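
To illustrate the problem in isolation, here is a small Python sketch (not the 
Beam or google-http-client code). `RowV1` and `RowV2` are hypothetical 
stand-ins for the same row type before and after a library changes its string 
form, and `encoded_size` is an assumed helper that measures a stable JSON 
encoding instead.

```python
import json


class RowV1(dict):
    """Stand-in for a row whose string form is the compact map form."""

    def __repr__(self):
        return dict.__repr__(self)


class RowV2(dict):
    """Stand-in for the same row after the library made its string form verbose."""

    def __repr__(self):
        return "GenericData{classInfo=%s, %s}" % (sorted(self), dict.__repr__(self))


def encoded_size(row):
    """Size in bytes of a canonical JSON encoding; independent of __repr__."""
    encoded = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


old_row = RowV1(f="foo")
new_row = RowV2(f="foo")
```

The two rows hold identical data, so `encoded_size` agrees for both, while 
`len(repr(...))` differs. Any size estimate based on the string form therefore 
changes whenever the library changes how it prints objects.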





[jira] [Commented] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky

2019-12-20 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001606#comment-17001606
 ] 

Valentyn Tymofieiev commented on BEAM-8974:
---

Thanks, everyone. We can reopen if this comes up again.

> apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info
>  is flaky
> 
>
> Key: BEAM-8974
> URL: https://issues.apache.org/jira/browse/BEAM-8974
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test is failing at apache_beam/runners/worker/log_handler_test.py:110: 
> IndexError
> Added in https://github.com/apache/beam/pull/10292
> Sample job: [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2160/]
> Console logs:
>  {noformat}
> 06:37:37 === FAILURES 
> ===
> 06:37:37 ___ FnApiLogRecordHandlerTest.test_exc_info 
> 
> 06:37:37 [gw1] linux2 -- Python 2.7.12 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/target/.tox-py27-gcp-pytest/py27-gcp-pytest/bin/python
> 06:37:37
> 06:37:37 self = 
>  testMethod=test_exc_info>
> 06:37:37
> 06:37:37 def test_exc_info(self):
> 06:37:37   try:
> 06:37:37 raise ValueError('some message')
> 06:37:37   except ValueError:
> 06:37:37 _LOGGER.error('some error', exc_info=True)
> 06:37:37
> 06:37:37   self.fn_log_handler.close()
> 06:37:37
> 06:37:37 > log_entry = 
> self.test_logging_service.log_records_received[0].log_entries[0]
> 06:37:37 E IndexError: list index out of range
> 06:37:37
> 06:37:37 apache_beam/runners/worker/log_handler_test.py:110: IndexError
> 06:37:37 - Captured stderr call 
> -
> 06:37:37 ERROR:apache_beam.runners.worker.log_handler_test:some error
> 06:37:37 Traceback (most recent call last):
> 06:37:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py",
>  line 104, in test_exc_info
> 06:37:37 raise ValueError('some message')
> 06:37:37 ValueError: some message
> 06:37:37 -- Captured log call 
> ---
> 06:37:37 ERROR
> apache_beam.runners.worker.log_handler_test:log_handler_test.py:106 some error
> 06:37:37 Traceback (most recent call last):
> 06:37:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py",
>  line 104, in test_exc_info
> 06:37:37 raise ValueError('some message')
> 06:37:37 ValueError: some message
> {noformat}





[jira] [Resolved] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky

2019-12-20 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-8974.
---
Resolution: Fixed






[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=362013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362013
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 21/Dec/19 02:52
Start Date: 21/Dec/19 02:52
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10125: [BEAM-8671] 
Added ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 362013)
Time Spent: 11h 50m  (was: 11h 40m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7





[jira] [Work logged] (BEAM-7949) Add time-based cache threshold support in the data service of the Python SDK harness

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7949?focusedWorklogId=362011=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362011
 ]

ASF GitHub Bot logged work on BEAM-7949:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:58
Start Date: 21/Dec/19 01:58
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10246: [BEAM-7949] 
Add time-based cache threshold support in the data service of the Python SDK 
harness
URL: https://github.com/apache/beam/pull/10246#issuecomment-568142289
 
 
   Thanks for your great comments, I have updated the PR accordingly. ;)
 



Issue Time Tracking
---

Worklog Id: (was: 362011)
Time Spent: 3h 20m  (was: 3h 10m)

> Add time-based cache threshold support in the data service of the Python SDK 
> harness
> 
>
> Key: BEAM-7949
> URL: https://issues.apache.org/jira/browse/BEAM-7949
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently, only a size-based cache threshold is supported in the data service 
> of the Python SDK harness. It should also support a time-based cache 
> threshold. This is very important, especially for streaming jobs, which are 
> sensitive to delay. 
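
A minimal sketch of a buffer that flushes on either a size or a time 
threshold, to make the idea concrete. This is not the SDK harness 
implementation; the class name, parameters and injectable clock are 
assumptions for illustration. Note that the time check here only runs on 
`write` or an explicit `maybe_flush` call, so a real implementation would 
also need a timer to flush an idle buffer.

```python
import time


class ThresholdBuffer:
    """Buffers byte chunks and flushes when size or age exceeds a threshold."""

    def __init__(self, flush, max_bytes=1024, max_seconds=1.0, clock=time.monotonic):
        self._flush = flush              # callback that receives the joined bytes
        self._max_bytes = max_bytes      # size-based threshold
        self._max_seconds = max_seconds  # time-based threshold
        self._clock = clock              # injectable so tests can control time
        self._chunks = []
        self._size = 0
        self._first_write = None         # time of the oldest buffered chunk

    def write(self, data):
        if self._first_write is None:
            self._first_write = self._clock()
        self._chunks.append(data)
        self._size += len(data)
        self.maybe_flush()

    def maybe_flush(self):
        if not self._chunks:
            return
        too_big = self._size >= self._max_bytes
        too_old = self._clock() - self._first_write >= self._max_seconds
        if too_big or too_old:
            self._flush(b"".join(self._chunks))
            self._chunks, self._size, self._first_write = [], 0, None
```

With only the size threshold, a small element written to a quiet stream could 
sit in the buffer indefinitely; the age check bounds that delay.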





[jira] [Created] (BEAM-9016) Select PTransform result order is not predictable

2019-12-20 Thread Yang Zhang (Jira)
Yang Zhang created BEAM-9016:


 Summary: Select PTransform result order is not predictable
 Key: BEAM-9016
 URL: https://issues.apache.org/jira/browse/BEAM-9016
 Project: Beam
  Issue Type: Bug
  Components: beam-community
Reporter: Yang Zhang
Assignee: Aizhamal Nurmamat kyzy


pipeline.apply(Select.fieldNames("x", "y"))

pipeline.apply(Select.fieldNames("a", "b"))

The returned output order is not predictable. In the above two examples, field 
`x` may be returned first, while field `a` (also requested first) may be 
returned second.

 

Shall we add `withOrderByFieldInsertionOrder` to fieldAccessDescriptor in 
Select PTransform, so that the return order is predictable?
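
The difference between the two orderings can be sketched in a few lines of 
Python. This is an illustration of the requested behaviour only, not Beam's 
Select implementation; both helper names are hypothetical, and the second 
helper shows just one way an order mismatch could arise.

```python
def select_fields(row, field_names):
    """Return (name, value) pairs in the order the names were requested."""
    return [(name, row[name]) for name in field_names]


def select_fields_schema_order(row, field_names):
    """Return (name, value) pairs in the row's own field order instead."""
    wanted = set(field_names)
    return [(name, value) for name, value in row.items() if name in wanted]


row = {"y": 2, "x": 1, "z": 3}
```

Selecting `["x", "y"]` with the first helper yields `x` before `y`, matching 
the request; the second yields `y` before `x`, because it follows the order in 
which the row stores its fields.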





[jira] [Work logged] (BEAM-8988) apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: BigQuery source must be split before being read

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8988?focusedWorklogId=362008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362008
 ]

ASF GitHub Bot logged work on BEAM-8988:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:44
Start Date: 21/Dec/19 01:44
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10412: [BEAM-8988] 
RangeTracker for _CustomBigQuerySource
URL: https://github.com/apache/beam/pull/10412#issuecomment-568141169
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 362008)
Time Spent: 2h  (was: 1h 50m)

> apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: 
> BigQuery source must be split before being read
> ---
>
> Key: BEAM-8988
> URL: https://issues.apache.org/jira/browse/BEAM-8988
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Kamil Wasilewski
>Priority: Critical
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Sample failure: https://builds.apache.org/job/beam_PostCommit_Python37_PR/58/
> Triggered by https://github.com/apache/beam/pull/9772.
> Stacktrace:
> {noformat}
> Pipeline 
> BeamApp-jenkins-1217231928-2108ede4_7476773b-6b06-4536-a0d5-c5fafb6c0935 
> failed in state FAILED: java.lang.RuntimeException: Error received from SDK 
> harness for instruction 96: Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
>  line 879, in process
> return self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
>  line 669, in invoke_process
> windowed_value, additional_args, additional_kwargs, output_processor)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
>  line 747, in _invoke_process_per_window
> windowed_value, self.process_method(*args_for_process))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
>  line 998, in process_outputs
> for result in results:
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 1256, in process
> yield element, self.restriction_provider.initial_restriction(element)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/iobase.py",
>  line 1518, in initial_restriction
> range_tracker = self._source.get_range_tracker(None, None)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery.py",
>  line 652, in get_range_tracker
> raise NotImplementedError('BigQuery source must be split before being 
> read')
> NotImplementedError: BigQuery source must be split before being read
> {noformat}





[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=362009=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362009
 ]

ASF GitHub Bot logged work on BEAM-7274:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:44
Start Date: 21/Dec/19 01:44
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #10356: [BEAM-7274] Infer 
a Beam Schema from a protocol buffer class.
URL: https://github.com/apache/beam/pull/10356#issuecomment-568141194
 
 
   @alexvanboxel let me know if you have more thoughts here or if this looks 
good.
   
   One more comment - once your options work is in, we should switch my use of 
field metadata over to the structured options approach.
 



Issue Time Tracking
---

Worklog Id: (was: 362009)
Time Spent: 17h 50m  (was: 17h 40m)

> Protobuf Beam Schema support
> 
>
> Key: BEAM-7274
> URL: https://issues.apache.org/jira/browse/BEAM-7274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Minor
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> Add support for the new Beam Schema to the Protobuf extension.





[jira] [Work logged] (BEAM-7951) Allow runner to configure customization WindowedValue coder such as ValueOnlyWindowedValueCoder

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7951?focusedWorklogId=362010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362010
 ]

ASF GitHub Bot logged work on BEAM-7951:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:45
Start Date: 21/Dec/19 01:45
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9979: [BEAM-7951] 
Allow runner to configure customization WindowedValue coder.
URL: https://github.com/apache/beam/pull/9979#issuecomment-568141245
 
 
   Rebased the code and squashed the commits. :)
 



Issue Time Tracking
---

Worklog Id: (was: 362010)
Time Spent: 7h 40m  (was: 7.5h)

> Allow runner to configure customization WindowedValue coder such as 
> ValueOnlyWindowedValueCoder
> ---
>
> Key: BEAM-7951
> URL: https://issues.apache.org/jira/browse/BEAM-7951
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> The coder of WindowedValue cannot be configured and it’s always 
> FullWindowedValueCoder. We don't need to serialize the timestamp, window and 
> pane properties in Flink, so it would be better to make the coder 
> configurable (i.e. allowing the use of ValueOnlyWindowedValueCoder).
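
To show why a value-only coder saves bytes, here is a small Python sketch. 
The byte layout is made up for illustration (two 8-byte integers in front of 
the value, with pane information omitted) and is not Beam's actual wire 
format; all four function names are hypothetical.

```python
import struct


def encode_full(value, timestamp_micros, window_end_micros):
    """Prefix the value bytes with timestamp and window metadata (16 bytes)."""
    return struct.pack(">qq", timestamp_micros, window_end_micros) + value


def decode_full(data):
    """Inverse of encode_full: returns (value, timestamp, window end)."""
    timestamp_micros, window_end_micros = struct.unpack(">qq", data[:16])
    return data[16:], timestamp_micros, window_end_micros


def encode_value_only(value):
    """Write only the value bytes; no per-element metadata."""
    return value


def decode_value_only(data, default_timestamp=0, default_window_end=0):
    """Rebuild the element, filling the metadata with fixed defaults."""
    return data, default_timestamp, default_window_end
```

For a 3-byte value the full encoding takes 19 bytes in this layout, against 
3 bytes for the value-only one. The saving applies to every element, which is 
why it matters when the runner does not need the metadata.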





[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=362005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362005
 ]

ASF GitHub Bot logged work on BEAM-7274:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:41
Start Date: 21/Dec/19 01:41
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #10356: [BEAM-7274] Infer 
a Beam Schema from a protocol buffer class.
URL: https://github.com/apache/beam/pull/10356#issuecomment-568140963
 
 
   You are correct that this requires other languages to implement this parsing 
as well. However, I think the visibility advantage of having a fully-represented 
proto (vs. just embedding a bytes field in a proto) is worth that tax, and it 
shouldn't be a huge tax on Beam SDKs (it only took me about 30-40 minutes to 
write the code here).
 



Issue Time Tracking
---

Worklog Id: (was: 362005)
Time Spent: 17.5h  (was: 17h 20m)

> Protobuf Beam Schema support
> 
>
> Key: BEAM-7274
> URL: https://issues.apache.org/jira/browse/BEAM-7274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Minor
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> Add support for the new Beam Schema to the Protobuf extension.





[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361995=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361995
 ]

ASF GitHub Bot logged work on BEAM-9000:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:11
Start Date: 21/Dec/19 01:11
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10441: [BEAM-9000] Java 
Test Assertions without toString for GenericJson subclasses
URL: https://github.com/apache/beam/pull/10441#issuecomment-568138304
 
 
   R: @lukecwik 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361995)
Time Spent: 1h 40m  (was: 1.5h)

> Java Test Assertions without toString for GenericJson subclasses
> 
>
> Key: BEAM-9000
> URL: https://issues.apache.org/jira/browse/BEAM-9000
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> As of now, there are many tests that assert on {{toString()}} of objects.
> {code:java}
> CounterUpdate result = testObject.transform(monitoringInfo);
> assertEquals(
> "{cumulative=true, integer={highBits=0, lowBits=0}, "
> + "nameAndKind={kind=SUM, "
> + "name=transformedValue-ElementCount}}",
> result.toString());
> {code}
> This style is prone to unnecessary maintenance of the test code when 
> upgrading dependencies. Dependencies may change the internal ordering of 
> fields, causing trivial changes in {{toString()}} output. In BEAM-8695, where 
> I tried to upgrade google-http-client, there were ~30 comparison failures due 
> to these {{toString()}} assertions.
> The objects under test are subclasses of 
> {{com.google.api.client.json.GenericJson}}. 
> There are several options to enhance these assertions.
> h1. Option 1: Assertion using Map
> Leveraging the fact that GenericJson is a subclass of AbstractMap<String, Object>, the assertion can be written as
> {code:java}
> ImmutableMap<String, Object> expected = ImmutableMap.of(
>     "cumulative", true,
>     "integer", ImmutableMap.of("highBits", 0, "lowBits", 0),
>     "nameAndKind", ImmutableMap.of(
>         "kind", "SUM", "name", "transformedValue-ElementCount"));
> assertEquals(expected, (Map<String, Object>) result);
> {code}
> Credit: Ben Whitehead.
> h1. Option 2: Create assertEqualsOnJson
> Leveraging the fact that an instance of GenericJson can be instantiated from 
> JSON, the assertion can be written as
> {code:java}
> assertEqualsOnJson(
> "{\"cumulative\":true, \"integer\":{\"highBits\":0, \"lowBits\":0}, "
> + "\"nameAndKind\":{\"kind\":\"SUM\", "
> + "\"name\":\"transformedValue-ElementCount\"}}",
> result);
> {code}
>  
> {{assertEqualsOnJson}} is implemented as below. The following field and 
> methods should go into a shared test utility class (sdks/testing?)
> {code:java}
>   private static final JacksonFactory jacksonFactory =
>       JacksonFactory.getDefaultInstance();
>   public static <T> void assertEqualsOnJson(String expectedJsonText, T actual) {
>     CounterUpdate expected = parse(expectedJsonText, CounterUpdate.class);
>     assertEquals(expected, actual);
>   }
>   public static <T> T parse(String text, Class<T> clazz) {
>     try {
>       JsonParser parser = jacksonFactory.createJsonParser(text);
>       return parser.parse(clazz);
>     } catch (IOException ex) {
>       throw new IllegalArgumentException(
>           "Could not parse the text as " + clazz, ex);
>     }
>   }
> {code}
> A feature request to handle escaping double quotes via JacksonFactory: 
> [https://github.com/googleapis/google-http-java-client/issues/923]
>  
> h1. Option 3: Check JSON equality via JSONassert
> * https://github.com/skyscreamer/JSONassert
> * https://github.com/hertzsprung/hamcrest-json (not used, as its last commit 
> was in 2012) 
> The JSONassert example does not require escaped double quote characters. The 
> implementation would convert the actual object into a JSON object and call 
> {{JSONAssert.assertEquals}}.
> Credit: Luke Cwik
>  
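
The difference between the string-based assertion and the structural
assertions proposed above can be sketched in a language-neutral way. The
snippet below is only an illustration in Python, not the Java API: a plain
dict stands in for a GenericJson-like object, and the helper name
assert_equals_on_json is hypothetical, chosen to mirror Option 2.

```python
import json

# Hypothetical stand-in for a GenericJson-like object. The keys are inserted
# in a different order than in the expected text, as might happen after a
# dependency upgrade.
result = {
    "nameAndKind": {"name": "transformedValue-ElementCount", "kind": "SUM"},
    "integer": {"lowBits": 0, "highBits": 0},
    "cumulative": True,
}

expected_json = (
    '{"cumulative": true, "integer": {"highBits": 0, "lowBits": 0}, '
    '"nameAndKind": {"kind": "SUM", '
    '"name": "transformedValue-ElementCount"}}'
)


def assert_equals_on_json(expected_json_text, actual):
    """Parse the expected JSON text and compare it structurally."""
    expected = json.loads(expected_json_text)
    assert expected == actual, "%r != %r" % (expected, actual)


# Comparing serialized strings depends on field order, so it fails here.
string_matches = json.dumps(result) == expected_json

# Comparing parsed structures ignores field order, so it passes.
assert_equals_on_json(expected_json, result)
```

The string comparison breaks as soon as the field order changes, while the
structural comparison only fails when a key or value actually differs.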



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=361993=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361993
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:10
Start Date: 21/Dec/19 01:10
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [BEAM-7961] Add tests 
for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-568138206
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361993)
Time Spent: 10h 40m  (was: 10.5h)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=361994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361994
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:10
Start Date: 21/Dec/19 01:10
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [BEAM-7961] Add tests 
for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-567742923
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361994)
Time Spent: 10h 50m  (was: 10h 40m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky

2019-12-20 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001590#comment-17001590
 ] 

Robert Bradshaw commented on BEAM-8974:
---

https://github.com/apache/beam/pull/10389 has been merged. 

This is mostly a testing issue--on a loaded machine the log writing thread 
might not start up before the test tries to close it. (It was a race with the 
pre-existing test as well but that generally did "enough work" to make the 
failure rarer.)

The only way this could affect a real worker is if it was brought up and shut 
down very quickly (as in quicker than opening up the grpc channel to get work). 
I don't think it's worth the overhead of a cherry-pick. 
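
The kind of race described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the actual
FnApiLogRecordHandler code: the class SketchLogHandler, its startup_delay
parameter, and the wait flag are all hypothetical, and the sleep merely
simulates a writer thread that is slow to start on a loaded machine.

```python
import queue
import threading
import time


class SketchLogHandler(object):
    """Hypothetical handler that forwards records on a background thread."""

    _FINISHED = object()

    def __init__(self, startup_delay=0.0):
        self.received = []
        self._queue = queue.Queue()
        self._startup_delay = startup_delay
        self._writer = threading.Thread(target=self._write_loop)
        self._writer.daemon = True
        self._writer.start()

    def _write_loop(self):
        # Simulate a writer thread that is slow to start on a loaded machine.
        time.sleep(self._startup_delay)
        while True:
            record = self._queue.get()
            if record is self._FINISHED:
                return
            self.received.append(record)

    def emit(self, record):
        self._queue.put(record)

    def close(self, wait=True):
        self._queue.put(self._FINISHED)
        if wait:
            # Block until every queued record has been processed.
            self._writer.join()


# Without waiting, the check may run before the writer has done any work,
# so indexing into the received records can raise IndexError.
racy = SketchLogHandler(startup_delay=0.1)
racy.emit('some error')
racy.close(wait=False)
seen_without_wait = list(racy.received)  # typically still empty here

# Waiting for the writer makes the check deterministic.
safe = SketchLogHandler(startup_delay=0.1)
safe.emit('some error')
safe.close(wait=True)
seen_with_wait = list(safe.received)
```

Only the second variant guarantees that the record is visible when the
assertion runs; the first one depends on thread scheduling.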

> apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info
>  is flaky
> 
>
> Key: BEAM-8974
> URL: https://issues.apache.org/jira/browse/BEAM-8974
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test is failing at apache_beam/runners/worker/log_handler_test.py:110: 
> IndexError
> Added in https://github.com/apache/beam/pull/10292
> Sample job: [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2160/]
> Console logs:
>  {noformat}
> 06:37:37 === FAILURES 
> ===
> 06:37:37 ___ FnApiLogRecordHandlerTest.test_exc_info 
> 
> 06:37:37 [gw1] linux2 -- Python 2.7.12 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/target/.tox-py27-gcp-pytest/py27-gcp-pytest/bin/python
> 06:37:37
> 06:37:37 self = 
>  testMethod=test_exc_info>
> 06:37:37
> 06:37:37 def test_exc_info(self):
> 06:37:37   try:
> 06:37:37 raise ValueError('some message')
> 06:37:37   except ValueError:
> 06:37:37 _LOGGER.error('some error', exc_info=True)
> 06:37:37
> 06:37:37   self.fn_log_handler.close()
> 06:37:37
> 06:37:37 > log_entry = 
> self.test_logging_service.log_records_received[0].log_entries[0]
> 06:37:37 E IndexError: list index out of range
> 06:37:37
> 06:37:37 apache_beam/runners/worker/log_handler_test.py:110: IndexError
> 06:37:37 - Captured stderr call 
> -
> 06:37:37 ERROR:apache_beam.runners.worker.log_handler_test:some error
> 06:37:37 Traceback (most recent call last):
> 06:37:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py",
>  line 104, in test_exc_info
> 06:37:37 raise ValueError('some message')
> 06:37:37 ValueError: some message
> 06:37:37 -- Captured log call 
> ---
> 06:37:37 ERROR
> apache_beam.runners.worker.log_handler_test:log_handler_test.py:106 some error
> 06:37:37 Traceback (most recent call last):
> 06:37:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py",
>  line 104, in test_exc_info
> 06:37:37 raise ValueError('some message')
> 06:37:37 ValueError: some message
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361991
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:01
Start Date: 21/Dec/19 01:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9955: [BEAM-2572] Python SDK 
S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-568137249
 
 
   Thank you all very much!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361991)
Time Spent: 5h 10m  (was: 5h)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8974?focusedWorklogId=361990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361990
 ]

ASF GitHub Bot logged work on BEAM-8974:


Author: ASF GitHub Bot
Created on: 21/Dec/19 01:00
Start Date: 21/Dec/19 01:00
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10389: [BEAM-8974] 
Wait for log messages to be processed before checking them.
URL: https://github.com/apache/beam/pull/10389
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361990)
Time Spent: 50m  (was: 40m)

> apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info
>  is flaky
> 
>
> Key: BEAM-8974
> URL: https://issues.apache.org/jira/browse/BEAM-8974
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test is failing at apache_beam/runners/worker/log_handler_test.py:110: 
> IndexError
> Added in https://github.com/apache/beam/pull/10292
> Sample job: [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2160/]
> Console logs:
>  {noformat}
> 06:37:37 === FAILURES 
> ===
> 06:37:37 ___ FnApiLogRecordHandlerTest.test_exc_info 
> 
> 06:37:37 [gw1] linux2 -- Python 2.7.12 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/target/.tox-py27-gcp-pytest/py27-gcp-pytest/bin/python
> 06:37:37
> 06:37:37 self = 
>  testMethod=test_exc_info>
> 06:37:37
> 06:37:37 def test_exc_info(self):
> 06:37:37   try:
> 06:37:37 raise ValueError('some message')
> 06:37:37   except ValueError:
> 06:37:37 _LOGGER.error('some error', exc_info=True)
> 06:37:37
> 06:37:37   self.fn_log_handler.close()
> 06:37:37
> 06:37:37 > log_entry = 
> self.test_logging_service.log_records_received[0].log_entries[0]
> 06:37:37 E IndexError: list index out of range
> 06:37:37
> 06:37:37 apache_beam/runners/worker/log_handler_test.py:110: IndexError
> 06:37:37 - Captured stderr call 
> -
> 06:37:37 ERROR:apache_beam.runners.worker.log_handler_test:some error
> 06:37:37 Traceback (most recent call last):
> 06:37:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py",
>  line 104, in test_exc_info
> 06:37:37 raise ValueError('some message')
> 06:37:37 ValueError: some message
> 06:37:37 -- Captured log call 
> ---
> 06:37:37 ERROR
> apache_beam.runners.worker.log_handler_test:log_handler_test.py:106 some error
> 06:37:37 Traceback (most recent call last):
> 06:37:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py",
>  line 104, in test_exc_info
> 06:37:37 raise ValueError('some message')
> 06:37:37 ValueError: some message
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361988
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 21/Dec/19 00:59
Start Date: 21/Dec/19 00:59
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9955: [BEAM-2572] Python 
SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-568136984
 
 
   Thanks so much @tamera-lanham @MattMorgis - y'all went the extra mile to 
write a good feature with testable code. Lots of people have wanted this 
feature added, so I'm very grateful to you two : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361988)
Time Spent: 5h  (was: 4h 50m)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor

2019-12-20 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001589#comment-17001589
 ] 

Ahmet Altay commented on BEAM-8944:
---

Could this be closed after the cherry pick PR 
([https://github.com/apache/beam/pull/10430]) ?

> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---
>
> Key: BEAM-8944
> URL: https://issues.apache.org/jira/browse/BEAM-8944
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.18.0
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.18.0
>
> Attachments: profiling.png, profiling_one_thread.png, 
> profiling_twelve_threads.png
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We are seeing a performance degradation for python streaming word count load 
> tests.
>  
> After some investigation, it appears to be caused by swapping the original 
> ThreadPoolExecutor for UnboundedThreadPoolExecutor in the SDK worker. The 
> suspicion is that Python performance is worse with more threads on CPU-bound 
> tasks.
>  
> A simple test comparing the performance of the different thread pool executors:
>  
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
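> Results (printed in the order the executors run above; times in seconds):
>  UnboundedThreadPoolExecutor (object at 0x7fab400dbe50) uses 268.160675049
>  ThreadPoolExecutor(max_workers=1) uses 79.904583931
>  ThreadPoolExecutor(max_workers=12) uses 191.179054976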
> Profiling:
> UnboundedThreadPoolExecutor:
>  !profiling.png! 
> 1 Thread ThreadPoolExecutor:
>  !profiling_one_thread.png! 
> 12 Threads ThreadPoolExecutor:
>  !profiling_twelve_threads.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9015) Make py37-cloud test suite for TOX instead of separate py37-gcp, and py37-aws

2019-12-20 Thread Pablo Estrada (Jira)
Pablo Estrada created BEAM-9015:
---

 Summary: Make py37-cloud test suite for TOX instead of separate 
py37-gcp, and py37-aws
 Key: BEAM-9015
 URL: https://issues.apache.org/jira/browse/BEAM-9015
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core, testing
Reporter: Pablo Estrada
Assignee: Pablo Estrada






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky

2019-12-20 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001562#comment-17001562
 ] 

Ahmet Altay commented on BEAM-8974:
---

What is the next action here with respect to the 2.18 release?
 * Revert the cherry pick to the release branch?
 * Fix forward in the release branch? Do we know what the fix is?
 * Leave it as it is? Is this just test flakiness? Would this affect end 
users?

> apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info
>  is flaky
> 
>
> Key: BEAM-8974
> URL: https://issues.apache.org/jira/browse/BEAM-8974
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test is failing at apache_beam/runners/worker/log_handler_test.py:110: 
> IndexError
> Added in https://github.com/apache/beam/pull/10292
> Sample job: [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2160/]
> Console logs:
>  {noformat}
> 06:37:37 === FAILURES 
> ===
> 06:37:37 ___ FnApiLogRecordHandlerTest.test_exc_info 
> 
> 06:37:37 [gw1] linux2 -- Python 2.7.12 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/target/.tox-py27-gcp-pytest/py27-gcp-pytest/bin/python
> 06:37:37
> 06:37:37 self = 
>  testMethod=test_exc_info>
> 06:37:37
> 06:37:37 def test_exc_info(self):
> 06:37:37   try:
> 06:37:37 raise ValueError('some message')
> 06:37:37   except ValueError:
> 06:37:37 _LOGGER.error('some error', exc_info=True)
> 06:37:37
> 06:37:37   self.fn_log_handler.close()
> 06:37:37
> 06:37:37 > log_entry = 
> self.test_logging_service.log_records_received[0].log_entries[0]
> 06:37:37 E IndexError: list index out of range
> 06:37:37
> 06:37:37 apache_beam/runners/worker/log_handler_test.py:110: IndexError
> 06:37:37 - Captured stderr call 
> -
> 06:37:37 ERROR:apache_beam.runners.worker.log_handler_test:some error
> 06:37:37 Traceback (most recent call last):
> 06:37:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py",
>  line 104, in test_exc_info
> 06:37:37 raise ValueError('some message')
> 06:37:37 ValueError: some message
> 06:37:37 -- Captured log call 
> ---
> 06:37:37 ERROR
> apache_beam.runners.worker.log_handler_test:log_handler_test.py:106 some error
> 06:37:37 Traceback (most recent call last):
> 06:37:37   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py",
>  line 104, in test_exc_info
> 06:37:37 raise ValueError('some message')
> 06:37:37 ValueError: some message
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361987
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 21/Dec/19 00:56
Start Date: 21/Dec/19 00:56
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9955: [BEAM-2572] 
Python SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361987)
Time Spent: 4h 50m  (was: 4h 40m)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8337) Add Flink job server container images to release process

2019-12-20 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001455#comment-17001455
 ] 

Ahmet Altay commented on BEAM-8337:
---

Do we have containers built?

> Add Flink job server container images to release process
> 
>
> Key: BEAM-8337
> URL: https://issues.apache.org/jira/browse/BEAM-8337
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Could be added to the release process similar to how we now publish SDK 
> worker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361986
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 21/Dec/19 00:55
Start Date: 21/Dec/19 00:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9955: [BEAM-2572] Python 
SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-568136585
 
 
   lovely!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361986)
Time Spent: 4h 40m  (was: 4.5h)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8825) OOM when writing large numbers of 'narrow' rows

2019-12-20 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001437#comment-17001437
 ] 

Ahmet Altay commented on BEAM-8825:
---

Closing this. cherry pick PR is merged.

> OOM when writing large numbers of 'narrow' rows
> ---
>
> Key: BEAM-8825
> URL: https://issues.apache.org/jira/browse/BEAM-8825
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 
> 2.16.0, 2.17.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> SpannerIO can OOM when writing large numbers of 'narrow' rows. 
>  
> SpannerIO puts input mutation elements into batches for efficient writing.
> These batches are limited by number of cells mutated, and size of data 
> written (5000 cells, 1MB data). SpannerIO groups enough mutations to build 
> 1000 of these groups (5M cells, 1GB data), then sorts and batches them.
> When the number of cells and size of data is very small (<5 cells, <100 
> bytes), the memory overhead of storing millions of mutations for batching is 
> significant, and can lead to OOMs.
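
The figures quoted above can be checked with a little arithmetic. The following is only a sketch built from the numbers in the description, not SpannerIO's actual code; the function name and the reading of "1MB" as 10**6 bytes are assumptions made for illustration.

```python
# Limits quoted in the issue description.
MAX_CELLS_PER_BATCH = 5000
MAX_BYTES_PER_BATCH = 1_000_000  # "1MB", assumed here to mean 10**6 bytes
BATCHES_GROUPED = 1000


def buffered_mutations(cells_per_mutation: int, bytes_per_mutation: int) -> int:
    """Estimate how many mutations are held before a grouping limit is hit.

    Hypothetical helper: it only multiplies out the quoted limits and
    divides by the per-mutation size.
    """
    cell_limit = MAX_CELLS_PER_BATCH * BATCHES_GROUPED  # 5M cells
    byte_limit = MAX_BYTES_PER_BATCH * BATCHES_GROUPED  # about 1GB
    return min(cell_limit // cells_per_mutation,
               byte_limit // bytes_per_mutation)


# Narrow rows: 5 cells and 100 bytes each.
print(buffered_mutations(5, 100))
# Even narrower rows: 1 cell and 20 bytes each.
print(buffered_mutations(1, 20))
```

With rows this narrow the cell limit is reached long before the byte limit, so the number of buffered mutation objects runs into the millions, which is consistent with the "millions of mutations" mentioned in the description.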



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8825) OOM when writing large numbers of 'narrow' rows

2019-12-20 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-8825.
---
Resolution: Fixed

> OOM when writing large numbers of 'narrow' rows
> ---
>
> Key: BEAM-8825
> URL: https://issues.apache.org/jira/browse/BEAM-8825
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 
> 2.16.0, 2.17.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> SpannerIO can OOM when writing large numbers of 'narrow' rows. 
>  
> SpannerIO puts input mutation elements into batches for efficient writing.
> These batches are limited by number of cells mutated, and size of data 
> written (5000 cells, 1MB data). SpannerIO groups enough mutations to build 
> 1000 of these groups (5M cells, 1GB data), then sorts and batches them.
> When the number of cells and size of data is very small (<5 cells, <100 
> bytes), the memory overhead of storing millions of mutations for batching is 
> significant, and can lead to OOMs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8882) Allow Dataflow to automatically choose portability or not.

2019-12-20 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay resolved BEAM-8882.
---
Resolution: Fixed

> Allow Dataflow to automatically choose portability or not.
> --
>
> Key: BEAM-8882
> URL: https://issues.apache.org/jira/browse/BEAM-8882
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.18.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> We would like the Dataflow service to be able to automatically choose whether 
> to run pipelines in a portable way. In order to do this, we need to provide 
> more information even if portability is not explicitly requested. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8882) Allow Dataflow to automatically choose portability or not.

2019-12-20 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001365#comment-17001365
 ] 

Ahmet Altay commented on BEAM-8882:
---

Closing this. I do not see any other open PRs related to this JIRA.

> Allow Dataflow to automatically choose portability or not.
> --
>
> Key: BEAM-8882
> URL: https://issues.apache.org/jira/browse/BEAM-8882
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
> Fix For: 2.18.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> We would like the Dataflow service to be able to automatically choose whether 
> to run pipelines in a portable way. In order to do this, we need to provide 
> more information even if portability is not explicitly requested. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=361979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361979
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 21/Dec/19 00:27
Start Date: 21/Dec/19 00:27
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement 
Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#issuecomment-568023025
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 361979)
Time Spent: 5h 50m  (was: 5h 40m)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=361978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361978
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 21/Dec/19 00:26
Start Date: 21/Dec/19 00:26
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement 
Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#issuecomment-568133369
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 361978)
Time Spent: 5h 40m  (was: 5.5h)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361976=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361976
 ]

ASF GitHub Bot logged work on BEAM-8676:


Author: ASF GitHub Bot
Created on: 21/Dec/19 00:22
Start Date: 21/Dec/19 00:22
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10438: [BEAM-8676] 
sdks/java: gax and grpc upgrades
URL: https://github.com/apache/beam/pull/10438#issuecomment-568132683
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 361976)
Time Spent: 3h  (was: 2h 50m)

> Beam Dependency Update Request: com.google.api:gax-grpc
> ---
>
> Key: BEAM-8676
> URL: https://issues.apache.org/jira/browse/BEAM-8676
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:38:32.410774 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:03:23.809273 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:08:16.165687 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:07:17.894174 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.51.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-9014) Update CachingShuffleBatchReader to record weights by size in bytes

2019-12-20 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik closed BEAM-9014.
---
Fix Version/s: 2.19.0
   Resolution: Fixed

> Update CachingShuffleBatchReader to record weights by size in bytes
> ---
>
> Key: BEAM-9014
> URL: https://issues.apache.org/jira/browse/BEAM-9014
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Luke Cwik
>Priority: Minor
> Fix For: 2.19.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the CachingShuffleBatchReader caches based upon the number of 
> batches and not the size of those batches. This task is about updating 
> CachingShuffleBatchReader to cache based on the size of those batches.
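
To illustrate the difference between bounding a cache by entry count and bounding it by size, here is a minimal sketch of a byte-weighted LRU cache. It is not the CachingShuffleBatchReader implementation (which is part of the Java Dataflow worker); the class name, method names and eviction policy are assumptions chosen for the example.

```python
from collections import OrderedDict
from typing import Optional


class ByteWeightedCache:
    """LRU cache bounded by the total size in bytes of its values.

    Hypothetical illustration only; names are not taken from Beam.
    """

    def __init__(self, max_bytes: int) -> None:
        self._max_bytes = max_bytes
        self._current_bytes = 0
        self._entries = OrderedDict()  # key -> bytes, oldest first

    def get(self, key: str) -> Optional[bytes]:
        value = self._entries.get(key)
        if value is not None:
            # Mark the entry as most recently used.
            self._entries.move_to_end(key)
        return value

    def put(self, key: str, value: bytes) -> None:
        old = self._entries.pop(key, None)
        if old is not None:
            self._current_bytes -= len(old)
        self._entries[key] = value
        self._current_bytes += len(value)
        # Evict least recently used entries until the byte budget is met.
        while self._current_bytes > self._max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self._current_bytes -= len(evicted)
```

A count-based limit treats a 10-byte batch and a 10MB batch the same, so memory use is unpredictable; weighting by size keeps the total footprint under a known bound regardless of how large individual batches are.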



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9013?focusedWorklogId=361973=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361973
 ]

ASF GitHub Bot logged work on BEAM-9013:


Author: ASF GitHub Bot
Created on: 21/Dec/19 00:11
Start Date: 21/Dec/19 00:11
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10445: [BEAM-9013] 
TestStream fix for DataflowRunner
URL: https://github.com/apache/beam/pull/10445#issuecomment-568131163
 
 
   Would it be possible to make some or all of the test pipelines in 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/test_stream_test.py
 run on Dataflow? 
   
   I guess this is tricky since we don't want it to run on all runners, just 
Dataflow and DirectRunner, but maybe you can do something like this: 
https://github.com/kamilwu/beam/blob/82db02dc68ffac074435bd0142dda900d7bfbec5/sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py#L141
   
https://github.com/kamilwu/beam/blob/82db02dc68ffac074435bd0142dda900d7bfbec5/sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py#L53-L66
 



Issue Time Tracking
---

Worklog Id: (was: 361973)
Time Spent: 40m  (was: 0.5h)

> Multi-output TestStream breaks the DataflowRunner
> -
>
> Key: BEAM-9013
> URL: https://issues.apache.org/jira/browse/BEAM-9013
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.17.0
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361972
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Dec/19 00:11
Start Date: 21/Dec/19 00:11
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #10447: [BEAM-8575] 
Refactor test_do_fn_with_windowing_in_finish_bundle to work with Dataflow runner
URL: https://github.com/apache/beam/pull/10447#issuecomment-568131156
 
 
   R: @y1chi
   
   Hi Yichi, this is a refactoring of https://github.com/apache/beam/pull/10145 
to make this test run on Dataflow runner. PTAL, thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 361972)
Time Spent: 38.5h  (was: 38h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 38.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361969
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Dec/19 00:06
Start Date: 21/Dec/19 00:06
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on pull request #10447: 
[BEAM-8575] Refactor test_do_fn_with_windowing_in_finish_bundle to work with 
Dataflow runner
URL: https://github.com/apache/beam/pull/10447
 
 
   The original test assumes there is always one bundle. The assumption is not 
true on the Dataflow runner. Limit input to one single element to enforce that.
   
   
   

[jira] [Work logged] (BEAM-9014) Update CachingShuffleBatchReader to record weights by size in bytes

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9014?focusedWorklogId=361965=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361965
 ]

ASF GitHub Bot logged work on BEAM-9014:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:59
Start Date: 20/Dec/19 23:59
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10418: [BEAM-9014] 
CachingShuffleBatchReader use bytes to limit cache size.
URL: https://github.com/apache/beam/pull/10418
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 361965)
Time Spent: 20m  (was: 10m)

> Update CachingShuffleBatchReader to record weights by size in bytes
> ---
>
> Key: BEAM-9014
> URL: https://issues.apache.org/jira/browse/BEAM-9014
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Luke Cwik
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the CachingShuffleBatchReader caches based upon the number of 
> batches and not the size of those batches. This task is about updating 
> CachingShuffleBatchReader to cache based on the size of those batches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9014) Update CachingShuffleBatchReader to record weights by size in bytes

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9014?focusedWorklogId=361964=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361964
 ]

ASF GitHub Bot logged work on BEAM-9014:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:58
Start Date: 20/Dec/19 23:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10418: [BEAM-9014] 
CachingShuffleBatchReader use bytes to limit cache size.
URL: https://github.com/apache/beam/pull/10418#issuecomment-568129014
 
 
   Thanks for the contribution.
   
   Tyson, could you create a JIRA account as per the [contribution 
guide](https://beam.apache.org/contribute/#share-your-intent) for sharing your 
intent? I can then add you as a contributor to the project, which will allow 
you to assign JIRAs to yourself (specifically BEAM-9014, which I created for 
this change). Note that all PRs should have an accompanying JIRA associated 
with them.
 



Issue Time Tracking
---

Worklog Id: (was: 361964)
Remaining Estimate: 0h
Time Spent: 10m

> Update CachingShuffleBatchReader to record weights by size in bytes
> ---
>
> Key: BEAM-9014
> URL: https://issues.apache.org/jira/browse/BEAM-9014
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Luke Cwik
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the CachingShuffleBatchReader caches based upon the number of 
> batches and not the size of those batches. This task is about updating 
> CachingShuffleBatchReader to cache based on the size of those batches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9014) Update CachingShuffleBatchReader to record weights by size in bytes

2019-12-20 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-9014:
---

 Summary: Update CachingShuffleBatchReader to record weights by 
size in bytes
 Key: BEAM-9014
 URL: https://issues.apache.org/jira/browse/BEAM-9014
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Reporter: Luke Cwik


Currently the CachingShuffleBatchReader caches based upon the number of batches 
and not the size of those batches. This task is about updating 
CachingShuffleBatchReader to cache based on the size of those batches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361963=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361963
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:53
Start Date: 20/Dec/19 23:53
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10442: [BEAM-8335] 
On Unbounded Source change
URL: https://github.com/apache/beam/pull/10442#discussion_r360610858
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -75,14 +77,16 @@ def is_background_caching_job_needed(user_pipeline):
   return (has_source_to_cache(user_pipeline) and
           # Checks if it's the first time running a job from the pipeline.
           (not background_caching_job_result or
-           # Or checks if there is no valid previous job.
+           # Or checks if there is no previous job.
            background_caching_job_result.state not in (
                # DONE means a previous job has completed successfully and the
                # cached events are still valid.
                runners.runner.PipelineState.DONE,
                # RUNNING means a previous job has been started and is still
                # running.
-               runners.runner.PipelineState.RUNNING)))
+               runners.runner.PipelineState.RUNNING) or
+           # Or checks if we can invalidate the previous job.
+           is_unbounded_source_changed(user_pipeline)))
 
 Review comment:
   Yes, I agree. Changing it into `is_source_to_cache_changed`.
   
   I was thinking that a change in a bounded source wouldn't affect cached 
unbounded sources. But that would probably split the background caching job 
into 2 categories or complicate the instrumenting process (when we add support 
for caching arbitrary sources). Let's unify the source caching.
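
   The condition under review can be restated as a small standalone predicate. This is a sketch of the logic in the diff, not the code in background_caching_job.py: the `State` enum stands in for `runners.runner.PipelineState`, and the three parameters are hypothetical replacements for the calls made on `user_pipeline`.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    """Hypothetical stand-in for runners.runner.PipelineState."""
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"


def is_caching_job_needed(has_source_to_cache: bool,
                          previous_state: Optional[State],
                          source_to_cache_changed: bool) -> bool:
    """Return True when a new background caching job should be started."""
    return has_source_to_cache and (
        # First run: no previous job result exists.
        previous_state is None
        # The previous job is neither finished successfully nor running.
        or previous_state not in (State.DONE, State.RUNNING)
        # The sources to cache changed, so the previous job is invalidated.
        or source_to_cache_changed)
```

   Passing the "source changed" result as a single flag reflects the point made here: one check covers every source to cache, rather than a separate code path for unbounded sources.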
 



Issue Time Tracking
---

Worklog Id: (was: 361963)
Time Spent: 50h 20m  (was: 50h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 50h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8623) Add additional message field to Provision API response for passing status endpoint

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8623?focusedWorklogId=361962=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361962
 ]

ASF GitHub Bot logged work on BEAM-8623:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:50
Start Date: 20/Dec/19 23:50
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10075: [BEAM-8623] Add 
status_endpoint field to provision api ProvisionInfo
URL: https://github.com/apache/beam/pull/10075#issuecomment-568127734
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 361962)
Time Spent: 3h 40m  (was: 3.5h)

> Add additional message field to Provision API response for passing status 
> endpoint
> --
>
> Key: BEAM-8623
> URL: https://issues.apache.org/jira/browse/BEAM-8623
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility

2019-12-20 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001300#comment-17001300
 ] 

Brian Hulette commented on BEAM-9012:
-

My motivation: we use pytype internally at Google. Some teams are already 
running pytype on code that uses beam python. Before we had type hints it just 
happily ignored the beam code, but with the change some errors are cropping up. 
You have a good point that it could be a slippery slope to promise full pytype 
support... but so far across a lot of different code this is actually the only 
issue that's come up.

> Include `-> None` on Pipeline and PipelineOptions `__init__` methods for 
> pytype compatibility
> -
>
> Key: BEAM-9012
> URL: https://issues.apache.org/jira/browse/BEAM-9012
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.19.0
>
>
> mypy [made a decision|https://github.com/python/mypy/issues/604] to allow 
> init methods to omit {{\-> None}} return type annotations, but pytype has no 
> such feature. I think we should include {{\-> None}} annotations for pytype 
> compatibility.
> cc: [~chadrik]
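
For reference, the annotation being proposed looks like this. The class below is a hypothetical stand-in, not Beam's `Pipeline` or `PipelineOptions`; it only shows where the explicit `-> None` goes.

```python
from typing import List, Optional, get_type_hints


class ExampleOptions:
    """Hypothetical class used only to show the annotation style."""

    # The explicit "-> None" is what the issue proposes adding; per the
    # linked mypy decision, mypy accepts __init__ without it, while pytype
    # (according to this issue) does not.
    def __init__(self, flags: Optional[List[str]] = None) -> None:
        self.flags = list(flags or [])


# The return annotation is visible at runtime as NoneType.
print(get_type_hints(ExampleOptions.__init__)["return"])
```

The change is purely an annotation: it has no effect on runtime behavior, so adding it for pytype compatibility should not affect users who only run mypy.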



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361961=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361961
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:39
Start Date: 20/Dec/19 23:39
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10442: 
[BEAM-8335] On Unbounded Source change
URL: https://github.com/apache/beam/pull/10442#discussion_r360608502
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -75,14 +77,16 @@ def is_background_caching_job_needed(user_pipeline):
   return (has_source_to_cache(user_pipeline) and
           # Checks if it's the first time running a job from the pipeline.
           (not background_caching_job_result or
-           # Or checks if there is no valid previous job.
+           # Or checks if there is no previous job.
            background_caching_job_result.state not in (
                # DONE means a previous job has completed successfully and the
                # cached events are still valid.
                runners.runner.PipelineState.DONE,
                # RUNNING means a previous job has been started and is still
                # running.
-               runners.runner.PipelineState.RUNNING)))
+               runners.runner.PipelineState.RUNNING) or
+           # Or checks if we can invalidate the previous job.
+           is_unbounded_source_changed(user_pipeline)))
 
 Review comment:
   Similar to the reason why we use has_source_to_cache() above instead of 
has_unbounded_source(), we need to see whether any of the sources to cache has 
changed. So perhaps we should change this to something like 
cache_sources_changed()
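
A minimal sketch of the suggested rename, assuming the sources to cache can be compared by some signature. This is not the code from the pull request: `cache_sources_changed`, `is_caching_job_needed`, the string signatures, and the plain-string states are hypothetical stand-ins for the real pipeline inspection and `PipelineState` values.

```python
# Hypothetical sketch: none of these helpers are Beam APIs.
DONE = 'DONE'
RUNNING = 'RUNNING'


def cache_sources_changed(previous_signatures, current_signatures):
  """Returns True if the set of sources to cache differs between runs."""
  return set(previous_signatures) != set(current_signatures)


def is_caching_job_needed(has_source_to_cache, previous_state,
                          previous_signatures, current_signatures):
  """Mirrors the condition in the diff above using plain values."""
  return has_source_to_cache and (
      # First run: there is no previous job at all.
      previous_state is None or
      # The previous job is neither finished nor still running.
      previous_state not in (DONE, RUNNING) or
      # The previous job is usable but cached a different set of sources.
      cache_sources_changed(previous_signatures, current_signatures))
```

Comparing every source to cache, rather than only unbounded ones, keeps the check consistent with has_source_to_cache().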
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361961)
Time Spent: 50h 10m  (was: 50h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 50h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361960
 ]

ASF GitHub Bot logged work on BEAM-8676:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:36
Start Date: 20/Dec/19 23:36
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10438: [BEAM-8676] 
sdks/java: gax and grpc upgrades
URL: https://github.com/apache/beam/pull/10438#issuecomment-568125622
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361960)
Time Spent: 2h 50m  (was: 2h 40m)

> Beam Dependency Update Request: com.google.api:gax-grpc
> ---
>
> Key: BEAM-8676
> URL: https://issues.apache.org/jira/browse/BEAM-8676
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:38:32.410774 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:03:23.809273 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:08:16.165687 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:07:17.894174 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.51.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361959
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:32
Start Date: 20/Dec/19 23:32
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10442: [BEAM-8335] On 
Unbounded Source change
URL: https://github.com/apache/beam/pull/10442#issuecomment-568125058
 
 
   R: @davidyan74 
   R: @rohdesamuel 
   
   PTAL. Thanks! Merry Christmas!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361959)
Time Spent: 50h  (was: 49h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 50h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361958
 ]

ASF GitHub Bot logged work on BEAM-8676:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:31
Start Date: 20/Dec/19 23:31
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10438: [BEAM-8676] 
sdks/java: gax and grpc upgrades
URL: https://github.com/apache/beam/pull/10438#issuecomment-568124925
 
 
   Run Java PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361958)
Time Spent: 2h 40m  (was: 2.5h)

> Beam Dependency Update Request: com.google.api:gax-grpc
> ---
>
> Key: BEAM-8676
> URL: https://issues.apache.org/jira/browse/BEAM-8676
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:38:32.410774 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:03:23.809273 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:08:16.165687 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:07:17.894174 
> -
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.51.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9013?focusedWorklogId=361956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361956
 ]

ASF GitHub Bot logged work on BEAM-9013:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:26
Start Date: 20/Dec/19 23:26
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10445: [BEAM-9013] 
TestStream fix for DataflowRunner
URL: https://github.com/apache/beam/pull/10445#issuecomment-568124111
 
 
   > Is there a test that verifies TestStream on DataflowRunner? It seems like 
this should've been caught in a PreCommit or PostCommit
   
   I guess not. As you said, it should have been caught in a Pre/PostCommit.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361956)
Time Spent: 0.5h  (was: 20m)

> Multi-output TestStream breaks the DataflowRunner
> -
>
> Key: BEAM-9013
> URL: https://issues.apache.org/jira/browse/BEAM-9013
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.17.0
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility

2019-12-20 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001295#comment-17001295
 ] 

Chad Dombrova commented on BEAM-9012:
-

I imagine there are going to be _lots_ of little differences between mypy and 
pytype.  I'm curious about your motivation for using pytype.  Do you think we should 
aim to support both?  I'd be a bit wary of doing so, since getting mypy to pass 
can be challenging enough on its own.  I can imagine scenarios where there is 
no solution that appeases both mypy and pytype (thinking particularly of 
overloads, whose semantics seem to vary between tools).

 

> Include `-> None` on Pipeline and PipelineOptions `__init__` methods for 
> pytype compatibility
> -
>
> Key: BEAM-9012
> URL: https://issues.apache.org/jira/browse/BEAM-9012
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.19.0
>
>
> mypy [made a decision|https://github.com/python/mypy/issues/604] to allow 
> init methods to omit {{\-> None}} return type annotations, but pytype has no 
> such feature. I think we should include {{\-> None}} annotations for pytype 
> compatibility.
> cc: [~chadrik]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9013?focusedWorklogId=361953=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361953
 ]

ASF GitHub Bot logged work on BEAM-9013:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:21
Start Date: 20/Dec/19 23:21
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10445: [BEAM-9013] 
TestStream fix for DataflowRunner
URL: https://github.com/apache/beam/pull/10445#issuecomment-568123120
 
 
   Is there a test that verifies TestStream on DataflowRunner? It seems like 
this should've been caught in a PreCommit or PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361953)
Time Spent: 20m  (was: 10m)

> Multi-output TestStream breaks the DataflowRunner
> -
>
> Key: BEAM-9013
> URL: https://issues.apache.org/jira/browse/BEAM-9013
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.17.0
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361952=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361952
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:20
Start Date: 20/Dec/19 23:20
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10442: [BEAM-8335] On 
Unbounded Source change
URL: https://github.com/apache/beam/pull/10442#issuecomment-568122842
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361952)
Time Spent: 49h 50m  (was: 49h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 49h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9013?focusedWorklogId=361951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361951
 ]

ASF GitHub Bot logged work on BEAM-9013:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:16
Start Date: 20/Dec/19 23:16
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10445: 
[BEAM-9013] TestStream fix for DataflowRunner
URL: https://github.com/apache/beam/pull/10445
 
 
   The DataflowRunner relies on the old implementation of the TestStream with 
only a single output and different watermark controlling mechanisms. This adds 
the _DeprecatedSingleOutputTestStream, which allows further development of 
the TestStream to occur in the _TestStream class without breaking backwards 
compatibility with Dataflow.
   
   
   

[jira] [Created] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner

2019-12-20 Thread Sam Rohde (Jira)
Sam Rohde created BEAM-9013:
---

 Summary: Multi-output TestStream breaks the DataflowRunner
 Key: BEAM-9013
 URL: https://issues.apache.org/jira/browse/BEAM-9013
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Affects Versions: 2.17.0
Reporter: Sam Rohde
Assignee: Sam Rohde
 Fix For: 2.17.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8999) PGBKCVOperation does not respect timestamp combiners

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8999?focusedWorklogId=361950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361950
 ]

ASF GitHub Bot logged work on BEAM-8999:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:11
Start Date: 20/Dec/19 23:11
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10425: [BEAM-8999] Respect 
timestamp combiners in PGBKCVOperation.
URL: https://github.com/apache/beam/pull/10425#issuecomment-568120931
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361950)
Time Spent: 40m  (was: 0.5h)

> PGBKCVOperation does not respect timestamp combiners
> 
>
> Key: BEAM-8999
> URL: https://issues.apache.org/jira/browse/BEAM-8999
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We prevent lifting in the FnAPI runner in this case, but other optimizers 
> (e.g. the Greedy Fuser and Dataflow) do not, resulting in incorrect 
> timestamps. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9005) Go SDK post-commit failures due to https://github.com/apache/beam/pull/10183

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9005?focusedWorklogId=361949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361949
 ]

ASF GitHub Bot logged work on BEAM-9005:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:05
Start Date: 20/Dec/19 23:05
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10443: 
[BEAM-9005] Fixes Go formatting
URL: https://github.com/apache/beam/pull/10443
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361949)
Time Spent: 1.5h  (was: 1h 20m)

> Go SDK post-commit  failures due to https://github.com/apache/beam/pull/10183
> -
>
> Key: BEAM-9005
> URL: https://issues.apache.org/jira/browse/BEAM-9005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Looking into this.
>  
> cc: [~bhulette] [~lostluck] [~danoliveira]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9005) Go SDK post-commit failures due to https://github.com/apache/beam/pull/10183

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9005?focusedWorklogId=361948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361948
 ]

ASF GitHub Bot logged work on BEAM-9005:


Author: ASF GitHub Bot
Created on: 20/Dec/19 23:05
Start Date: 20/Dec/19 23:05
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10443: [BEAM-9005] 
Fixes Go formatting
URL: https://github.com/apache/beam/pull/10443#issuecomment-568119545
 
 
   Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361948)
Time Spent: 1h 20m  (was: 1h 10m)

> Go SDK post-commit  failures due to https://github.com/apache/beam/pull/10183
> -
>
> Key: BEAM-9005
> URL: https://issues.apache.org/jira/browse/BEAM-9005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Looking into this.
>  
> cc: [~bhulette] [~lostluck] [~danoliveira]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility

2019-12-20 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001281#comment-17001281
 ] 

Brian Hulette commented on BEAM-9012:
-

The gotcha is that pytype won't let you specify just a return type (see 
https://github.com/google/pytype/issues/480). The only workaround I've found is 
to include a full type annotation for the function, like {{#type: (int, str, 
float) -> None}}, which can be ugly, particularly for Pipeline :/
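
A minimal sketch of the two annotation styles discussed here, assuming PEP 484 semantics. The class names and the `flags` argument are hypothetical and are not Beam's actual `Pipeline` or `PipelineOptions` signatures. In the comment style, the function-level type comment has to list every argument type (with `self` omitted) in order to state the return type.

```python
from typing import List, Optional


class CommentStyleOptions(object):
  """Hypothetical class using a Python 2 compatible type comment."""

  def __init__(self, flags=None):
    # type: (Optional[List[str]]) -> None
    self.flags = list(flags or [])


class AnnotationStyleOptions(object):
  """Hypothetical class using Python 3 inline annotations."""

  def __init__(self, flags: Optional[List[str]] = None) -> None:
    self.flags = list(flags or [])
```

Type comments are ordinary comments at runtime, so only the inline form is visible through `__annotations__`; both forms are read by static checkers.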

> Include `-> None` on Pipeline and PipelineOptions `__init__` methods for 
> pytype compatibility
> -
>
> Key: BEAM-9012
> URL: https://issues.apache.org/jira/browse/BEAM-9012
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.19.0
>
>
> mypy [made a decision|https://github.com/python/mypy/issues/604] to allow 
> init methods to omit {{\-> None}} return type annotations, but pytype has no 
> such feature. I think we should include {{\-> None}} annotations for pytype 
> compatibility.
> cc: [~chadrik]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility

2019-12-20 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001279#comment-17001279
 ] 

Chad Dombrova commented on BEAM-9012:
-

Fine by me.  Brian, if you're into the static typing thing, you may want to 
poke in over at my second PR, which is waiting on some feedback:  
[https://github.com/apache/beam/pull/10367]

There will probably be a third (and hopefully final) PR after that one to get 
the project to a point where mypy is fully passing.  We can take care of this 
issue in that final PR. 

 

> Include `-> None` on Pipeline and PipelineOptions `__init__` methods for 
> pytype compatibility
> -
>
> Key: BEAM-9012
> URL: https://issues.apache.org/jira/browse/BEAM-9012
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.19.0
>
>
> mypy [made a decision|https://github.com/python/mypy/issues/604] to allow 
> init methods to omit {{\-> None}} return type annotations, but pytype has no 
> such feature. I think we should include {{\-> None}} annotations for pytype 
> compatibility.
> cc: [~chadrik]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility

2019-12-20 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9012:

Description: 
mypy [made a decision|https://github.com/python/mypy/issues/604] to allow init 
methods to omit {{\-> None}} return type annotations, but pytype has no such 
feature. I think we should include {{\-> None}} annotations for pytype 
compatibility.

cc: [~chadrik]

  was:
mypy [made a decision|https://github.com/python/mypy/issues/604] to allow init 
methods to omit `-> None` return type annotations, but pytype has no such 
feature. I think we should include `-> None` annotations for pytype 
compatibility.

cc: [~chadrik]


> Include `-> None` on Pipeline and PipelineOptions `__init__` methods for 
> pytype compatibility
> -
>
> Key: BEAM-9012
> URL: https://issues.apache.org/jira/browse/BEAM-9012
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.19.0
>
>
> mypy [made a decision|https://github.com/python/mypy/issues/604] to allow 
> init methods to omit {{\-> None}} return type annotations, but pytype has no 
> such feature. I think we should include {{\-> None}} annotations for pytype 
> compatibility.
> cc: [~chadrik]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility

2019-12-20 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-9012:

Description: 
mypy [made a decision|https://github.com/python/mypy/issues/604] to allow init 
methods to omit `-> None` return type annotations, but pytype has no such 
feature. I think we should include `-> None` annotations for pytype 
compatibility.

cc: [~chadrik]

  was:
mypy made a decision to allow `__init__` methods to omit `-> None` return type 
annotations, but pytype has no such feature. I think we should include `-> 
None` annotations for pytype compatibility.

cc: [~chadrik]


> Include `-> None` on Pipeline and PipelineOptions `__init__` methods for 
> pytype compatibility
> -
>
> Key: BEAM-9012
> URL: https://issues.apache.org/jira/browse/BEAM-9012
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.19.0
>
>
> mypy [made a decision|https://github.com/python/mypy/issues/604] to allow 
> init methods to omit `-> None` return type annotations, but pytype has no 
> such feature. I think we should include `-> None` annotations for pytype 
> compatibility.
> cc: [~chadrik]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility

2019-12-20 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-9012:
---

 Summary: Include `-> None` on Pipeline and PipelineOptions 
`__init__` methods for pytype compatibility
 Key: BEAM-9012
 URL: https://issues.apache.org/jira/browse/BEAM-9012
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: 2.19.0


mypy made a decision to allow `__init__` methods to omit `-> None` return type 
annotations, but pytype has no such feature. I think we should include `-> 
None` annotations for pytype compatibility.

cc: [~chadrik]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361943
 ]

ASF GitHub Bot logged work on BEAM-8977:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:39
Start Date: 20/Dec/19 22:39
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10404: [BEAM-8977] Resolve 
test flakiness
URL: https://github.com/apache/beam/pull/10404#issuecomment-568113752
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 361943)
Time Spent: 2.5h  (was: 2h 20m)

> apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display
>  is flaky
> 
>
> Key: BEAM-8977
> URL: https://issues.apache.org/jira/browse/BEAM-8977
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Sample failure: 
>  
> [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/]
> Error Message
>  IndexError: list index out of range
> Stacktrace
>  self = 
>   testMethod=test_dynamic_plotting_update_same_display>
>  mocked_display_facets =  id='139889868386376'>
>  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
>         '.PCollectionVisualization.display_facets')
>  def test_dynamic_plotting_update_same_display(self,
>                                                mocked_display_facets):
>    fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING)
>    ie.current_env().set_pipeline_result(self._p, fake_pipeline_result)
>    # Starts async dynamic plotting that never ends in this test.
>    h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001)
>    # Blocking so the above async task can execute some iterations.
>    time.sleep(1)
>    # The first iteration doesn't provide updating_pv to display_facets.
>    _, first_kwargs = mocked_display_facets.call_args_list[0]
>    self.assertEqual(first_kwargs, {})
>    # The following iterations use the same updating_pv to display_facets and so
>    # on.
>  > _, second_kwargs = mocked_display_facets.call_args_list[1]
>  E IndexError: list index out of range
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: IndexError
> Standard Output
> 
>  Standard Error
>  WARNING:apache_beam.runners.interactive.interactive_environment:You cannot 
> use Interactive Beam features when you are not in an interactive environment 
> such as a Jupyter notebook or ipython terminal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8936) BigQuery related ITs are failing in PostCommit: quota exceeded

2019-12-20 Thread Mark Liu (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001261#comment-17001261
 ] 

Mark Liu commented on BEAM-8936:


Error details:

_Project apache-beam-testing has insufficient quota(s) to execute this workflow 
with 1 instances in region us-central1. Quota summary (required/available): 
1/12148 instances, 1/0 CPUs, 250/332935 disk GB, 0/3608 SSD disk GB, 1/31 
instance groups, 1/32 managed instance groups, 1/271 instance templates, 1/854 
in-use IP addresses._

 
We have increased the CPU quota in us-central1 from 1250 to 2000. This should 
relieve the peak usage.

> BigQuery related ITs are failing in PostCommit: quota exceeded
> --
>
> Key: BEAM-8936
> URL: https://issues.apache.org/jira/browse/BEAM-8936
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Yueyang Qiu
>Assignee: Mark Liu
>Priority: Major
>  Labels: currently-failing
>
> beam_PostCommit_Java: 
> [https://builds.apache.org/job/beam_PostCommit_Java/4852/]
> beam_PostCommit_Python2: 
> [https://builds.apache.org/job/beam_PostCommit_Python2/1178|https://builds.apache.org/job/beam_PostCommit_Python2/1178/#showFailuresLink]
> beam_PostCommit_Python35: 
> [https://builds.apache.org/job/beam_PostCommit_Python35/1185]
> ...
>  
> This seems to be a GCP quota issue. Mark, could you help take a look or find 
> an owner of this bug?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8936) BigQuery related ITs are failing in PostCommit: quota exceeded

2019-12-20 Thread Mark Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu resolved BEAM-8936.

Fix Version/s: Not applicable
   Resolution: Fixed

> BigQuery related ITs are failing in PostCommit: quota exceeded
> --
>
> Key: BEAM-8936
> URL: https://issues.apache.org/jira/browse/BEAM-8936
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, test-failures
>Reporter: Yueyang Qiu
>Assignee: Mark Liu
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>
> beam_PostCommit_Java: 
> [https://builds.apache.org/job/beam_PostCommit_Java/4852/]
> beam_PostCommit_Python2: 
> [https://builds.apache.org/job/beam_PostCommit_Python2/1178|https://builds.apache.org/job/beam_PostCommit_Python2/1178/#showFailuresLink]
> beam_PostCommit_Python35: 
> [https://builds.apache.org/job/beam_PostCommit_Python35/1185]
> ...
>  
> This seems to be a GCP quota issue. Mark, could you help take a look or find 
> an owner of this bug?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361935=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361935
 ]

ASF GitHub Bot logged work on BEAM-9010:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:30
Start Date: 20/Dec/19 22:30
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper 
TableRow size calculation via TableRowJsonCoder
URL: https://github.com/apache/beam/pull/10444#issuecomment-568111413
 
 
   Run Java HadoopFormatIO Performance Test
 



Issue Time Tracking
---

Worklog Id: (was: 361935)
Time Spent: 1h  (was: 50m)

> BigQuery TableRow's size is toString().length() ?
> -
>
> Key: BEAM-9010
> URL: https://issues.apache.org/jira/browse/BEAM-9010
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
> Attachments: TableRowJsonCoder_behavior_remains_same.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following tests failed when I tried to upgrade google-http-client from 
> 1.28.0 to 1.34.0:
> {noformat}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
> {noformat}
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures
> [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
>  and 
> [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
>  rely on {{TableRow.toString().length()}} to calculate the size. Example:
> {code:java}
>   dataSize += row.toString().length();
>   if (dataSize >= maxRowBatchSize
>   || rows.size() >= maxRowsPerBatch
>   || i == rowsToPublish.size() - 1) {
> {code}
> However, with [google-http-client's 
> PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218],
>  the GenericData.toString output has changed since v1.29.0.
> In old google-http-client 1.28.0, an example row's toString returned:
> {noformat}
> {f=[{v=foo}, {v=1234}]}
> {noformat}
> In new google-http-client 1.29.0 and higher, the same row's toString returns:
> {noformat}
> GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, 
> GenericData{classInfo=[v], {v=1234}}]}}
> {noformat}
> h1. Question:
> Is it right to rely on {{toString().length()}} in the BigQuery 
> classes?
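
A minimal sketch of the concern raised in the description above, using only the JDK. `VerboseRow` and `encodedSize` are hypothetical names invented for illustration; they are not Beam or google-http-client APIs. The sketch shows that a size derived from `toString().length()` changes when a class alters its string format, while a size computed from an explicit UTF-8 encoding of keys and values does not. It assumes scalar values only; nested rows would need a recursive encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class RowSizeSketch {

  /** Hypothetical row type whose toString() is more verbose than a plain map's. */
  static final class VerboseRow extends AbstractMap<String, Object> {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    VerboseRow set(String key, Object value) {
      fields.put(key, value);
      return this;
    }

    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
      return fields.entrySet();
    }

    @Override
    public String toString() {
      return "VerboseRow{classInfo=" + fields.keySet() + ", " + fields + "}";
    }
  }

  /** Sums the UTF-8 byte lengths of keys and scalar values; never calls toString() on the row. */
  static long encodedSize(Map<String, Object> row) {
    long size = 0;
    for (Map.Entry<String, Object> entry : row.entrySet()) {
      size += entry.getKey().getBytes(StandardCharsets.UTF_8).length;
      size += String.valueOf(entry.getValue()).getBytes(StandardCharsets.UTF_8).length;
    }
    return size;
  }

  public static void main(String[] args) {
    Map<String, Object> plain = new LinkedHashMap<>();
    plain.put("v", "foo");
    Map<String, Object> verbose = new VerboseRow().set("v", "foo");

    // Same data, different string representations, so the lengths differ.
    System.out.println(plain.toString().length() + " vs " + verbose.toString().length());
    // Same data, same encoded size.
    System.out.println(encodedSize(plain) + " vs " + encodedSize(verbose));
  }
}
```

A batching threshold such as {{maxRowBatchSize}} compared against the second kind of measure would not shift when a dependency changes its {{toString()}} format.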



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361939=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361939
 ]

ASF GitHub Bot logged work on BEAM-9010:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:30
Start Date: 20/Dec/19 22:30
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper 
TableRow size calculation via TableRowJsonCoder
URL: https://github.com/apache/beam/pull/10444#issuecomment-568111524
 
 
   Run SQL Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 361939)
Time Spent: 1h 40m  (was: 1.5h)

> BigQuery TableRow's size is toString().length() ?
> -
>
> Key: BEAM-9010
> URL: https://issues.apache.org/jira/browse/BEAM-9010
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
> Attachments: TableRowJsonCoder_behavior_remains_same.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The following tests failed when I tried to upgrade google-http-client from 
> 1.28.0 to 1.34.0:
> {noformat}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
> {noformat}
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures
> [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
>  and 
> [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
>  rely on {{TableRow.toString().length()}} to calculate the size. Example:
> {code:java}
>   dataSize += row.toString().length();
>   if (dataSize >= maxRowBatchSize
>   || rows.size() >= maxRowsPerBatch
>   || i == rowsToPublish.size() - 1) {
> {code}
> However, with [google-http-client's 
> PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218],
>  the GenericData.toString output has changed since v1.29.0.
> In old google-http-client 1.28.0, an example row's toString returned:
> {noformat}
> {f=[{v=foo}, {v=1234}]}
> {noformat}
> In new google-http-client 1.29.0 and higher, the same row's toString returns:
> {noformat}
> GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, 
> GenericData{classInfo=[v], {v=1234}}]}}
> {noformat}
> h1. Question:
> Is it right to rely on {{toString().length()}} in the BigQuery 
> classes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361938=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361938
 ]

ASF GitHub Bot logged work on BEAM-9010:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:30
Start Date: 20/Dec/19 22:30
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper 
TableRow size calculation via TableRowJsonCoder
URL: https://github.com/apache/beam/pull/10444#issuecomment-568111488
 
 
   Run Spark ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 361938)
Time Spent: 1.5h  (was: 1h 20m)

> BigQuery TableRow's size is toString().length() ?
> -
>
> Key: BEAM-9010
> URL: https://issues.apache.org/jira/browse/BEAM-9010
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
> Attachments: TableRowJsonCoder_behavior_remains_same.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The following tests failed when I tried to upgrade google-http-client from 
> 1.28.0 to 1.34.0:
> {noformat}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
> {noformat}
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures
> [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
>  and 
> [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
>  rely on {{TableRow.toString().length()}} to calculate the size. Example:
> {code:java}
>   dataSize += row.toString().length();
>   if (dataSize >= maxRowBatchSize
>   || rows.size() >= maxRowsPerBatch
>   || i == rowsToPublish.size() - 1) {
> {code}
> However, with [google-http-client's 
> PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218],
>  the GenericData.toString output has changed since v1.29.0.
> In old google-http-client 1.28.0, an example row's toString returned:
> {noformat}
> {f=[{v=foo}, {v=1234}]}
> {noformat}
> In new google-http-client 1.29.0 and higher, the same row's toString returns:
> {noformat}
> GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, 
> GenericData{classInfo=[v], {v=1234}}]}}
> {noformat}
> h1. Question:
> Is it right to rely on {{toString().length()}} in the BigQuery 
> classes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361937
 ]

ASF GitHub Bot logged work on BEAM-9010:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:30
Start Date: 20/Dec/19 22:30
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper 
TableRow size calculation via TableRowJsonCoder
URL: https://github.com/apache/beam/pull/10444#issuecomment-568111460
 
 
   Run Dataflow ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 361937)
Time Spent: 1h 20m  (was: 1h 10m)

> BigQuery TableRow's size is toString().length() ?
> -
>
> Key: BEAM-9010
> URL: https://issues.apache.org/jira/browse/BEAM-9010
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
> Attachments: TableRowJsonCoder_behavior_remains_same.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The following tests failed when I tried to upgrade google-http-client from 
> 1.28.0 to 1.34.0:
> {noformat}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
> {noformat}
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures
> [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
>  and 
> [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
>  rely on {{TableRow.toString().length()}} to calculate the size. Example:
> {code:java}
>   dataSize += row.toString().length();
>   if (dataSize >= maxRowBatchSize
>   || rows.size() >= maxRowsPerBatch
>   || i == rowsToPublish.size() - 1) {
> {code}
> However, with [google-http-client's 
> PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218],
>  the GenericData.toString output has changed since v1.29.0.
> In old google-http-client 1.28.0, an example row's toString returned:
> {noformat}
> {f=[{v=foo}, {v=1234}]}
> {noformat}
> In new google-http-client 1.29.0 and higher, the same row's toString returns:
> {noformat}
> GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, 
> GenericData{classInfo=[v], {v=1234}}]}}
> {noformat}
> h1. Question:
> Is it right to rely on {{toString().length()}} in the BigQuery 
> classes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361936
 ]

ASF GitHub Bot logged work on BEAM-9010:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:30
Start Date: 20/Dec/19 22:30
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper 
TableRow size calculation via TableRowJsonCoder
URL: https://github.com/apache/beam/pull/10444#issuecomment-568111435
 
 
   Run BigQueryIO Streaming Performance Test Java
 



Issue Time Tracking
---

Worklog Id: (was: 361936)
Time Spent: 1h 10m  (was: 1h)

> BigQuery TableRow's size is toString().length() ?
> -
>
> Key: BEAM-9010
> URL: https://issues.apache.org/jira/browse/BEAM-9010
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
> Attachments: TableRowJsonCoder_behavior_remains_same.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The following tests failed when I tried to upgrade google-http-client from 
> 1.28.0 to 1.34.0:
> {noformat}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
> {noformat}
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures
> [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
>  and 
> [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
>  rely on {{TableRow.toString().length()}} to calculate the size. Example:
> {code:java}
>   dataSize += row.toString().length();
>   if (dataSize >= maxRowBatchSize
>   || rows.size() >= maxRowsPerBatch
>   || i == rowsToPublish.size() - 1) {
> {code}
> However, with [google-http-client's 
> PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218],
>  the GenericData.toString output has changed since v1.29.0.
> In old google-http-client 1.28.0, an example row's toString returned:
> {noformat}
> {f=[{v=foo}, {v=1234}]}
> {noformat}
> In new google-http-client 1.29.0 and higher, the same row's toString returns:
> {noformat}
> GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, 
> GenericData{classInfo=[v], {v=1234}}]}}
> {noformat}
> h1. Question:
> Is it right to rely on {{toString().length()}} in the BigQuery 
> classes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361934
 ]

ASF GitHub Bot logged work on BEAM-9000:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:29
Start Date: 20/Dec/19 22:29
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10441: [BEAM-9000] Java 
Test Assertions without toString for GenericJson subclasses
URL: https://github.com/apache/beam/pull/10441#issuecomment-56899
 
 
   Run SQL Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 361934)
Time Spent: 1.5h  (was: 1h 20m)

> Java Test Assertions without toString for GenericJson subclasses
> 
>
> Key: BEAM-9000
> URL: https://issues.apache.org/jira/browse/BEAM-9000
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As of now, there are many tests that assert on {{toString()}} of objects.
> {code:java}
> CounterUpdate result = testObject.transform(monitoringInfo);
> assertEquals(
> "{cumulative=true, integer={highBits=0, lowBits=0}, "
> + "nameAndKind={kind=SUM, "
> + "name=transformedValue-ElementCount}}",
> result.toString());
> {code}
> This style is prone to unnecessary maintenance of the test code when 
> upgrading dependencies. Dependencies may change the internal ordering of 
> fields or make other trivial changes to {{toString()}}. In BEAM-8695, where I 
> tried to upgrade google-http-client, there were ~30 comparison failures due 
> to these {{toString}} assertions.
> The objects under test are subclasses of {{com.google.api.client.json.GenericJson}}. 
> There are several options to improve these assertions.
> h1. Option 1: Assertion using Map
> Leveraging the fact that GenericJson is a subclass of AbstractMap<String, Object>, the assertion can be written as
> {code:java}
> ImmutableMap<String, Object> expected = ImmutableMap.of(
>     "cumulative", true,
>     "integer", ImmutableMap.of("highBits", 0, "lowBits", 0),
>     "nameAndKind", ImmutableMap.of("kind", "SUM", "name", "transformedValue-ElementCount"));
> assertEquals(expected, (Map<String, Object>) result);
> {code}
> Credit: Ben Whitehead.
> h1. Option 2: Create assertEqualsOnJson
> Leveraging the fact that an instance of GenericJson can be instantiated from 
> JSON, the assertion can be written as
> {code:java}
> assertEqualsOnJson(
> "{\"cumulative\":true, \"integer\":{\"highBits\":0, \"lowBits\":0}, "
> + "\"nameAndKind\":{\"kind\":\"SUM\", "
> + "\"name\":\"transformedValue-ElementCount\"}}",
> result);
> {code}
>  
> {{assertEqualsOnJson}} is implemented as below. The following field and 
> methods should go into a shared test utility class (sdks/testing?)
> {code:java}
>   private static final JacksonFactory jacksonFactory = JacksonFactory.getDefaultInstance();
>
>   public static <T> void assertEqualsOnJson(String expectedJsonText, T actual) {
>     CounterUpdate expected = parse(expectedJsonText, CounterUpdate.class);
>     assertEquals(expected, actual);
>   }
>
>   public static <T> T parse(String text, Class<T> clazz) {
>     try {
>       JsonParser parser = jacksonFactory.createJsonParser(text);
>       return parser.parse(clazz);
>     } catch (IOException ex) {
>       throw new IllegalArgumentException("Could not parse the text as " + clazz, ex);
>     }
>   }
> {code}
> A feature request to handle escaping double quotes via JacksonFactory: 
> [https://github.com/googleapis/google-http-java-client/issues/923]
>  
> h1. Option 3: Check JSON equality via JSONassert
> * https://github.com/skyscreamer/JSONassert
> * https://github.com/hertzsprung/hamcrest-json (not used, as its last commit was 
> in 2012) 
> The JSONassert example does not need escaped double quote characters. The 
> implementation would convert the actual object into a JSON object and call 
> {{JSONAssert.assertEquals}}.
> Credit: Luke Cwik
>  
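
A minimal, JDK-only sketch of the idea behind Option 1 in the description above: comparing objects as maps tolerates a change in field ordering, while comparing their `toString()` output does not. `MapAssertionSketch` and `counter` are hypothetical names used only for this illustration, and plain `LinkedHashMap` instances stand in for `GenericJson` subclasses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapAssertionSketch {

  /** Builds the same logical counter with the nested fields inserted in either order. */
  static Map<String, Object> counter(boolean kindFirst) {
    Map<String, Object> nameAndKind = new LinkedHashMap<>();
    if (kindFirst) {
      nameAndKind.put("kind", "SUM");
      nameAndKind.put("name", "transformedValue-ElementCount");
    } else {
      nameAndKind.put("name", "transformedValue-ElementCount");
      nameAndKind.put("kind", "SUM");
    }
    Map<String, Object> counter = new LinkedHashMap<>();
    counter.put("cumulative", true);
    counter.put("nameAndKind", nameAndKind);
    return counter;
  }

  public static void main(String[] args) {
    Map<String, Object> expected = counter(true);
    Map<String, Object> actual = counter(false);

    // String comparison is sensitive to the order in which fields are printed.
    System.out.println("toString equal: " + expected.toString().equals(actual.toString()));
    // Map.equals compares entries and ignores their ordering.
    System.out.println("Map equal: " + expected.equals(actual));
  }
}
```

The same property is what makes a map-based or JSON-based assertion less likely to break on a dependency upgrade than a string-based one.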



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361930
 ]

ASF GitHub Bot logged work on BEAM-9000:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:29
Start Date: 20/Dec/19 22:29
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10441: [BEAM-9000] Java 
Test Assertions without toString for GenericJson subclasses
URL: https://github.com/apache/beam/pull/10441#issuecomment-568111073
 
 
   Run Java HadoopFormatIO Performance Test
 



Issue Time Tracking
---

Worklog Id: (was: 361930)
Time Spent: 50m  (was: 40m)

> Java Test Assertions without toString for GenericJson subclasses
> 
>
> Key: BEAM-9000
> URL: https://issues.apache.org/jira/browse/BEAM-9000
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As of now, there are many tests that assert on {{toString()}} of objects.
> {code:java}
> CounterUpdate result = testObject.transform(monitoringInfo);
> assertEquals(
> "{cumulative=true, integer={highBits=0, lowBits=0}, "
> + "nameAndKind={kind=SUM, "
> + "name=transformedValue-ElementCount}}",
> result.toString());
> {code}
> This style is prone to unnecessary maintenance of the test code when 
> upgrading dependencies. Dependencies may change the internal ordering of 
> fields or make other trivial changes to {{toString()}}. In BEAM-8695, where I 
> tried to upgrade google-http-client, there were ~30 comparison failures due 
> to these {{toString}} assertions.
> The objects under test are subclasses of {{com.google.api.client.json.GenericJson}}. 
> There are several options to improve these assertions.
> h1. Option 1: Assertion using Map
> Leveraging the fact that GenericJson is a subclass of AbstractMap<String, Object>, the assertion can be written as
> {code:java}
> ImmutableMap<String, Object> expected = ImmutableMap.of(
>     "cumulative", true,
>     "integer", ImmutableMap.of("highBits", 0, "lowBits", 0),
>     "nameAndKind", ImmutableMap.of("kind", "SUM", "name", "transformedValue-ElementCount"));
> assertEquals(expected, (Map<String, Object>) result);
> {code}
> Credit: Ben Whitehead.
> h1. Option 2: Create assertEqualsOnJson
> Leveraging the fact that an instance of GenericJson can be instantiated from 
> JSON, the assertion can be written as
> {code:java}
> assertEqualsOnJson(
> "{\"cumulative\":true, \"integer\":{\"highBits\":0, \"lowBits\":0}, "
> + "\"nameAndKind\":{\"kind\":\"SUM\", "
> + "\"name\":\"transformedValue-ElementCount\"}}",
> result);
> {code}
>  
> {{assertEqualsOnJson}} is implemented as below. The following field and 
> methods should go into a shared test utility class (sdks/testing?)
> {code:java}
>   private static final JacksonFactory jacksonFactory = JacksonFactory.getDefaultInstance();
>
>   public static <T> void assertEqualsOnJson(String expectedJsonText, T actual) {
>     CounterUpdate expected = parse(expectedJsonText, CounterUpdate.class);
>     assertEquals(expected, actual);
>   }
>
>   public static <T> T parse(String text, Class<T> clazz) {
>     try {
>       JsonParser parser = jacksonFactory.createJsonParser(text);
>       return parser.parse(clazz);
>     } catch (IOException ex) {
>       throw new IllegalArgumentException("Could not parse the text as " + clazz, ex);
>     }
>   }
> {code}
> A feature request to handle escaping double quotes via JacksonFactory: 
> [https://github.com/googleapis/google-http-java-client/issues/923]
>  
> h1. Option 3: Check JSON equality via JSONassert
> * https://github.com/skyscreamer/JSONassert
> * https://github.com/hertzsprung/hamcrest-json (not used, as its last commit was 
> in 2012) 
> The JSONassert example does not need escaped double quote characters. The 
> implementation would convert the actual object into a JSON object and call 
> {{JSONAssert.assertEquals}}.
> Credit: Luke Cwik
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361933
 ]

ASF GitHub Bot logged work on BEAM-9000:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:29
Start Date: 20/Dec/19 22:29
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10441: [BEAM-9000] Java 
Test Assertions without toString for GenericJson subclasses
URL: https://github.com/apache/beam/pull/10441#issuecomment-56857
 
 
   Run Spark ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 361933)
Time Spent: 1h 20m  (was: 1h 10m)

> Java Test Assertions without toString for GenericJson subclasses
> 
>
> Key: BEAM-9000
> URL: https://issues.apache.org/jira/browse/BEAM-9000
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As of now, there are many tests that assert on {{toString()}} of objects.
> {code:java}
> CounterUpdate result = testObject.transform(monitoringInfo);
> assertEquals(
> "{cumulative=true, integer={highBits=0, lowBits=0}, "
> + "nameAndKind={kind=SUM, "
> + "name=transformedValue-ElementCount}}",
> result.toString());
> {code}
> This style is prone to unnecessary maintenance of the test code when 
> upgrading dependencies. Dependencies may change the internal ordering of 
> fields or make other trivial changes to {{toString()}}. In BEAM-8695, where I 
> tried to upgrade google-http-client, there were ~30 comparison failures due 
> to these {{toString}} assertions.
> The objects under test are subclasses of {{com.google.api.client.json.GenericJson}}. 
> There are several options to improve these assertions.
> h1. Option 1: Assertion using Map
> Leveraging the fact that GenericJson is a subclass of AbstractMap<String, Object>, the assertion can be written as
> {code:java}
> ImmutableMap<String, Object> expected = ImmutableMap.of(
>     "cumulative", true,
>     "integer", ImmutableMap.of("highBits", 0, "lowBits", 0),
>     "nameAndKind", ImmutableMap.of("kind", "SUM", "name", "transformedValue-ElementCount"));
> assertEquals(expected, (Map<String, Object>) result);
> {code}
> Credit: Ben Whitehead.
> h1. Option 2: Create assertEqualsOnJson
> Leveraging the fact that an instance of GenericJson can be instantiated from 
> JSON, the assertion can be written as
> {code:java}
> assertEqualsOnJson(
> "{\"cumulative\":true, \"integer\":{\"highBits\":0, \"lowBits\":0}, "
> + "\"nameAndKind\":{\"kind\":\"SUM\", "
> + "\"name\":\"transformedValue-ElementCount\"}}",
> result);
> {code}
>  
> {{assertEqualsOnJson}} is implemented as below. The following field and 
> methods should go in a shared test utility class (sdks/testing?).
> {code:java}
>   private static final JacksonFactory jacksonFactory = JacksonFactory.getDefaultInstance();
>
>   public static <T> void assertEqualsOnJson(String expectedJsonText, T actual) {
>     CounterUpdate expected = parse(expectedJsonText, CounterUpdate.class);
>     assertEquals(expected, actual);
>   }
>
>   public static <T> T parse(String text, Class<T> clazz) {
>     try {
>       JsonParser parser = jacksonFactory.createJsonParser(text);
>       return parser.parse(clazz);
>     } catch (IOException ex) {
>       throw new IllegalArgumentException("Could not parse the text as " + clazz, ex);
>     }
>   }
> {code}
> A feature request to handle escaping double quotes via JacksonFactory: 
> [https://github.com/googleapis/google-http-java-client/issues/923]
>  
> h1. Option 3: Check JSON equality via JSONassert
> * https://github.com/skyscreamer/JSONassert
> * https://github.com/hertzsprung/hamcrest-json (not used, as its last commit 
> was in 2012) 
> The JSONassert example does not carry escaped double quote characters. The 
> implementation would convert the actual object into a JSON object and call 
> {{JSONAssert.assertEquals}}.
> Credit: Luke Cwik
>  
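
The ordering problem and the Map-based comparison from Option 1 can be
illustrated without any Beam or google-http-client classes. The following is
a minimal sketch, assuming plain LinkedHashMap instances stand in for a
GenericJson subclass such as CounterUpdate. The class and method names
(MapEqualityDemo, counter) are hypothetical and exist only for this
illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not part of Beam or google-http-client.
class MapEqualityDemo {

  // Builds the same entries as the CounterUpdate example in the description,
  // inserting the top-level keys in the order given by the caller.
  static Map<String, Object> counter(String... keyOrder) {
    Map<String, Object> integer = new LinkedHashMap<>();
    integer.put("highBits", 0);
    integer.put("lowBits", 0);

    Map<String, Object> nameAndKind = new LinkedHashMap<>();
    nameAndKind.put("kind", "SUM");
    nameAndKind.put("name", "transformedValue-ElementCount");

    Map<String, Object> values = new LinkedHashMap<>();
    values.put("cumulative", true);
    values.put("integer", integer);
    values.put("nameAndKind", nameAndKind);

    // LinkedHashMap keeps insertion order, so toString() follows keyOrder.
    Map<String, Object> result = new LinkedHashMap<>();
    for (String key : keyOrder) {
      result.put(key, values.get(key));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> a = counter("cumulative", "integer", "nameAndKind");
    Map<String, Object> b = counter("nameAndKind", "integer", "cumulative");

    // String comparison depends on field order and fails here.
    System.out.println(a.toString().equals(b.toString())); // false
    // Map.equals compares entries regardless of order and succeeds.
    System.out.println(a.equals(b)); // true
  }
}
```

Whether a real GenericJson subclass compares equal to an ImmutableMap in the
same way depends on the value types it stores, so this sketch only shows why
entry-based equality is insensitive to field order.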





[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361931
 ]

ASF GitHub Bot logged work on BEAM-9000:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:29
Start Date: 20/Dec/19 22:29
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10441: [BEAM-9000] Java 
Test Assertions without toString for GenericJson subclasses
URL: https://github.com/apache/beam/pull/10441#issuecomment-56800
 
 
   Run BigQueryIO Streaming Performance Test Java
 



Issue Time Tracking
---

Worklog Id: (was: 361931)
Time Spent: 1h  (was: 50m)

> Java Test Assertions without toString for GenericJson subclasses
> 
>
> Key: BEAM-9000
> URL: https://issues.apache.org/jira/browse/BEAM-9000
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361932
 ]

ASF GitHub Bot logged work on BEAM-9000:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:29
Start Date: 20/Dec/19 22:29
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10441: [BEAM-9000] Java 
Test Assertions without toString for GenericJson subclasses
URL: https://github.com/apache/beam/pull/10441#issuecomment-56825
 
 
   Run Dataflow ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 361932)
Time Spent: 1h 10m  (was: 1h)

> Java Test Assertions without toString for GenericJson subclasses
> 
>
> Key: BEAM-9000
> URL: https://issues.apache.org/jira/browse/BEAM-9000
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361924
 ]

ASF GitHub Bot logged work on BEAM-9000:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:28
Start Date: 20/Dec/19 22:28
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10441: [BEAM-9000] Java 
Test Assertions without toString for GenericJson subclasses
URL: https://github.com/apache/beam/pull/10441#issuecomment-568110802
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 361924)
Time Spent: 40m  (was: 0.5h)

> Java Test Assertions without toString for GenericJson subclasses
> 
>
> Key: BEAM-9000
> URL: https://issues.apache.org/jira/browse/BEAM-9000
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361925
 ]

ASF GitHub Bot logged work on BEAM-8676:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:28
Start Date: 20/Dec/19 22:28
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10438: [BEAM-8676] 
sdks/java: gax and grpc upgrades
URL: https://github.com/apache/beam/pull/10438#issuecomment-568110929
 
 
   Run Java HadoopFormatIO Performance Test
   
 



Issue Time Tracking
---

Worklog Id: (was: 361925)
Time Spent: 1h 50m  (was: 1h 40m)

> Beam Dependency Update Request: com.google.api:gax-grpc
> ---
>
> Key: BEAM-8676
> URL: https://issues.apache.org/jira/browse/BEAM-8676
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> --- 2019-11-15 19:38:32.410774 ---
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1. 
> cc: 
> Please refer to [Beam Dependency Guide|https://beam.apache.org/contribute/dependencies/] for more information. 
> Do Not Modify The Description Above. 
> --- 2019-11-19 21:03:23.809273 ---
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1. 
> cc: 
> Please refer to [Beam Dependency Guide|https://beam.apache.org/contribute/dependencies/] for more information. 
> Do Not Modify The Description Above. 
> --- 2019-12-02 12:08:16.165687 ---
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.50.1. 
> cc: 
> Please refer to [Beam Dependency Guide|https://beam.apache.org/contribute/dependencies/] for more information. 
> Do Not Modify The Description Above. 
> --- 2019-12-09 12:07:17.894174 ---
> Please consider upgrading the dependency com.google.api:gax-grpc. 
> The current version is 1.38.0. The latest version is 1.51.0. 
> cc: 
> Please refer to [Beam Dependency Guide|https://beam.apache.org/contribute/dependencies/] for more information. 
> Do Not Modify The Description Above. 





[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361927
 ]

ASF GitHub Bot logged work on BEAM-8676:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:28
Start Date: 20/Dec/19 22:28
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10438: [BEAM-8676] 
sdks/java: gax and grpc upgrades
URL: https://github.com/apache/beam/pull/10438#issuecomment-568110969
 
 
   Run Dataflow ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 361927)
Time Spent: 2h 10m  (was: 2h)

> Beam Dependency Update Request: com.google.api:gax-grpc
> ---
>
> Key: BEAM-8676
> URL: https://issues.apache.org/jira/browse/BEAM-8676
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361928
 ]

ASF GitHub Bot logged work on BEAM-8676:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:28
Start Date: 20/Dec/19 22:28
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10438: [BEAM-8676] 
sdks/java: gax and grpc upgrades
URL: https://github.com/apache/beam/pull/10438#issuecomment-568110998
 
 
   Run Spark ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 361928)
Time Spent: 2h 20m  (was: 2h 10m)

> Beam Dependency Update Request: com.google.api:gax-grpc
> ---
>
> Key: BEAM-8676
> URL: https://issues.apache.org/jira/browse/BEAM-8676
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361929&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361929
 ]

ASF GitHub Bot logged work on BEAM-8676:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:28
Start Date: 20/Dec/19 22:28
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10438: [BEAM-8676] 
sdks/java: gax and grpc upgrades
URL: https://github.com/apache/beam/pull/10438#issuecomment-568111016
 
 
   Run SQL Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 361929)
Time Spent: 2.5h  (was: 2h 20m)

> Beam Dependency Update Request: com.google.api:gax-grpc
> ---
>
> Key: BEAM-8676
> URL: https://issues.apache.org/jira/browse/BEAM-8676
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361926
 ]

ASF GitHub Bot logged work on BEAM-8676:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:28
Start Date: 20/Dec/19 22:28
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10438: [BEAM-8676] 
sdks/java: gax and grpc upgrades
URL: https://github.com/apache/beam/pull/10438#issuecomment-568110952
 
 
   Run BigQueryIO Streaming Performance Test Java
 



Issue Time Tracking
---

Worklog Id: (was: 361926)
Time Spent: 2h  (was: 1h 50m)

> Beam Dependency Update Request: com.google.api:gax-grpc
> ---
>
> Key: BEAM-8676
> URL: https://issues.apache.org/jira/browse/BEAM-8676
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361922
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:26
Start Date: 20/Dec/19 22:26
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360592230
 
 

 ##
 File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
 ##
 @@ -64,6 +68,28 @@ public ZetaSQLQueryPlanner(JdbcConnection jdbcConnection, RuleSet[] ruleSets) {
      plannerImpl = new ZetaSQLPlannerImpl(defaultConfig(jdbcConnection, ruleSets));
    }
  
 +  public static RuleSet[] getZetaSqlRuleSets() {
 +    // TODO[BEAM-8630]: uncomment the next line once we have fully migrated to BeamZetaSqlCalcRel
 +    // return replaceBeamCalcRule(BeamRuleSets.getRuleSets());
  
 Review comment:
   I assume you'd also need to add this line up in `ZetaSQLQueryPlanner`? Or does that not actually do anything?
 



Issue Time Tracking
---

Worklog Id: (was: 361922)
Time Spent: 2h 20m  (was: 2h 10m)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361910
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360586132
 
 

 ##
 File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java
 ##
 @@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.zetasql;
+
+import com.google.zetasql.AnalyzerOptions;
+import com.google.zetasql.PreparedExpression;
+import com.google.zetasql.Value;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.IntFunction;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel;
+import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSortRel;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils;
+import org.apache.beam.sdk.extensions.sql.impl.rel.WithLimitableInput;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BeamBigQuerySqlDialect;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * BeamRelNode to replace {@code Project} and {@code Filter} node based on the {@code ZetaSQL}
+ * expression evaluator.
+ */
+// TODO[BEAM-8630]: This class is currently a prototype and not used in runtime.
 
 Review comment:
   nit: this should probably be part of the comment block above
 



Issue Time Tracking
---

[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361909
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360587972
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/WithLimitableInput.java
 ##
 @@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rel;
+
+import org.apache.beam.sdk.annotations.Internal;
+
+/** Interface for a {@code Calc} whose input can be a {@link BeamSortRel} with a limit count. */
+@Internal
+public interface WithLimitableInput {
 
 Review comment:
   nit: The implementation of this interface is identical between here and 
BeamCalcRel. What you actually have is a common class `BeamCalc` on top of 
calcite's `core.Calc`.
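
One possible reading of the suggestion above is an abstract base class that holds the shared limit logic once, so both calc implementations inherit it instead of each implementing the interface identically. The sketch below is an illustration only: it uses stand-in types so it compiles on its own, and `RelNodeStub`, `SortStub`, `CalcStub`, `AbstractBeamCalc`, the two subclasses and the method `limitCountOfInput()` are all hypothetical names. The actual methods of `WithLimitableInput` are not shown in this thread.

```java
import java.util.OptionalInt;

/** Stand-in for a relational node; hypothetical, not a Calcite type. */
interface RelNodeStub {}

/** Stand-in for a sort node that may carry a LIMIT count. */
final class SortStub implements RelNodeStub {
  private final OptionalInt limit;

  private SortStub(OptionalInt limit) {
    this.limit = limit;
  }

  static SortStub withLimit(int count) {
    return new SortStub(OptionalInt.of(count));
  }

  static SortStub unlimited() {
    return new SortStub(OptionalInt.empty());
  }

  OptionalInt limit() {
    return limit;
  }
}

/** Stand-in for Calcite's core Calc: it only remembers its input. */
abstract class CalcStub implements RelNodeStub {
  private final RelNodeStub input;

  CalcStub(RelNodeStub input) {
    this.input = input;
  }

  RelNodeStub input() {
    return input;
  }
}

/** Hypothetical shared base class: the limit lookup is written once here. */
abstract class AbstractBeamCalc extends CalcStub {
  AbstractBeamCalc(RelNodeStub input) {
    super(input);
  }

  /** Returns the input's limit if the input is a sort with a limit, otherwise empty. */
  final OptionalInt limitCountOfInput() {
    RelNodeStub in = input();
    return in instanceof SortStub ? ((SortStub) in).limit() : OptionalInt.empty();
  }
}

/** The concrete flavors would then differ only in how they evaluate expressions. */
final class CalciteCalcSketch extends AbstractBeamCalc {
  CalciteCalcSketch(RelNodeStub input) {
    super(input);
  }
}

final class ZetaSqlCalcSketch extends AbstractBeamCalc {
  ZetaSqlCalcSketch(RelNodeStub input) {
    super(input);
  }

  public static void main(String[] args) {
    // Both subclasses share the same inherited lookup.
    System.out.println(new ZetaSqlCalcSketch(SortStub.withLimit(10)).limitCountOfInput());
    System.out.println(new CalciteCalcSketch(SortStub.unlimited()).limitCountOfInput());
  }
}
```

The trade-off is the usual one: a base class removes the duplicated implementation but ties both nodes to a single parent, whereas the interface keeps them independent at the cost of repeating the code.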
 



Issue Time Tracking
---

Worklog Id: (was: 361909)
Time Spent: 1h 10m  (was: 1h)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361914=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361914
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360593644
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlUtils.java
 ##
 @@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.zetasql;
+
+import com.google.protobuf.ByteString;
+import com.google.zetasql.ArrayType;
+import com.google.zetasql.StructType;
+import com.google.zetasql.StructType.StructField;
+import com.google.zetasql.Type;
+import com.google.zetasql.TypeFactory;
+import com.google.zetasql.Value;
+import com.google.zetasql.ZetaSQLType.TypeKind;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.LongMath;
+import org.joda.time.Instant;
+
+/** Utility methods for ZetaSQL related operations. */
+@Internal
+public final class ZetaSqlUtils {
+
+  private static final long MICROS_PER_MILLI = 1000L;
+
+  private ZetaSqlUtils() {}
+
+  // Unsupported ZetaSQL types: INT32, UINT32, UINT64, FLOAT, ENUM, PROTO, GEOGRAPHY
+  // TODO[BEAM-8630]: support ZetaSQL types: DATE, TIME, DATETIME
+  public static Type beamFieldTypeToZetaSqlType(FieldType fieldType) {
+    switch (fieldType.getTypeName()) {
+      case INT64:
+        return TypeFactory.createSimpleType(TypeKind.TYPE_INT64);
+      case DECIMAL:
+        return TypeFactory.createSimpleType(TypeKind.TYPE_NUMERIC);
+      case DOUBLE:
+        return TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE);
+      case STRING:
+        return TypeFactory.createSimpleType(TypeKind.TYPE_STRING);
+      case DATETIME:
+        // TODO[BEAM-8630]: Mapping Timestamp to DATETIME results in some timezone/precision issues.
 
 Review comment:
   We determined the timezone issue is non-existent. I wonder if we could make 
a logical type that gave us an extra field to stuff nanoseconds without 
breaking window functions?
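
To make the precision concern above concrete, here is a minimal sketch in plain Java. It assumes the ZetaSQL side carries microseconds since the epoch while the Beam side holds a millisecond-precision instant (an assumption suggested by the `MICROS_PER_MILLI` constant in the diff, not confirmed in this thread). The class `TimestampPrecisionSketch` and its methods are hypothetical and not part of the PR. It splits a microsecond value into whole milliseconds plus a sub-millisecond remainder, which is the kind of extra field a logical type might carry.

```java
/** Hypothetical helper, not part of the PR: splits epoch microseconds into millis plus a remainder. */
final class TimestampPrecisionSketch {

  // Same value as the MICROS_PER_MILLI constant in the quoted diff.
  static final long MICROS_PER_MILLI = 1000L;

  private TimestampPrecisionSketch() {}

  /** Whole milliseconds since the epoch; floorDiv keeps pre-1970 values consistent. */
  static long toMillis(long epochMicros) {
    return Math.floorDiv(epochMicros, MICROS_PER_MILLI);
  }

  /** Sub-millisecond remainder in microseconds, always in the range [0, 999]. */
  static long subMilliMicros(long epochMicros) {
    return Math.floorMod(epochMicros, MICROS_PER_MILLI);
  }

  /** Rebuilds the original microsecond value from the two parts. */
  static long toMicros(long epochMillis, long subMilliMicros) {
    return Math.addExact(Math.multiplyExact(epochMillis, MICROS_PER_MILLI), subMilliMicros);
  }

  public static void main(String[] args) {
    long micros = 1_576_880_400_123_456L;
    long millis = toMillis(micros);
    long rest = subMilliMicros(micros);
    // Going through milliseconds alone drops the last three digits.
    System.out.println(millis * MICROS_PER_MILLI == micros); // false
    // Carrying the remainder alongside restores the full value.
    System.out.println(toMicros(millis, rest) == micros); // true
  }
}
```

Keeping the millisecond part as the primary value means anything that only looks at the instant would see the same timestamp as before; whether that is enough to leave window functions unaffected is the open question in the comment.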
 



Issue Time Tracking
---

Worklog Id: (was: 361914)
Time Spent: 1h 50m  (was: 1h 40m)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361913=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361913
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360590462
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java
 ##
 @@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.zetasql;
+
+import com.google.zetasql.AnalyzerOptions;
+import com.google.zetasql.PreparedExpression;
+import com.google.zetasql.Value;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.IntFunction;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel;
+import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSortRel;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils;
+import org.apache.beam.sdk.extensions.sql.impl.rel.WithLimitableInput;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BeamBigQuerySqlDialect;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * BeamRelNode to replace {@code Project} and {@code Filter} node based on the {@code ZetaSQL}
+ * expression evaluator.
+ */
+// TODO[BEAM-8630]: This class is currently a prototype and not used in runtime.
+@Internal
+public class BeamZetaSqlCalcRel extends Calc implements BeamRelNode, WithLimitableInput {
+
+  private static final SqlDialect DIALECT = BeamBigQuerySqlDialect.DEFAULT;
+  private final SqlImplementor.Context context;
+
+  public BeamZetaSqlCalcRel(
+      RelOptCluster cluster, RelTraitSet traits, RelNode input, RexProgram program) {
+    super(cluster, traits, input, program);
+    final IntFunction fn =
+        i ->
+  

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361908=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361908
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10120: [BEAM-8335] 
Add a TestStreamService Python Implementation
URL: https://github.com/apache/beam/pull/10120
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 361908)
Time Spent: 49h 40m  (was: 49.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 49h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361917=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361917
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360593247
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SingleRowScanConverter.java
 ##
 @@ -36,6 +42,22 @@ public boolean canConvert(ResolvedSingleRowScan zetaNode) {
 
   @Override
   public RelNode convert(ResolvedSingleRowScan zetaNode, List inputs) {
-    return LogicalValues.createOneRow(getCluster());
+    return createOneRow(getCluster());
+  }
+
+  // This function is a reimplementation of Calcite's LogicalValues.createOneRow() with a single
+  // line change: SqlTypeName.INTEGER replaced by SqlTypeName.BIGINT.
+  // Would like to use LogicalValues.createOneRow(), but it uses type SqlTypeName.INTEGER which
+  // correspond to TypeKind.TYPE_INT32 in ZetaSQL, a type not supported in PRODUCT_EXTERNAL mode.
+  private static LogicalValues createOneRow(RelOptCluster cluster) {
 
 Review comment:
   I don't understand this. More context please?
 



Issue Time Tracking
---

Worklog Id: (was: 361917)
Time Spent: 2h  (was: 1h 50m)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361916=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361916
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360592000
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
 ##
 @@ -64,6 +68,28 @@ public ZetaSQLQueryPlanner(JdbcConnection jdbcConnection, RuleSet[] ruleSets) {
     plannerImpl = new ZetaSQLPlannerImpl(defaultConfig(jdbcConnection, ruleSets));
   }
 
+  public static RuleSet[] getZetaSqlRuleSets() {
+    // TODO[BEAM-8630]: uncomment the next line once we have fully migrated to BeamZetaSqlCalcRel
+    // return replaceBeamCalcRule(BeamRuleSets.getRuleSets());
+    return BeamRuleSets.getRuleSets();
+  }
+
+  private static RuleSet[] replaceBeamCalcRule(RuleSet[] ruleSets) {
 
 Review comment:
   This is perfect! Thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 361916)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361907=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361907
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9955: [BEAM-2572] Python 
SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-568108785
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 361907)
Time Spent: 4.5h  (was: 4h 20m)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to setup the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!





[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361911
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r36051
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java
 ##
 @@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.zetasql;
+
+import com.google.zetasql.AnalyzerOptions;
+import com.google.zetasql.PreparedExpression;
+import com.google.zetasql.Value;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.IntFunction;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel;
+import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSortRel;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils;
+import org.apache.beam.sdk.extensions.sql.impl.rel.WithLimitableInput;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BeamBigQuerySqlDialect;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * BeamRelNode to replace {@code Project} and {@code Filter} node based on the {@code ZetaSQL}
+ * expression evaluator.
+ */
+// TODO[BEAM-8630]: This class is currently a prototype and not used in runtime.
+@Internal
+public class BeamZetaSqlCalcRel extends Calc implements BeamRelNode, WithLimitableInput {
+
+  private static final SqlDialect DIALECT = BeamBigQuerySqlDialect.DEFAULT;
+  private final SqlImplementor.Context context;
+
+  public BeamZetaSqlCalcRel(
+      RelOptCluster cluster, RelTraitSet traits, RelNode input, RexProgram program) {
+    super(cluster, traits, input, program);
+    final IntFunction fn =
+        i ->
+  

[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361918=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361918
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360591572
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java
 ##
 @@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.zetasql;
+
+import com.google.zetasql.AnalyzerOptions;
+import com.google.zetasql.PreparedExpression;
+import com.google.zetasql.Value;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.IntFunction;
+import java.util.stream.Collectors;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel;
+import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSortRel;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils;
+import org.apache.beam.sdk.extensions.sql.impl.rel.WithLimitableInput;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BeamBigQuerySqlDialect;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * BeamRelNode to replace {@code Project} and {@code Filter} node based on the {@code ZetaSQL}
+ * expression evaluator.
+ */
+// TODO[BEAM-8630]: This class is currently a prototype and not used in runtime.
+@Internal
+public class BeamZetaSqlCalcRel extends Calc implements BeamRelNode, WithLimitableInput {
+
+  private static final SqlDialect DIALECT = BeamBigQuerySqlDialect.DEFAULT;
+  private final SqlImplementor.Context context;
+
+  public BeamZetaSqlCalcRel(
+  RelOptCluster cluster, RelTraitSet traits, RelNode input, RexProgram program) {
+super(cluster, traits, input, program);
+final IntFunction fn =
+i ->
+  

[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361919
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360592858
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlUtilsTest.java
 ##
 @@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.zetasql;
+
+import static org.junit.Assert.assertEquals;
+
+import com.google.protobuf.ByteString;
+import com.google.zetasql.ArrayType;
+import com.google.zetasql.StructType;
+import com.google.zetasql.StructType.StructField;
+import com.google.zetasql.TypeFactory;
+import com.google.zetasql.Value;
+import com.google.zetasql.ZetaSQLType.TypeKind;
+import java.util.Arrays;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.values.Row;
+import org.joda.time.Instant;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for utility methods for ZetaSQL related operations. */
+@RunWith(JUnit4.class)
+public class ZetaSqlUtilsTest {
+
+  private static final Schema TEST_INNER_SCHEMA =
+  Schema.builder().addField("i1", FieldType.INT64).addField("i2", FieldType.STRING).build();
+
+  private static final Schema TEST_SCHEMA =
+  Schema.builder()
+  .addField("f1", FieldType.INT64)
 
 Review comment:
   Yes, nullable fields have been a huge pain point in the past.
 



Issue Time Tracking
---

Worklog Id: (was: 361919)
Time Spent: 2h 10m  (was: 2h)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361915=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361915
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360591277
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java
 ##
 @@ -0,0 +1,259 @@

[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361920
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360592230
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java
 ##
 @@ -64,6 +68,28 @@ public ZetaSQLQueryPlanner(JdbcConnection jdbcConnection, RuleSet[] ruleSets) {
 plannerImpl = new ZetaSQLPlannerImpl(defaultConfig(jdbcConnection, ruleSets));
   }
 
+  public static RuleSet[] getZetaSqlRuleSets() {
+// TODO[BEAM-8630]: uncomment the next line once we have fully migrated to BeamZetaSqlCalcRel
+// return replaceBeamCalcRule(BeamRuleSets.getRuleSets());
 
 Review comment:
   I assume you'd also need to add this line up in `ZetaSQLQueryPlanner`?
 



Issue Time Tracking
---

Worklog Id: (was: 361920)
Time Spent: 2h 10m  (was: 2h)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361912
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:20
Start Date: 20/Dec/19 22:20
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r360590920
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java
 ##
 @@ -0,0 +1,259 @@

[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361906
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:19
Start Date: 20/Dec/19 22:19
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9955: [BEAM-2572] Python 
SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-568108754
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 361906)
Time Spent: 4h 20m  (was: 4h 10m)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!





[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361904
 ]

ASF GitHub Bot logged work on BEAM-9010:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:14
Start Date: 20/Dec/19 22:14
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper 
TableRow size calculation via TableRowJsonCoder
URL: https://github.com/apache/beam/pull/10444#issuecomment-568107404
 
 
   Run Java PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 361904)
Time Spent: 50m  (was: 40m)

> BigQuery TableRow's size is toString().length() ?
> -
>
> Key: BEAM-9010
> URL: https://issues.apache.org/jira/browse/BEAM-9010
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
> Attachments: TableRowJsonCoder_behavior_remains_same.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The following tests failed when I tried to upgrade google-http-client from 
> 1.28.0 to 1.34.0:
> {noformat}
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll
> {noformat}
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures
> [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43]
>  and 
> [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758]
>  rely on {{TableRow.toString().length()}} to calculate the size. Example:
> {code:java}
>   dataSize += row.toString().length();
>   if (dataSize >= maxRowBatchSize
>   || rows.size() >= maxRowsPerBatch
>   || i == rowsToPublish.size() - 1) {
> {code}
> However, with [google-http-client's 
> PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218],
>  the GenericData.toString output has changed since v1.29.0.
> In old google-http-client 1.28.0, an example row's toString returned:
> {noformat}
> {f=[{v=foo}, {v=1234}]}
> {noformat}
> In new google-http-client 1.29.0 and higher, the same row's toString returns:
> {noformat}
> GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, 
> GenericData{classInfo=[v], {v=1234}}]}}
> {noformat}
> h1. Question:
> Is it the right thing to rely on {{toString().length()}} in the BigQuery 
> classes?
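
To illustrate the concern raised in the description above, here is a minimal, standard-library-only sketch of why a size estimate based on toString().length() shifts when only the debug rendering changes, while a size taken from a fixed serialization stays the same. It does not use Beam or google-http-client: RowSizeSketch, renderCompact, renderVerbose, and encodedSize are hypothetical names, the two renderings only imitate the shape of the outputs quoted above, and the JSON-like encoding is a simplified stand-in for a real coder such as the TableRowJsonCoder named in the pull request title.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not Beam or google-http-client code. */
public class RowSizeSketch {

  /** Compact debug rendering, similar in spirit to the 1.28.0 output. */
  public static String renderCompact(Map<String, Object> row) {
    return row.toString();
  }

  /** Verbose debug rendering that wraps the same data in extra text. */
  public static String renderVerbose(Map<String, Object> row) {
    return "GenericData{classInfo=" + row.keySet() + ", " + row + "}";
  }

  /**
   * Size taken from a fixed serialization: the UTF-8 byte count of a
   * simplified JSON-like encoding (no escaping, all values quoted).
   */
  public static long encodedSize(Map<String, Object> row) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, Object> entry : row.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      sb.append('"').append(entry.getKey()).append("\":\"")
          .append(entry.getValue()).append('"');
      first = false;
    }
    sb.append('}');
    return sb.toString().getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("name", "foo");
    row.put("count", 1234);

    // The two debug renderings describe the same data but differ in length.
    System.out.println("compact length: " + renderCompact(row).length());
    System.out.println("verbose length: " + renderVerbose(row).length());
    // The serialized size depends only on the data and the encoding.
    System.out.println("encoded size:   " + encodedSize(row));
  }
}
```

Under these assumptions, any batching threshold compared against the debug-string length would change meaning after a library upgrade, whereas one compared against the encoded size would not.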





[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361903=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361903
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 20/Dec/19 22:12
Start Date: 20/Dec/19 22:12
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9955: [BEAM-2572] Python 
SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-568106721
 
 
   Looks like errors unrelated to the change. Let me clean up the GCP project 
that we use for testing
 



Issue Time Tracking
---

Worklog Id: (was: 361903)
Time Spent: 4h 10m  (was: 4h)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>




