[jira] [Updated] (BEAM-9204) HBase SDF @SplitRestriction does not take the range input into account to restrict splits
[ https://issues.apache.org/jira/browse/BEAM-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9204: --- Description: This is an issue because the split is called multiple times and in this case it will produce repeated work. (was: This is an issue because it is common if split is called multiple times work this will produce repeated work.) > HBase SDF @SplitRestriction does not take the range input into account to > restrict splits > - > > Key: BEAM-9204 > URL: https://issues.apache.org/jira/browse/BEAM-9204 > Project: Beam > Issue Type: Bug > Components: io-java-hbase >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > This is an issue because the split is called multiple times and in this > case it will produce repeated work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9162) Upgrade Jackson to version 2.10.2
[ https://issues.apache.org/jira/browse/BEAM-9162?focusedWorklogId=378097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378097 ] ASF GitHub Bot logged work on BEAM-9162: Author: ASF GitHub Bot Created on: 28/Jan/20 08:25 Start Date: 28/Jan/20 08:25 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10643: [BEAM-9162] Upgrade Jackson to version 2.10.2 URL: https://github.com/apache/beam/pull/10643#issuecomment-579133146 @suztomo maybe you know a way I can run the linkagechecker analysis in the full set of Beam modules? I think it is more scalable to have a task for that which we invoke during PRs to validate that no regressions are introduced, as suggested by Luke. (I can do that in Maven but my gradle-fu is still not good enough). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378097) Time Spent: 1h 40m (was: 1.5h) > Upgrade Jackson to version 2.10.2 > - > > Key: BEAM-9162 > URL: https://issues.apache.org/jira/browse/BEAM-9162 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > Jackson has a new way to deal with [deserialization security > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10] in > 2.10.x so worth the upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9204) HBase SDF @SplitRestriction does not take the range input into account to restrict splits
Ismaël Mejía created BEAM-9204: -- Summary: HBase SDF @SplitRestriction does not take the range input into account to restrict splits Key: BEAM-9204 URL: https://issues.apache.org/jira/browse/BEAM-9204 Project: Beam Issue Type: Bug Components: io-java-hbase Reporter: Ismaël Mejía Assignee: Ismaël Mejía -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7810) Allow ValueProvider arguments to ReadFromDatastore
[ https://issues.apache.org/jira/browse/BEAM-7810?focusedWorklogId=378091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378091 ] ASF GitHub Bot logged work on BEAM-7810: Author: ASF GitHub Bot Created on: 28/Jan/20 08:08 Start Date: 28/Jan/20 08:08 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10683: [BEAM-7810] Added ValueProvider support for Datastore query namespaces URL: https://github.com/apache/beam/pull/10683#issuecomment-579127724 Cheers for opening and fixing that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378091) Time Spent: 1h 10m (was: 1h) > Allow ValueProvider arguments to ReadFromDatastore > -- > > Key: BEAM-7810 > URL: https://issues.apache.org/jira/browse/BEAM-7810 > Project: Beam > Issue Type: New Feature > Components: io-py-gcp >Reporter: Udi Meiri >Assignee: Elias Djurfeldt >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > From: > https://stackoverflow.com/questions/56748893/trying-to-achieve-runtime-value-of-namespace-of-datastore-in-dataflow-template -- This message was sent by Atlassian Jira (v8.3.4#803005)
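[Editor's note] The feature above rests on Beam's ValueProvider pattern: a transform parameter is wrapped so its value is read when the pipeline executes rather than when the graph is built, which is what lets a templated Dataflow job take the Datastore namespace at runtime. A minimal self-contained sketch of that pattern, in plain Python; these are not the actual apache_beam.options.value_provider classes, and DeferredValueProvider is a hypothetical stand-in for Beam's RuntimeValueProvider:

```python
class StaticValueProvider:
    """Value known when the pipeline graph is constructed."""

    def __init__(self, value):
        self._value = value

    def is_accessible(self):
        return True

    def get(self):
        return self._value


class DeferredValueProvider:
    """Hypothetical stand-in for a runtime-provided value: .get() only
    works once the runner has supplied the template argument."""

    def __init__(self):
        self._resolved = False
        self._value = None

    def set(self, value):
        # Called by the "runner" once the template argument arrives.
        self._value = value
        self._resolved = True

    def is_accessible(self):
        return self._resolved

    def get(self):
        if not self._resolved:
            raise RuntimeError("value is only available at runtime")
        return self._value


def query_namespace(namespace_provider):
    """Like ReadFromDatastore after BEAM-7810: the namespace is resolved
    via .get() at execution time, not at pipeline-construction time."""
    return namespace_provider.get()
```

The key design point is that the transform stores the provider, not the value, and defers the `.get()` call until the read actually runs.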
[jira] [Updated] (BEAM-9204) HBase SDF @SplitRestriction does not take the range input into account to restrict splits
[ https://issues.apache.org/jira/browse/BEAM-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9204: --- Status: Open (was: Triage Needed) > HBase SDF @SplitRestriction does not take the range input into account to > restrict splits > - > > Key: BEAM-9204 > URL: https://issues.apache.org/jira/browse/BEAM-9204 > Project: Beam > Issue Type: Bug > Components: io-java-hbase >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > This is an issue because it is common for split to be called multiple times, > and this will produce repeated work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9204) HBase SDF @SplitRestriction does not take the range input into account to restrict splits
[ https://issues.apache.org/jira/browse/BEAM-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9204: --- Description: This is an issue because it is common for split to be called multiple times, and this will produce repeated work. > HBase SDF @SplitRestriction does not take the range input into account to > restrict splits > - > > Key: BEAM-9204 > URL: https://issues.apache.org/jira/browse/BEAM-9204 > Project: Beam > Issue Type: Bug > Components: io-java-hbase >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > This is an issue because it is common for split to be called multiple times, > and this will produce repeated work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
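[Editor's note] The repeated work in BEAM-9204 comes from @SplitRestriction emitting splits for the whole table regardless of the restriction range it was handed. The intended behavior is to clip each candidate split to the incoming range, so re-splitting a sub-range never re-emits work outside it. A language-neutral sketch in Python, with half-open key ranges as plain tuples; this is illustrative only, not the HBaseIO Java code touched by PR #10700:

```python
def intersect_range(range_a, range_b):
    """Return the overlap of two half-open key ranges, or None if disjoint."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    return (start, end) if start < end else None


def split_restriction(restriction, region_ranges):
    """Yield one split per region, clipped to the incoming restriction.

    The fix is conceptually this clipping step: without it, every call
    re-emits the full set of region ranges, so calling split on a
    sub-range produces work that was already claimed elsewhere.
    """
    for region in region_ranges:
        clipped = intersect_range(restriction, region)
        if clipped is not None:
            yield clipped
```

Splitting the full range still yields every region, but splitting a narrower restriction yields only the overlapping pieces.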
[jira] [Commented] (BEAM-4735) Make HBaseIO.read() based on SDF
[ https://issues.apache.org/jira/browse/BEAM-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024990#comment-17024990 ] Ismaël Mejía commented on BEAM-4735: Oh interesting finding! Just filed BEAM-9204 to tackle it. > Make HBaseIO.read() based on SDF > > > Key: BEAM-4735 > URL: https://issues.apache.org/jira/browse/BEAM-4735 > Project: Beam > Issue Type: Improvement > Components: io-java-hbase >Reporter: Ismaël Mejía >Priority: Minor > > BEAM-4020 introduces HBaseIO reads based on SDF. So far the read() method > still uses the Source based API for two reasons: > 1. Most distributed runners don't support Bounded SDF today. > 2. SDF does not support Dynamic Work Rebalancing but the Source API of HBase > already supports it, so changing it means losing some functionality. > Once there are improvements in both (1) and (2) we should consider moving the > main read() function to use the SDF API and remove the Source based > implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9204) HBase SDF @SplitRestriction does not take the range input into account to restrict splits
[ https://issues.apache.org/jira/browse/BEAM-9204?focusedWorklogId=378121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378121 ] ASF GitHub Bot logged work on BEAM-9204: Author: ASF GitHub Bot Created on: 28/Jan/20 09:45 Start Date: 28/Jan/20 09:45 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10700: [BEAM-9204] Fix HBase SplitRestriction to be based on provided Range URL: https://github.com/apache/beam/pull/10700 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378121) Remaining Estimate: 0h Time Spent: 10m > HBase SDF @SplitRestriction does not take the range input into account to > restrict splits > - > > Key: BEAM-9204 > URL: https://issues.apache.org/jira/browse/BEAM-9204 > Project: Beam > Issue Type: Bug > Components: io-java-hbase >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > This is an issue because it is common for split to be called multiple times, > and this will produce repeated work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9162) Upgrade Jackson to version 2.10.2
[ https://issues.apache.org/jira/browse/BEAM-9162?focusedWorklogId=378095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378095 ] ASF GitHub Bot logged work on BEAM-9162: Author: ASF GitHub Bot Created on: 28/Jan/20 08:23 Start Date: 28/Jan/20 08:23 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10643: [BEAM-9162] Upgrade Jackson to version 2.10.2 URL: https://github.com/apache/beam/pull/10643#issuecomment-579132447 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378095) Time Spent: 1h 20m (was: 1h 10m) > Upgrade Jackson to version 2.10.2 > - > > Key: BEAM-9162 > URL: https://issues.apache.org/jira/browse/BEAM-9162 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Jackson has a new way to deal with [deserialization security > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10] in > 2.10.x so worth the upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9162) Upgrade Jackson to version 2.10.2
[ https://issues.apache.org/jira/browse/BEAM-9162?focusedWorklogId=378096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378096 ] ASF GitHub Bot logged work on BEAM-9162: Author: ASF GitHub Bot Created on: 28/Jan/20 08:23 Start Date: 28/Jan/20 08:23 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10643: [BEAM-9162] Upgrade Jackson to version 2.10.2 URL: https://github.com/apache/beam/pull/10643#issuecomment-579132447 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378096) Time Spent: 1.5h (was: 1h 20m) > Upgrade Jackson to version 2.10.2 > - > > Key: BEAM-9162 > URL: https://issues.apache.org/jira/browse/BEAM-9162 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Jackson has a new way to deal with [deserialization security > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10] in > 2.10.x so worth the upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9188) Improving speed of splitting for Custom Sources
[ https://issues.apache.org/jira/browse/BEAM-9188?focusedWorklogId=378468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378468 ] ASF GitHub Bot logged work on BEAM-9188: Author: ASF GitHub Bot Created on: 28/Jan/20 21:06 Start Date: 28/Jan/20 21:06 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10701: [BEAM-9188] CassandraIO split performance improvement - cache size of the table URL: https://github.com/apache/beam/pull/10701#issuecomment-579455635 Retest it please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378468) Time Spent: 40m (was: 0.5h) > Improving speed of splitting for Custom Sources > --- > > Key: BEAM-9188 > URL: https://issues.apache.org/jira/browse/BEAM-9188 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > At this moment a Custom Source is being split and serialized in sequence. If > there are many splits, it takes time to process all splits. > > Example: it takes 2s to calculate size and serialize CassandraSource due to > connection setup and teardown. With 100+ splits, it's a lot of time spent in > 1 worker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
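[Editor's note] The PR referenced above (#10701) takes the caching route: the expensive table-size lookup is performed once and reused across all splits, instead of paying the ~2s connection setup and teardown per split. A rough sketch of that idea, with a fake size lookup standing in for the real Cassandra call (function names are illustrative, not CassandraIO's):

```python
import functools


@functools.lru_cache(maxsize=None)
def estimated_table_size(table_name):
    """Pretend this opens a connection and queries size estimates; in the
    reported case each call cost ~2s of connect/teardown.  With the cache,
    the lookup runs once per table and later calls are free."""
    estimated_table_size.calls += 1  # track misses for demonstration
    return 1024 * 1024  # hypothetical size in bytes


estimated_table_size.calls = 0


def split_source(table_name, desired_bundle_size):
    """Derive N splits; the size lookup is shared across all of them."""
    total = estimated_table_size(table_name)
    n = max(1, total // desired_bundle_size)
    return [(table_name, i) for i in range(n)]
```

Splitting the same table repeatedly now triggers exactly one size query, which is the essence of the speedup the PR is after.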
[jira] [Work logged] (BEAM-8756) Beam Dependency Update Request: com.google.cloud:google-cloud-core
[ https://issues.apache.org/jira/browse/BEAM-8756?focusedWorklogId=378473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378473 ] ASF GitHub Bot logged work on BEAM-8756: Author: ASF GitHub Bot Created on: 28/Jan/20 21:13 Start Date: 28/Jan/20 21:13 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10674: [BEAM-8756] Google cloud client libraries to use 2019 versions URL: https://github.com/apache/beam/pull/10674 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378473) Time Spent: 3h 10m (was: 3h) > Beam Dependency Update Request: com.google.cloud:google-cloud-core > -- > > Key: BEAM-8756 > URL: https://issues.apache.org/jira/browse/BEAM-8756 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > - 2019-11-19 21:05:22.063293 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-core. > The current version is 1.61.0. The latest version is 1.91.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:10:54.007619 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-core. > The current version is 1.61.0. The latest version is 1.91.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:10:01.481594 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-core. > The current version is 1.61.0. 
The latest version is 1.91.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-23 12:10:02.830621 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-core. > The current version is 1.61.0. The latest version is 1.92.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-30 14:05:33.624755 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-core. > The current version is 1.61.0. The latest version is 1.92.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-06 12:09:08.158708 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-core. > The current version is 1.61.0. The latest version is 1.92.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-13 12:08:46.788844 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-core. > The current version is 1.61.0. The latest version is 1.92.2 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-20 12:08:23.305323 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-core. > The current version is 1.61.0. The latest version is 1.92.2 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-27 12:09:28.216096 > - > Please consider upgrading the
[jira] [Work logged] (BEAM-9125) Update BigQuery Storage API documentation
[ https://issues.apache.org/jira/browse/BEAM-9125?focusedWorklogId=378477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378477 ] ASF GitHub Bot logged work on BEAM-9125: Author: ASF GitHub Bot Created on: 28/Jan/20 21:21 Start Date: 28/Jan/20 21:21 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10606: [BEAM-9125] Update BQ Storage API documentation URL: https://github.com/apache/beam/pull/10606#issuecomment-579461992 R: @rosetn @soyrice This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378477) Time Spent: 20m (was: 10m) > Update BigQuery Storage API documentation > - > > Key: BEAM-9125 > URL: https://issues.apache.org/jira/browse/BEAM-9125 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Some of the [documented limitations|#storage-api] for using the BigQuery > Storage API in Beam are no longer accurate. In particular: > * The BigQuery Storage API no longer needs to be enabled manually > * Dynamic work rebalancing has been supported in Beam since 2.15.0 > * The sample code should use .withSelectedFields(...) rather than > .withReadOptions(...). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8590) Python typehints: native types: consider bare container types as containing Any
[ https://issues.apache.org/jira/browse/BEAM-8590?focusedWorklogId=378482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378482 ] ASF GitHub Bot logged work on BEAM-8590: Author: ASF GitHub Bot Created on: 28/Jan/20 21:28 Start Date: 28/Jan/20 21:28 Worklog Time Spent: 10m Work Description: udim commented on issue #10042: [BEAM-8590] Support unsubscripted native types URL: https://github.com/apache/beam/pull/10042#issuecomment-579464714 R: @robertwb - LMK if you have bandwidth for this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378482) Time Spent: 50m (was: 40m) > Python typehints: native types: consider bare container types as containing > Any > --- > > Key: BEAM-8590 > URL: https://issues.apache.org/jira/browse/BEAM-8590 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > This is for convert_to_beam_type: > For example, process(element: List) is the same as process(element: > List[Any]). -- This message was sent by Atlassian Jira (v8.3.4#803005)
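[Editor's note] The convention proposed in BEAM-8590 can be sketched directly: an unsubscripted container hint is normalized to its Any-parameterized form before conversion, so `List` behaves like `List[Any]`. A minimal illustration with a hypothetical helper, not Beam's actual convert_to_beam_type:

```python
import typing


def normalize_bare_container(hint):
    """Treat an unsubscripted container hint as containing Any, e.g.
    List -> List[Any], Dict -> Dict[Any, Any].  Subscripted hints pass
    through unchanged."""
    if hint is typing.List:
        return typing.List[typing.Any]
    if hint is typing.Dict:
        return typing.Dict[typing.Any, typing.Any]
    if hint is typing.Tuple:
        return typing.Tuple[typing.Any, ...]
    return hint
```

With this normalization, `process(element: List)` and `process(element: List[Any])` produce the same converted type hint, which is exactly the equivalence the issue asks for.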
[jira] [Closed] (BEAM-8627) Migrate bigtable-client-core to 1.12.1
[ https://issues.apache.org/jira/browse/BEAM-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomo Suzuki closed BEAM-8627. - > Migrate bigtable-client-core to 1.12.1 > -- > > Key: BEAM-8627 > URL: https://issues.apache.org/jira/browse/BEAM-8627 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-dataflow >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Fix For: Not applicable > > Attachments: rsP0SotuxeR.png > > > Beam currently uses bigtable-client-core 1.8.0. The latest is > 1.12.1 > [https://search.maven.org/artifact/com.google.cloud.bigtable/bigtable-client-core/1.12.1/jar] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
[ https://issues.apache.org/jira/browse/BEAM-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025456#comment-17025456 ] Tomo Suzuki commented on BEAM-8691: --- Trying 1.13.0. Failed {noformat} > Task :checkJavaLinkage SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Exception in thread "main" org.eclipse.aether.collection.DependencyCollectionException: Failed to collect dependencies for org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT (compile) at org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.collectDependencies(DefaultDependencyCollector.java:295) at org.eclipse.aether.internal.impl.DefaultRepositorySystem.collectDependencies(DefaultRepositorySystem.java:284) at com.google.cloud.tools.opensource.dependencies.DependencyGraphBuilder.resolveCompileTimeDependencies(DependencyGraphBuilder.java:188) at com.google.cloud.tools.opensource.dependencies.DependencyGraphBuilder.getStaticLinkageCheckDependencyGraph(DependencyGraphBuilder.java:229) at com.google.cloud.tools.opensource.classpath.ClassPathBuilder.resolve(ClassPathBuilder.java:71) at com.google.cloud.tools.opensource.classpath.LinkageCheckerMain.main(LinkageCheckerMain.java:69) Caused by: org.eclipse.aether.collection.UnsolvableVersionConflictException: Could not resolve version conflict among [org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> com.google.api:gax-grpc:jar:1.52.0 -> io.grpc:grpc-netty-shaded:jar:1.25.0 -> io.grpc:grpc-core:jar:[1.25.0,1.25.0], org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> com.google.api:gax-grpc:jar:1.52.0 -> io.grpc:grpc-alts:jar:1.25.0 -> io.grpc:grpc-core:jar:[1.25.0,1.25.0], org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> 
com.google.cloud:google-cloud-bigquerystorage:jar:0.120.1-beta -> io.grpc:grpc-core:jar:1.25.0, org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> com.google.cloud.bigtable:bigtable-client-core:jar:1.13.0 -> com.google.cloud:google-cloud-bigtable:jar:1.9.1 -> io.grpc:grpc-core:jar:1.26.0, org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> com.google.cloud.bigtable:bigtable-client-core:jar:1.13.0 -> io.grpc:grpc-core:jar:1.26.0, org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> com.google.cloud.bigtable:bigtable-client-core:jar:1.13.0 -> io.grpc:grpc-grpclb:jar:1.26.0 -> io.grpc:grpc-core:jar:[1.26.0,1.26.0], org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> com.google.cloud.bigtable:bigtable-client-core:jar:1.13.0 -> io.opencensus:opencensus-contrib-grpc-util:jar:0.24.0 -> io.grpc:grpc-core:jar:1.22.1, org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> com.google.cloud:google-cloud-core-grpc:jar:1.92.2 -> io.grpc:grpc-core:jar:1.26.0, org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> io.grpc:grpc-all:jar:1.25.0 -> io.grpc:grpc-core:jar:[1.25.0,1.25.0], org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> io.grpc:grpc-all:jar:1.25.0 -> io.grpc:grpc-okhttp:jar:1.25.0 -> io.grpc:grpc-core:jar:[1.25.0,1.25.0], org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> io.grpc:grpc-all:jar:1.25.0 -> io.grpc:grpc-testing:jar:1.25.0 -> io.grpc:grpc-core:jar:[1.25.0,1.25.0], org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> io.grpc:grpc-core:jar:1.25.0, org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.20.0-SNAPSHOT -> io.grpc:grpc-netty:jar:1.25.0 -> io.grpc:grpc-core:jar:[1.25.0,1.25.0]] at org.eclipse.aether.util.graph.transformer.NearestVersionSelector.newFailure(NearestVersionSelector.java:159) at 
org.eclipse.aether.util.graph.transformer.NearestVersionSelector.backtrack(NearestVersionSelector.java:120) at org.eclipse.aether.util.graph.transformer.NearestVersionSelector.selectVersion(NearestVersionSelector.java:93) at org.eclipse.aether.util.graph.transformer.ConflictResolver.transformGraph(ConflictResolver.java:180) at org.eclipse.aether.util.graph.transformer.ChainedDependencyGraphTransformer.transformGraph(ChainedDependencyGraphTransformer.java:80) at org.eclipse.aether.internal.impl.collect.DefaultDependencyCollector.collectDependencies(DefaultDependencyCollector.java:273) ... 5 more {noformat} > Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core > -- > > Key: BEAM-8691 > URL:
[jira] [Work logged] (BEAM-9178) Support ZetaSQL TIMESTAMP functions in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-9178?focusedWorklogId=378496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378496 ] ASF GitHub Bot logged work on BEAM-9178: Author: ASF GitHub Bot Created on: 28/Jan/20 21:52 Start Date: 28/Jan/20 21:52 Worklog Time Spent: 10m Work Description: robinyqiu commented on issue #10634: [BEAM-9178] Support all ZetaSQL TIMESTAMP functions URL: https://github.com/apache/beam/pull/10634#issuecomment-579474737 Factored out the controversial parts that you mentioned. Now we don't support extracting or truncating "week with weekdays" part (I think this is fine). PR is ready for review now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378496) Time Spent: 20m (was: 10m) > Support ZetaSQL TIMESTAMP functions in BeamSQL > -- > > Key: BEAM-9178 > URL: https://issues.apache.org/jira/browse/BEAM-9178 > Project: Beam > Issue Type: New Feature > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Support *all* TIMESTAMP functions defined in ZetaSQL (BigQuery Standard SQL). > See the full list of functions below: > [https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions] -- This message was sent by Atlassian Jira (v8.3.4#803005)
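[Editor's note] For readers following the PR, the semantics being implemented can be modeled in a few lines. ZetaSQL's TIMESTAMP_TRUNC cuts a timestamp down to the start of the given date part, with WEEK starting on Sunday. A rough civil-time model in Python that ignores the time-zone argument the real function takes and covers only a few parts:

```python
from datetime import datetime, timedelta


def timestamp_trunc(ts, part):
    """Rough model of ZetaSQL TIMESTAMP_TRUNC for a few date parts."""
    if part == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    if part == "DAY":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if part == "WEEK":
        # ZetaSQL weeks start on Sunday; Python's weekday() is Monday=0,
        # so step back (weekday + 1) mod 7 days from midnight.
        day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        return day - timedelta(days=(ts.weekday() + 1) % 7)
    raise ValueError("unsupported part: %s" % part)
```

The "week with weekday" variants (e.g. WEEK(TUESDAY)) that the PR deliberately leaves out would generalize the offset in the WEEK branch.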
[jira] [Work logged] (BEAM-9188) Improving speed of splitting for Custom Sources
[ https://issues.apache.org/jira/browse/BEAM-9188?focusedWorklogId=378497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378497 ] ASF GitHub Bot logged work on BEAM-9188: Author: ASF GitHub Bot Created on: 28/Jan/20 21:53 Start Date: 28/Jan/20 21:53 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10685: [BEAM-9188] Dataflow's WorkerCustomSources improvement - parallelize creation of Derived Sources (splitting) URL: https://github.com/apache/beam/pull/10685#issuecomment-579475332 Would you mind closing this PR given that we decided to go with https://github.com/apache/beam/pull/10701? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378497) Time Spent: 50m (was: 40m) > Improving speed of splitting for Custom Sources > --- > > Key: BEAM-9188 > URL: https://issues.apache.org/jira/browse/BEAM-9188 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > At this moment a Custom Source is being split and serialized in sequence. If > there are many splits, it takes time to process all splits. > > Example: it takes 2s to calculate size and serialize CassandraSource due to > connection setup and teardown. With 100+ splits, it's a lot of time spent in > 1 worker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8298) Implement state caching for side inputs
[ https://issues.apache.org/jira/browse/BEAM-8298?focusedWorklogId=378512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378512 ] ASF GitHub Bot logged work on BEAM-8298: Author: ASF GitHub Bot Created on: 28/Jan/20 22:08 Start Date: 28/Jan/20 22:08 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10705: [BEAM-8298] Implement side input caching. URL: https://github.com/apache/beam/pull/10705#discussion_r372080828 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -825,43 +833,92 @@ def extend(self, def clear(self, state_key, is_cached=False): # type: (beam_fn_api_pb2.StateKey, bool) -> _Future -if self._should_be_cached(is_cached): +cache_token = self._get_cache_token(state_key, is_cached) +if cache_token: cache_key = self._convert_to_cache_key(state_key) - self._state_cache.clear(cache_key, self._context.cache_token) + self._state_cache.clear(cache_key, cache_token) return self._underlying.clear(state_key) def done(self): # type: () -> None self._underlying.done() - def _materialize_iter(self, -state_key, # type: beam_fn_api_pb2.StateKey -coder # type: coder_impl.CoderImpl - ): + def _lazy_iterator( + self, + state_key, # type: beam_fn_api_pb2.StateKey + coder, # type: coder_impl.CoderImpl + continuation_token=None # type: Optional[bytes] +): # type: (...) -> Iterator[Any] """Materializes the state lazily, one element at a time. :return A generator which returns the next element if advanced. 
""" -continuation_token = None while True: - data, continuation_token = \ - self._underlying.get_raw(state_key, continuation_token) + data, continuation_token = ( + self._underlying.get_raw(state_key, continuation_token)) input_stream = coder_impl.create_InputStream(data) while input_stream.size() > 0: yield coder.decode_from_stream(input_stream, True) if not continuation_token: break - def _should_be_cached(self, request_is_cached): -return (self._state_cache.is_cache_enabled() and -request_is_cached and -self._context.cache_token) + def _get_cache_token(self, state_key, request_is_cached): +if not self._state_cache.is_cache_enabled(): + return None +elif state_key.HasField('bag_user_state'): + if request_is_cached and self._context.user_state_cache_token: +return self._context.user_state_cache_token + else: +return self._context.bundle_cache_token +elif state_key.WhichOneof('type').endswith('_side_input'): + side_input = getattr(state_key, state_key.WhichOneof('type')) + return self._context.side_input_cache_tokens.get( +(side_input.transform_id, side_input.side_input_id), +self._context.bundle_cache_token) + + def _partially_cached_iterable( + self, + state_key, # type: beam_fn_api_pb2.StateKey + coder # type: coder_impl.CoderImpl +): +# type: (...) -> Iterable[Any] +"""Materialized the first page of data, concatinated with a lazy iterable +of the rest, if any. 
+""" +data, continuation_token = ( +self._underlying.get_raw(state_key, None)) +head = [] +input_stream = coder_impl.create_InputStream(data) +while input_stream.size() > 0: + head.append(coder.decode_from_stream(input_stream, True)) + +if continuation_token is None: + return head +else: + def iter_func(): +for item in head: + yield item +for item in self._lazy_iterator(state_key, coder, continuation_token): + yield item + return _IterableFromIterator(iter_func) @staticmethod def _convert_to_cache_key(state_key): return state_key.SerializeToString() +class _IterableFromIterator(object): + """Wraps an iterator as an iterable.""" + def __init__(self, iter_func): +self._iter_func = iter_func + def __iter__(self): +return iter_func() Review comment: ```suggestion return iter_func ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378512) Time Spent: 1h 50m (was: 1h 40m) > Implement state caching for side inputs > --- > > Key: BEAM-8298 > URL: https://issues.apache.org/jira/browse/BEAM-8298 > Project: Beam >
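The review above flags a bug in the `_IterableFromIterator` wrapper: `__iter__` returns `iter_func()` instead of going through `self`. A minimal standalone sketch (not the Beam class itself; names are illustrative) shows why the factory must be stored and re-invoked via `self`, and why wrapping a generator function this way makes the result re-iterable:

```python
class IterableFromIterator:
    """Wraps a zero-argument generator function as a re-iterable object.

    Each call to __iter__ invokes the factory again, so the result can be
    iterated more than once -- unlike a bare generator, which is exhausted
    after one pass.
    """

    def __init__(self, iter_func):
        self._iter_func = iter_func

    def __iter__(self):
        # Must call the stored factory via self; a bare `iter_func()` here
        # would raise NameError, which is the bug flagged in the review.
        return self._iter_func()


head = [1, 2]  # stands in for the eagerly fetched first page

def iter_func():
    yield from head
    yield from (10, 20)  # stands in for lazily fetched continuation pages

it = IterableFromIterator(iter_func)
print(list(it))  # [1, 2, 10, 20]
print(list(it))  # re-iterable: [1, 2, 10, 20] again
```

Returning `iter_func` (the reviewer's one-line suggestion) makes `iter(...)` call the factory, which works for a generator function but only because calling it yields a fresh iterator each time.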
[jira] [Work logged] (BEAM-8298) Implement state caching for side inputs
[ https://issues.apache.org/jira/browse/BEAM-8298?focusedWorklogId=378511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378511 ] ASF GitHub Bot logged work on BEAM-8298: Author: ASF GitHub Bot Created on: 28/Jan/20 22:08 Start Date: 28/Jan/20 22:08 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10705: [BEAM-8298] Implement side input caching. URL: https://github.com/apache/beam/pull/10705#discussion_r372074511 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -825,43 +833,92 @@ def extend(self, def clear(self, state_key, is_cached=False): # type: (beam_fn_api_pb2.StateKey, bool) -> _Future -if self._should_be_cached(is_cached): +cache_token = self._get_cache_token(state_key, is_cached) +if cache_token: cache_key = self._convert_to_cache_key(state_key) - self._state_cache.clear(cache_key, self._context.cache_token) + self._state_cache.clear(cache_key, cache_token) return self._underlying.clear(state_key) def done(self): # type: () -> None self._underlying.done() - def _materialize_iter(self, -state_key, # type: beam_fn_api_pb2.StateKey -coder # type: coder_impl.CoderImpl - ): + def _lazy_iterator( + self, + state_key, # type: beam_fn_api_pb2.StateKey + coder, # type: coder_impl.CoderImpl + continuation_token=None # type: Optional[bytes] +): # type: (...) -> Iterator[Any] """Materializes the state lazily, one element at a time. :return A generator which returns the next element if advanced. 
""" -continuation_token = None while True: - data, continuation_token = \ - self._underlying.get_raw(state_key, continuation_token) + data, continuation_token = ( + self._underlying.get_raw(state_key, continuation_token)) input_stream = coder_impl.create_InputStream(data) while input_stream.size() > 0: yield coder.decode_from_stream(input_stream, True) if not continuation_token: break - def _should_be_cached(self, request_is_cached): -return (self._state_cache.is_cache_enabled() and -request_is_cached and -self._context.cache_token) + def _get_cache_token(self, state_key, request_is_cached): +if not self._state_cache.is_cache_enabled(): + return None +elif state_key.HasField('bag_user_state'): + if request_is_cached and self._context.user_state_cache_token: +return self._context.user_state_cache_token + else: +return self._context.bundle_cache_token +elif state_key.WhichOneof('type').endswith('_side_input'): + side_input = getattr(state_key, state_key.WhichOneof('type')) + return self._context.side_input_cache_tokens.get( +(side_input.transform_id, side_input.side_input_id), +self._context.bundle_cache_token) + + def _partially_cached_iterable( + self, + state_key, # type: beam_fn_api_pb2.StateKey + coder # type: coder_impl.CoderImpl +): +# type: (...) -> Iterable[Any] +"""Materialized the first page of data, concatinated with a lazy iterable Review comment: ```suggestion """Materialized the first page of data, concatenated with a lazy iterable ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378511) Time Spent: 1h 50m (was: 1h 40m) > Implement state caching for side inputs > --- > > Key: BEAM-8298 > URL: https://issues.apache.org/jira/browse/BEAM-8298 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-py-harness >Reporter: Maximilian Michels >Assignee: Jing Chen >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Caching is currently only implemented for user state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9188) Improving speed of splitting for Custom Sources
[ https://issues.apache.org/jira/browse/BEAM-9188?focusedWorklogId=378518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378518 ] ASF GitHub Bot logged work on BEAM-9188: Author: ASF GitHub Bot Created on: 28/Jan/20 22:13 Start Date: 28/Jan/20 22:13 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10701: [BEAM-9188] CassandraIO split performance improvement - cache size of the table URL: https://github.com/apache/beam/pull/10701#issuecomment-579484771 Retest it please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378518) Time Spent: 1h 10m (was: 1h) > Improving speed of splitting for Custom Sources > --- > > Key: BEAM-9188 > URL: https://issues.apache.org/jira/browse/BEAM-9188 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > At this moment a Custom Source is being split and serialized in sequence. If > there are many splits, it takes time to process all splits. > > Example: it takes 2s to calculate size and serialize CassandraSource due to > connection setup and teardown. With 100+ splits, it's a lot of time spent in > 1 worker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
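The issue notes that each split costs around 2s because of connection setup and teardown, multiplied across 100+ splits on one worker. One hedged sketch of the obvious remedy, estimating split sizes concurrently instead of in sequence (all names here are illustrative, not Beam or Cassandra APIs):

```python
import concurrent.futures
import time

def estimate_size(split_id):
    # Stand-in for the expensive per-split work: connection setup,
    # size query, teardown.
    time.sleep(0.01)
    return split_id * 100

split_ids = range(8)

# Sequential baseline: total wall time is the sum of per-split costs.
sequential = [estimate_size(s) for s in split_ids]

# Concurrent variant: splits are estimated in parallel threads, so wall
# time approaches the cost of a single split. pool.map preserves order.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    concurrent_sizes = list(pool.map(estimate_size, split_ids))

assert concurrent_sizes == sequential
```

The PR under discussion takes a complementary route: caching the table size so repeated estimation calls skip the connection entirely.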
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378547=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378547 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 22:57 Start Date: 28/Jan/20 22:57 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#discussion_r372105866 ## File path: sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java ## @@ -0,0 +1,448 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.arrow; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.util.Iterator; +import java.util.List; +import java.util.function.Function; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.VectorSchemaRoot; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.util.Text; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.CachingFactory; +import org.apache.beam.sdk.schemas.Factory; +import org.apache.beam.sdk.schemas.FieldValueGetter; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.logicaltypes.FixedBytes; +import org.apache.beam.sdk.values.Row; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; + +/** + * Utilities to create {@link Iterable}s of Beam {@link Row} instances backed by Arrow record + * batches. + */ +@Experimental(Experimental.Kind.SCHEMAS) +public class ArrowConversion { + /** Converts Arrow schema to Beam row schema. */ + public static Schema toBeamSchema(org.apache.arrow.vector.types.pojo.Schema schema) { +return toBeamSchema(schema.getFields()); + } + + public static Schema toBeamSchema(List fields) { +Schema.Builder builder = Schema.builder(); +for (org.apache.arrow.vector.types.pojo.Field field : fields) { + Field beamField = toBeamField(field); + builder.addField(beamField); +} +return builder.build(); + } + + /** Get Beam Field from Arrow Field. 
*/ + private static Field toBeamField(org.apache.arrow.vector.types.pojo.Field field) { +FieldType beamFieldType = toFieldType(field.getFieldType(), field.getChildren()); +return Field.of(field.getName(), beamFieldType); + } + + /** Converts Arrow FieldType to Beam FieldType. */ + private static FieldType toFieldType( + org.apache.arrow.vector.types.pojo.FieldType arrowFieldType, + List childrenFields) { +FieldType fieldType = +arrowFieldType +.getType() +.accept( +new ArrowType.ArrowTypeVisitor() { + @Override + public FieldType visit(ArrowType.Null type) { +throw new IllegalArgumentException( +"Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.Struct type) { +return FieldType.row(toBeamSchema(childrenFields)); + } + + @Override + public FieldType visit(ArrowType.List type) { +checkArgument( +childrenFields.size() == 1, +"Encountered " ++ childrenFields.size() ++ " child fields for list type, expected 1"); +return FieldType.array(toBeamField(childrenFields.get(0)).getType()); + } + + @Override + public FieldType visit(ArrowType.FixedSizeList
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378548=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378548 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 22:58 Start Date: 28/Jan/20 22:58 Worklog Time Spent: 10m Work Description: emkornfield commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#discussion_r372106201 ## File path: sdks/java/extensions/arrow/src/test/java/org/apache/beam/sdk/extensions/arrow/ArrowConversionTest.java ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.arrow; + +import static java.util.Arrays.asList; +import static org.hamcrest.MatcherAssert.assertThat; +import static org.hamcrest.Matchers.equalTo; + +import java.util.ArrayList; +import org.apache.arrow.memory.BufferAllocator; +import org.apache.arrow.memory.RootAllocator; +import org.apache.arrow.vector.BitVector; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.FixedSizeBinaryVector; +import org.apache.arrow.vector.Float8Vector; +import org.apache.arrow.vector.IntVector; +import org.apache.arrow.vector.TimeStampMicroTZVector; +import org.apache.arrow.vector.TimeStampMilliTZVector; +import org.apache.arrow.vector.VarCharVector; +import org.apache.arrow.vector.VectorSchemaRoot; +import org.apache.arrow.vector.complex.ListVector; +import org.apache.arrow.vector.types.FloatingPointPrecision; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.util.Text; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.hamcrest.collection.IsIterableContainingInOrder; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +@RunWith(JUnit4.class) +public class ArrowConversionTest { + + private BufferAllocator allocator; + + @Before + public void init() { +allocator = new RootAllocator(Long.MAX_VALUE); + } + + @After + public void teardown() { +allocator.close(); + } + + @Test + public void toBeamSchema_convertsSimpleArrowSchema() { +Schema expected = +Schema.of(Field.of("int8", FieldType.BYTE), Field.of("int16", FieldType.INT16)); + 
+org.apache.arrow.vector.types.pojo.Schema arrowSchema = +new org.apache.arrow.vector.types.pojo.Schema( +ImmutableList.of( +field("int8", new ArrowType.Int(8, true)), +field("int16", new ArrowType.Int(16, true)))); + +assertThat(ArrowConversion.toBeamSchema(arrowSchema), equalTo(expected)); + } + + @Test + public void rowIterator() { +org.apache.arrow.vector.types.pojo.Schema schema = +new org.apache.arrow.vector.types.pojo.Schema( +asList( +field("int32", new ArrowType.Int(32, true)), +field("float64", new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)), +field("string", new ArrowType.Utf8()), +field("timestampMicroUTC", new ArrowType.Timestamp(TimeUnit.MICROSECOND, "UTC")), +field("timestampMilliUTC", new ArrowType.Timestamp(TimeUnit.MILLISECOND, "UTC")), +field( +"int32_list", +new ArrowType.List(), +field("int32s", new ArrowType.Int(32, true))), +field("boolean", new ArrowType.Bool()), +field("fixed_size_binary", new ArrowType.FixedSizeBinary(3)))); + +Schema beamSchema = ArrowConversion.toBeamSchema(schema); + +VectorSchemaRoot expectedSchemaRoot =
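The `ArrowConversion` code above walks an Arrow schema with a type visitor, mapping each field to a Beam `FieldType` and recursing into struct and list children. A toy Python sketch of that recursive mapping, with Arrow fields modeled as `(name, type, children)` tuples (this mirrors the visitor's shape only; it is not the Beam or Arrow API):

```python
def to_beam_type(arrow_type, children):
    """Map a simplified Arrow type to a Beam-style field type.

    Mirrors the ArrowTypeVisitor in the diff: primitives map directly,
    Struct becomes a row of converted child fields, List becomes an array
    of its single child's type, anything else is rejected.
    """
    primitives = {'int8': 'BYTE', 'int16': 'INT16', 'int32': 'INT32',
                  'utf8': 'STRING', 'bool': 'BOOLEAN'}
    if arrow_type in primitives:
        return primitives[arrow_type]
    if arrow_type == 'struct':
        return {'ROW': [(n, to_beam_type(t, c)) for n, t, c in children]}
    if arrow_type == 'list':
        # Same invariant as the checkArgument in the Java code.
        assert len(children) == 1, 'list type expects exactly 1 child field'
        (_, t, c), = children
        return {'ARRAY': to_beam_type(t, c)}
    raise ValueError(f"Type '{arrow_type}' not supported.")


print(to_beam_type('list', [('int32s', 'int32', [])]))  # {'ARRAY': 'INT32'}
```

The real converter handles many more cases (timestamps, fixed-size binary, nested lists of structs), but each one lands in exactly one visitor branch like the ones sketched here.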
[jira] [Resolved] (BEAM-8627) Migrate bigtable-client-core to 1.12.1
[ https://issues.apache.org/jira/browse/BEAM-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomo Suzuki resolved BEAM-8627. --- Fix Version/s: Not applicable Resolution: Duplicate > Migrate bigtable-client-core to 1.12.1 > -- > > Key: BEAM-8627 > URL: https://issues.apache.org/jira/browse/BEAM-8627 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-dataflow >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Fix For: Not applicable > > Attachments: rsP0SotuxeR.png > > > Beam currently uses bigtable-client-core 1.8.0. The latest is > 1.12.1 > [https://search.maven.org/artifact/com.google.cloud.bigtable/bigtable-client-core/1.12.1/jar] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9200) Portable job jar postcommits failing
[ https://issues.apache.org/jira/browse/BEAM-9200?focusedWorklogId=378520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378520 ] ASF GitHub Bot logged work on BEAM-9200: Author: ASF GitHub Bot Created on: 28/Jan/20 22:14 Start Date: 28/Jan/20 22:14 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10703: [BEAM-9200] fix portable jar test version property URL: https://github.com/apache/beam/pull/10703#issuecomment-579485140 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378520) Time Spent: 20m (was: 10m) > Portable job jar postcommits failing > > > Key: BEAM-9200 > URL: https://issues.apache.org/jira/browse/BEAM-9200 > Project: Beam > Issue Type: Improvement > Components: runner-flink, runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink, portability-spark > Time Spent: 20m > Remaining Estimate: 0h > > 15:25:58 Execution failed for task > ':runners:spark:job-server:testJavaJarCreatorPy37'. > 15:25:58 > Could not get unknown property 'python_sdk_version' for project > ':runners:spark:job-server' of type org.gradle.api.Project. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9200) Portable job jar postcommits failing
[ https://issues.apache.org/jira/browse/BEAM-9200?focusedWorklogId=378521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378521 ] ASF GitHub Bot logged work on BEAM-9200: Author: ASF GitHub Bot Created on: 28/Jan/20 22:14 Start Date: 28/Jan/20 22:14 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10703: [BEAM-9200] fix portable jar test version property URL: https://github.com/apache/beam/pull/10703#issuecomment-579485247 Run PortableJar_Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378521) Time Spent: 0.5h (was: 20m) > Portable job jar postcommits failing > > > Key: BEAM-9200 > URL: https://issues.apache.org/jira/browse/BEAM-9200 > Project: Beam > Issue Type: Improvement > Components: runner-flink, runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink, portability-spark > Time Spent: 0.5h > Remaining Estimate: 0h > > 15:25:58 Execution failed for task > ':runners:spark:job-server:testJavaJarCreatorPy37'. > 15:25:58 > Could not get unknown property 'python_sdk_version' for project > ':runners:spark:job-server' of type org.gradle.api.Project. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9200) Portable job jar postcommits failing
[ https://issues.apache.org/jira/browse/BEAM-9200?focusedWorklogId=378522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378522 ] ASF GitHub Bot logged work on BEAM-9200: Author: ASF GitHub Bot Created on: 28/Jan/20 22:16 Start Date: 28/Jan/20 22:16 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #10703: [BEAM-9200] fix portable jar test version property URL: https://github.com/apache/beam/pull/10703#issuecomment-579486529 LGTM, thanks for fixing it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378522) Time Spent: 40m (was: 0.5h) > Portable job jar postcommits failing > > > Key: BEAM-9200 > URL: https://issues.apache.org/jira/browse/BEAM-9200 > Project: Beam > Issue Type: Improvement > Components: runner-flink, runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink, portability-spark > Time Spent: 40m > Remaining Estimate: 0h > > 15:25:58 Execution failed for task > ':runners:spark:job-server:testJavaJarCreatorPy37'. > 15:25:58 > Could not get unknown property 'python_sdk_version' for project > ':runners:spark:job-server' of type org.gradle.api.Project. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9204) HBase SDF @SplitRestriction does not take the range input into account to restrict splits
[ https://issues.apache.org/jira/browse/BEAM-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik closed BEAM-9204. --- Fix Version/s: 2.20.0 Resolution: Fixed > HBase SDF @SplitRestriction does not take the range input into account to > restrict splits > - > > Key: BEAM-9204 > URL: https://issues.apache.org/jira/browse/BEAM-9204 > Project: Beam > Issue Type: Bug > Components: io-java-hbase >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This is an issue because the split is called multiple times and in this > case it will produce repeated work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9188) Improving speed of splitting for Custom Sources
[ https://issues.apache.org/jira/browse/BEAM-9188?focusedWorklogId=378529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378529 ] ASF GitHub Bot logged work on BEAM-9188: Author: ASF GitHub Bot Created on: 28/Jan/20 22:26 Start Date: 28/Jan/20 22:26 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10701: [BEAM-9188] CassandraIO split performance improvement - cache size of the table URL: https://github.com/apache/beam/pull/10701#discussion_r372093592 ## File path: sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java ## @@ -375,9 +375,15 @@ private CassandraIO() {} static class CassandraSource extends BoundedSource { final Read spec; final List splitQueries; +final Long estimatedSize; Review comment: Could you please add some comments about this field? Like, why and when the size is cached. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378529) Time Spent: 1h 20m (was: 1h 10m) > Improving speed of splitting for Custom Sources > --- > > Key: BEAM-9188 > URL: https://issues.apache.org/jira/browse/BEAM-9188 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > At this moment a Custom Source is being split and serialized in sequence. If > there are many splits, it takes time to process all splits. > > Example: it takes 2s to calculate size and serialize CassandraSource due to > connection setup and teardown. With 100+ splits, it's a lot of time spent in > 1 worker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9188) Improving speed of splitting for Custom Sources
[ https://issues.apache.org/jira/browse/BEAM-9188?focusedWorklogId=378530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378530 ] ASF GitHub Bot logged work on BEAM-9188: Author: ASF GitHub Bot Created on: 28/Jan/20 22:26 Start Date: 28/Jan/20 22:26 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10701: [BEAM-9188] CassandraIO split performance improvement - cache size of the table URL: https://github.com/apache/beam/pull/10701#discussion_r372092162 ## File path: sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java ## @@ -508,26 +518,30 @@ private static long getNumSplits( @Override public long getEstimatedSizeBytes(PipelineOptions pipelineOptions) { - try (Cluster cluster = - getCluster( - spec.hosts(), - spec.port(), - spec.username(), - spec.password(), - spec.localDc(), - spec.consistencyLevel())) { -if (isMurmur3Partitioner(cluster)) { - try { -List tokenRanges = -getTokenRanges(cluster, spec.keyspace().get(), spec.table().get()); -return getEstimatedSizeBytesFromTokenRanges(tokenRanges); - } catch (Exception e) { -LOG.warn("Can't estimate the size", e); + if (estimatedSize != null) { +return estimatedSize; + } else { +try (Cluster cluster = +getCluster( +spec.hosts(), +spec.port(), +spec.username(), +spec.password(), +spec.localDc(), +spec.consistencyLevel())) { + if (isMurmur3Partitioner(cluster)) { +try { + List tokenRanges = + getTokenRanges(cluster, spec.keyspace().get(), spec.table().get()); + return getEstimatedSizeBytesFromTokenRanges(tokenRanges); Review comment: Can we also cache the size here in case `getEstimateSizeBytes` is called more than once? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378530) Time Spent: 1.5h (was: 1h 20m) > Improving speed of splitting for Custom Sources > --- > > Key: BEAM-9188 > URL: https://issues.apache.org/jira/browse/BEAM-9188 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > At this moment a Custom Source is being split and serialized in sequence. If > there are many splits, it takes time to process all splits. > > Example: it takes 2s to calculate size and serialize CassandraSource due to > connection setup and teardown. With 100+ splits, it's a lot of time spent in > 1 worker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
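The review above asks that `getEstimatedSizeBytes` also cache its result when it has to compute the size itself. A language-neutral sketch of that memoization pattern (Python here for brevity; the names are illustrative, not the CassandraIO API):

```python
class CachedSizeSource:
    """Bounded-source sketch that memoizes its estimated size.

    `compute_size` stands in for the expensive token-range query that
    needs a live cluster connection. After the first call the cached
    value is reused, so repeated size estimations -- and per-split
    sources constructed with the size already known -- skip the
    connection setup/teardown entirely.
    """

    def __init__(self, compute_size, estimated_size=None):
        self._compute_size = compute_size
        self._estimated_size = estimated_size  # may be pre-seeded at split time

    def get_estimated_size_bytes(self):
        if self._estimated_size is None:
            self._estimated_size = self._compute_size()
        return self._estimated_size


calls = []

def expensive_query():
    calls.append(1)  # track how often the "cluster" is actually contacted
    return 4096

source = CachedSizeSource(expensive_query)
print(source.get_estimated_size_bytes())  # 4096
print(source.get_estimated_size_bytes())  # 4096, served from the cache
print(len(calls))  # 1
```

Pre-seeding `estimated_size` when constructing split sources is what removes the per-split connection cost the issue complains about.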
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378537 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 22:33 Start Date: 28/Jan/20 22:33 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#issuecomment-579495519 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378537) Time Spent: 8h 50m (was: 8h 40m) > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use: Arrow or > Avro (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378536 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 22:32 Start Date: 28/Jan/20 22:32 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#issuecomment-579495439 @alexvanboxel do you have time to take a look at this? cc: @emkornfield in case you want to take a look This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378536) Time Spent: 8h 40m (was: 8.5h) > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 8h 40m > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use: Arrow or > Avro (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9037) Instant and duration as logical type
[ https://issues.apache.org/jira/browse/BEAM-9037?focusedWorklogId=378538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378538 ] ASF GitHub Bot logged work on BEAM-9037: Author: ASF GitHub Bot Created on: 28/Jan/20 22:37 Start Date: 28/Jan/20 22:37 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10486: [BEAM-9037] Instant and duration as logical type URL: https://github.com/apache/beam/pull/10486#discussion_r366529282 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/NanosDuration.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.logicaltypes; + +import java.time.Duration; +import org.apache.beam.sdk.values.Row; + +/** A duration represented in nanoseconds. */ +public class NanosDuration extends NanosType { + public static final String IDENTIFIER = "beam:logical_type:nanos_duration"; Review comment: I think we should version these, can you add a v1 to this and nanos_instant? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378538) Time Spent: 1h 40m (was: 1.5h) > Instant and duration as logical type > - > > Key: BEAM-9037 > URL: https://issues.apache.org/jira/browse/BEAM-9037 > Project: Beam > Issue Type: Task > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Major > Fix For: 2.19.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > The proto schema includes Timestamp and Duration with nano precision. The > logical types should be promoted to the core logical types, so they can be > handled in various IOs as standard mandatory conversions. > This means that the logical type should not use the proto-specific Timestamp and > Duration but the Java 8 Instant and Duration. > See discussion in the design document: > [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr] -- This message was sent by Atlassian Jira (v8.3.4#803005)
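The review comment on PR #10486 asks for versioned logical type identifiers ("can you add a v1 to this and nanos_instant"). A sketch of what versioned constants could look like, assuming a ":v1" suffix; the exact scheme is the reviewer's proposal, not confirmed merged Beam code:

```java
// Hedged sketch: versioned identifiers for the nanos logical types.
// The ":v1" suffix format is an assumption based on the review comment,
// not the actual constants merged into Beam.
public class LogicalTypeIds {
    public static final String NANOS_DURATION = "beam:logical_type:nanos_duration:v1";
    public static final String NANOS_INSTANT = "beam:logical_type:nanos_instant:v1";
}
```

Versioning the identifier lets a future incompatible encoding ship as ":v2" without breaking readers that only understand ":v1".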
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378540 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 22:51 Start Date: 28/Jan/20 22:51 Worklog Time Spent: 10m Work Description: emkornfield commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#discussion_r372103595 ## File path: sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java ## @@ -0,0 +1,448 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.arrow; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.util.Iterator; +import java.util.List; +import java.util.function.Function; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.arrow.vector.FieldVector; +import org.apache.arrow.vector.VectorSchemaRoot; +import org.apache.arrow.vector.types.TimeUnit; +import org.apache.arrow.vector.types.pojo.ArrowType; +import org.apache.arrow.vector.util.Text; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.schemas.CachingFactory; +import org.apache.beam.sdk.schemas.Factory; +import org.apache.beam.sdk.schemas.FieldValueGetter; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.logicaltypes.FixedBytes; +import org.apache.beam.sdk.values.Row; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; + +/** + * Utilities to create {@link Iterable}s of Beam {@link Row} instances backed by Arrow record + * batches. + */ +@Experimental(Experimental.Kind.SCHEMAS) +public class ArrowConversion { + /** Converts Arrow schema to Beam row schema. */ + public static Schema toBeamSchema(org.apache.arrow.vector.types.pojo.Schema schema) { +return toBeamSchema(schema.getFields()); + } + + public static Schema toBeamSchema(List fields) { +Schema.Builder builder = Schema.builder(); +for (org.apache.arrow.vector.types.pojo.Field field : fields) { + Field beamField = toBeamField(field); + builder.addField(beamField); +} +return builder.build(); + } + + /** Get Beam Field from Arrow Field. 
*/ + private static Field toBeamField(org.apache.arrow.vector.types.pojo.Field field) { +FieldType beamFieldType = toFieldType(field.getFieldType(), field.getChildren()); +return Field.of(field.getName(), beamFieldType); + } + + /** Converts Arrow FieldType to Beam FieldType. */ + private static FieldType toFieldType( + org.apache.arrow.vector.types.pojo.FieldType arrowFieldType, + List childrenFields) { +FieldType fieldType = +arrowFieldType +.getType() +.accept( +new ArrowType.ArrowTypeVisitor() { + @Override + public FieldType visit(ArrowType.Null type) { +throw new IllegalArgumentException( +"Type \'" + type.toString() + "\' not supported."); + } + + @Override + public FieldType visit(ArrowType.Struct type) { +return FieldType.row(toBeamSchema(childrenFields)); + } + + @Override + public FieldType visit(ArrowType.List type) { +checkArgument( +childrenFields.size() == 1, +"Encountered " ++ childrenFields.size() ++ " child fields for list type, expected 1"); +return FieldType.array(toBeamField(childrenFields.get(0)).getType()); + } + + @Override + public FieldType visit(ArrowType.FixedSizeList
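The quoted visitor maps each Arrow type to a Beam FieldType: Null is rejected, Struct recurses into its child fields, and List must carry exactly one child field, which becomes the array's element type. A self-contained sketch of those dispatch rules, using plain strings in place of Beam's FieldType and Arrow's type classes (a hypothetical simplification, not either library's API):

```java
import java.util.List;

// Toy re-statement of the type-mapping rules in the visitor above,
// with strings standing in for real Arrow/Beam type objects.
public class ArrowTypeMappingSketch {
    static String toBeamType(String arrowType, List<String> children) {
        switch (arrowType) {
            case "Null":
                // Mirrors the visitor: Null has no Beam counterpart.
                throw new IllegalArgumentException("Type 'Null' not supported.");
            case "Struct":
                // Struct recurses into its children to build a row schema.
                return "ROW<" + String.join(", ", children) + ">";
            case "List":
                // A list type must have exactly one child: the element type.
                if (children.size() != 1) {
                    throw new IllegalArgumentException(
                        "Encountered " + children.size()
                            + " child fields for list type, expected 1");
                }
                return "ARRAY<" + children.get(0) + ">";
            default:
                // Primitive types map one-to-one in this sketch.
                return arrowType;
        }
    }
}
```

For example, `toBeamType("List", List.of("INT64"))` yields `"ARRAY<INT64>"` in this toy encoding.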
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378541 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 22:52 Start Date: 28/Jan/20 22:52 Worklog Time Spent: 10m Work Description: emkornfield commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#discussion_r372104055 ## File path: sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378551 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 23:01 Start Date: 28/Jan/20 23:01 Worklog Time Spent: 10m Work Description: emkornfield commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#discussion_r372107178 ## File path: sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378555 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 23:08 Start Date: 28/Jan/20 23:08 Worklog Time Spent: 10m Work Description: emkornfield commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#discussion_r372109648 ## File path: sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378556=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378556 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 23:09 Start Date: 28/Jan/20 23:09 Worklog Time Spent: 10m Work Description: emkornfield commented on issue #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#issuecomment-579515450 @TheNeuralBit @iemejia Regarding Arrow dependencies there was a discussion on potentially moving to a [different allocator recently](https://mail-archives.apache.org/mod_mbox/arrow-dev/202001.mbox/%3CCAK7Z5T_KEnfFmVvg_EUzTH5OFuAu3OFK5NMhVeQ0UTHE%2BQ5smw%40mail.gmail.com%3E) on the Arrow Dev mailing list. If you have thoughts please chime in. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378556) Time Spent: 10.5h (was: 10h 20m) > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 10.5h > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use: Arrow or > Avro (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378494 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 21:49 Start Date: 28/Jan/20 21:49 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706#issuecomment-579473549 cc: @mszb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378494) Time Spent: 13h (was: 12h 50m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 13h > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9188) Improving speed of splitting for Custom Sources
[ https://issues.apache.org/jira/browse/BEAM-9188?focusedWorklogId=378504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378504 ] ASF GitHub Bot logged work on BEAM-9188: Author: ASF GitHub Bot Created on: 28/Jan/20 22:00 Start Date: 28/Jan/20 22:00 Worklog Time Spent: 10m Work Description: stankiewicz commented on pull request #10685: [BEAM-9188] Dataflow's WorkerCustomSources improvement - parallelize creation of Derived Sources (splitting) URL: https://github.com/apache/beam/pull/10685 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378504) Time Spent: 1h (was: 50m) > Improving speed of splitting for Custom Sources > --- > > Key: BEAM-9188 > URL: https://issues.apache.org/jira/browse/BEAM-9188 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > At this moment a Custom Source is being split and serialized in sequence. If > there are many splits, it takes time to process all of them. > > Example: it takes 2s to calculate size and serialize CassandraSource due to > connection setup and teardown. With 100+ splits, it's a lot of time spent in > 1 worker. -- This message was sent by Atlassian Jira (v8.3.4#803005)
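The sequential bottleneck described in BEAM-9188, where each split is sized and serialized one after another, can be overlapped with a thread pool. A minimal sketch of that idea, assuming the per-split work is independent and thread-safe; this is not the actual WorkerCustomSources change from PR #10685:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative sketch: run per-split work (e.g. sizing and serializing each
// derived source) on a fixed thread pool instead of a sequential loop, so
// slow per-split setup/teardown overlaps across splits.
public class ParallelSplitSketch {
    static <S, R> List<R> processSplits(List<S> splits, Function<S, R> work, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (S split : splits) {
                futures.add(pool.submit(() -> work.apply(split)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // blocks; preserves input order
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException("split processing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With the 2-second-per-split Cassandra example above, 100 splits drop from roughly 200s of sequential work to roughly 200s divided by the pool size, minus coordination overhead.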
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378523 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 22:18 Start Date: 28/Jan/20 22:18 Worklog Time Spent: 10m Work Description: mszb commented on issue #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706#issuecomment-579487332 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378523) Time Spent: 13h 10m (was: 13h) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378524 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 22:19 Start Date: 28/Jan/20 22:19 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706#issuecomment-579487896 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378524) Time Spent: 13h 20m (was: 13h 10m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378525 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 22:19 Start Date: 28/Jan/20 22:19 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706#issuecomment-579487946 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378525) Time Spent: 13.5h (was: 13h 20m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9204) HBase SDF @SplitRestriction does not take the range input into account to restrict splits
[ https://issues.apache.org/jira/browse/BEAM-9204?focusedWorklogId=378527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378527 ] ASF GitHub Bot logged work on BEAM-9204: Author: ASF GitHub Bot Created on: 28/Jan/20 22:23 Start Date: 28/Jan/20 22:23 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10700: [BEAM-9204] Fix HBase SplitRestriction to be based on provided Range URL: https://github.com/apache/beam/pull/10700 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378527) Time Spent: 20m (was: 10m) > HBase SDF @SplitRestriction does not take the range input into account to > restrict splits > - > > Key: BEAM-9204 > URL: https://issues.apache.org/jira/browse/BEAM-9204 > Project: Beam > Issue Type: Bug > Components: io-java-hbase >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > This is an issue because split is called multiple times and in this > case it will produce repeated work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
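The BEAM-9204 fix makes @SplitRestriction honor the range it is given rather than always splitting over the whole table. A minimal sketch of that intersection step, using long endpoints in place of HBase's ByteKeyRange (an assumption for brevity; the real code compares row keys, not longs):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: keep only the portion of each candidate region that
// overlaps the restriction range the runner passed in, so repeated calls to
// split never produce work outside the caller's range.
public class RangeRestrictedSplitSketch {
    /** Intersects each candidate [start, end) region with [rangeStart, rangeEnd). */
    static List<long[]> restrictToRange(List<long[]> regions, long rangeStart, long rangeEnd) {
        List<long[]> out = new ArrayList<>();
        for (long[] region : regions) {
            long start = Math.max(region[0], rangeStart);
            long end = Math.min(region[1], rangeEnd);
            if (start < end) { // drop regions entirely outside the range
                out.add(new long[] {start, end});
            }
        }
        return out;
    }
}
```

For example, restricting regions [0,10), [10,20), [20,30) to the range [5,25) yields [5,10), [10,20), [20,25): every emitted split stays inside the provided range, so re-splitting cannot duplicate work outside it.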
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378543 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 22:54 Start Date: 28/Jan/20 22:54 Worklog Time Spent: 10m Work Description: emkornfield commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#discussion_r372104685 ## File path: sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378546 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 22:55 Start Date: 28/Jan/20 22:55 Worklog Time Spent: 10m Work Description: emkornfield commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#discussion_r372105055
## File path: sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=378554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378554 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 28/Jan/20 23:05 Start Date: 28/Jan/20 23:05 Worklog Time Spent: 10m Work Description: emkornfield commented on pull request #10384: [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#discussion_r372108633
## File path: sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java
[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=378280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378280 ] ASF GitHub Bot logged work on BEAM-7427: Author: ASF GitHub Bot Created on: 28/Jan/20 14:48 Start Date: 28/Jan/20 14:48 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10644: [BEAM-7427] Refactore JmsCheckpointMark to be usage via Coder URL: https://github.com/apache/beam/pull/10644#issuecomment-579280837 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378280) Time Spent: 8h 50m (was: 8h 40m) > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, > 2.19.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.20.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.<init>(Schema.java:403) > at org.apache.avro.Schema$Field.<init>(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. > JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the > events broker", JmsIO.readMessage() > .withConnectionFactory(jmsConnectionFactory) .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
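The `Illegal character in: this$0` failure above is worth unpacking: `this$0` is the synthetic field the Java compiler adds to every non-static inner class to reference its enclosing instance, and Avro's reflective schema generation rejects the `$` in that name. A minimal, Avro-free sketch of where the field comes from (class names here are illustrative, not taken from JmsIO):

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class SyntheticFieldDemo {
    // Static nested class: carries no hidden reference to the outer instance.
    static class StaticNested {}

    // Non-static inner class: javac adds a synthetic "this$0" field pointing
    // at the enclosing instance. Avro's reflection walks declared fields, and
    // "$" is illegal in Avro names, hence the SchemaParseException.
    class Inner {}

    /** Collects the declared field names of a class, synthetic ones included. */
    public static List<String> declaredFieldNames(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            names.add(f.getName());
        }
        return names;
    }
}
```

Making the checkpoint class static (or serializing it through a dedicated Coder, as the linked PR pursues) removes the synthetic field and with it the failure.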
[jira] [Updated] (BEAM-7691) Improve checkpoints documentation
[ https://issues.apache.org/jira/browse/BEAM-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7691: --- Summary: Improve checkpoints documentation (was: UnboundedSource.CheckpointMark should mention that implementations should be Serializable or have an associated Coder) > Improve checkpoints documentation > - > > Key: BEAM-7691 > URL: https://issues.apache.org/jira/browse/BEAM-7691 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Priority: Minor > Labels: documentation, javadoc, newbie, starter > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-7691) Improve checkpoints documentation
[ https://issues.apache.org/jira/browse/BEAM-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7691: --- Component/s: website > Improve checkpoints documentation > - > > Key: BEAM-7691 > URL: https://issues.apache.org/jira/browse/BEAM-7691 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, website >Reporter: Ismaël Mejía >Priority: Minor > Labels: documentation, javadoc, newbie, starter > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9175) Introduce an autoformatting tool to Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9175?focusedWorklogId=378323=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378323 ] ASF GitHub Bot logged work on BEAM-9175: Author: ASF GitHub Bot Created on: 28/Jan/20 15:30 Start Date: 28/Jan/20 15:30 Worklog Time Spent: 10m Work Description: kamilwu commented on issue #10684: [BEAM-9175] Introduce an autoformatting tool to Python SDK URL: https://github.com/apache/beam/pull/10684#issuecomment-579301469 @robertwb I managed to add all knobs you asked for. I think the code looks better now (much less weird lines, that's for sure). I also added a pre-commit job that runs yapf with --diff option. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378323) Time Spent: 3h 10m (was: 3h) > Introduce an autoformatting tool to Python SDK > -- > > Key: BEAM-9175 > URL: https://issues.apache.org/jira/browse/BEAM-9175 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Michał Walenia >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > It seems there are three main options: > * black - very simple, but not configurable at all (except for line length), > would drastically change code style > * yapf - more options to tweak, can omit parts of code > * autopep8 - more similar to spotless - only touches code that breaks > formatting guidelines, can use pycodestyle and flake8 as configuration > The rigidity of Black makes it unusable for Beam. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9162) Upgrade Jackson to version 2.10.2
[ https://issues.apache.org/jira/browse/BEAM-9162?focusedWorklogId=378332=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378332 ] ASF GitHub Bot logged work on BEAM-9162: Author: ASF GitHub Bot Created on: 28/Jan/20 15:47 Start Date: 28/Jan/20 15:47 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10643: [BEAM-9162] Upgrade Jackson to version 2.10.2 URL: https://github.com/apache/beam/pull/10643#issuecomment-579313211 Let me think about that this week. https://issues.apache.org/jira/browse/BEAM-9206 For this PR, I would only check the modules that use jackson: `find . -name 'build.gradle' | xargs grep library.java.jackson_`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378332) Time Spent: 2h 10m (was: 2h) > Upgrade Jackson to version 2.10.2 > - > > Key: BEAM-9162 > URL: https://issues.apache.org/jira/browse/BEAM-9162 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Jackson has a new way to deal with [deserialization security > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10] in > 2.10.x so worth the upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=378338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378338 ] ASF GitHub Bot logged work on BEAM-9008: Author: ASF GitHub Bot Created on: 28/Jan/20 15:55 Start Date: 28/Jan/20 15:55 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10546: [BEAM-9008] Add CassandraIO readAll method URL: https://github.com/apache/beam/pull/10546#issuecomment-579317145 > @iemejia I think that could work, thanks for your patience as I try to understand what you're thinking. Some questions: No, thank you; you have been the patient one during this discussion. > 1. If our `ReadFn extends DoFn, A>` and the only way we have connection information is from the `Read` passed in to the processElement, that means we need to re-establish a DB connection for each batch of queries we run? As in, the connection would be established in the `processElement` method and could not be in `setup` method? Yes, exactly. This will make the method simpler, and the cost of starting a connection gets amortized by the processElement producing multiple outputs from a single connection. > 2. How would that work for the end user of a `PTransform, PCollection>`? Here is what I did in the test and would want to document how end users could generate 'queries', > https://github.com/apache/beam/pull/10546/files#diff-8ba4ea3b09d563a67e29ff8584269d35R499 > Would we instead want to return a `PCollection>` by using something like `return CassandraIO.read().withRingRange(new RingRange(start, finish))`? If we do that however, we'd need to do the `withHosts` and all the other connection information, no? The other option is establishing one `ReadAll` PTransform that maps over the `Read` input and enriches the db connection information? You have a point here! We need `class ReadAll extends PTransform, PCollection>` and there we read as intended with `ReadFn`.
You would, however, have to modify the `expand` of `Read` to do `input.apply(Create.of(this)).apply(CassandraIO.readAll())`, where `ReadAll` should expand into `input.apply(ParDo.of(splitFn)).apply(Reshuffle).apply(Read)`. Users should deal with building the PCollection of `Reads` before passing that collection to `ReadAll`. > 3. Originally I had wanted to have the ReadFn operate on a _collection_ of 'query' objects to ensure a way to enforce linearizability with our queries (mainly so we don't oversaturate a single node/shard). Currently the groupBy function a user passes in operates on the `RingRange` object, would we keep it that way and just, under the hood, allow for a single `Read` to hold a collection of RingRanges? If I understand this correctly, this is covered by following the Create -> Split -> Reshuffle -> Read pattern mentioned above (in the mentioned IOs). So Split is the one that will generate a collection of `Read`s for each given `RingRange`, then we use Reshuffle to guarantee that reads are redistributed, and finally each read request is read by one worker. Hope this helps; don't hesitate to ask me more questions if anything is still unclear. I will try to answer quickly this time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378338) Time Spent: 4h (was: 3h 50m) > Add readAll() method to CassandraIO > --- > > Key: BEAM-9008 > URL: https://issues.apache.org/jira/browse/BEAM-9008 > Project: Beam > Issue Type: New Feature > Components: io-java-cassandra >Affects Versions: 2.16.0 >Reporter: vincent marquez >Assignee: vincent marquez >Priority: Minor > Time Spent: 4h > Remaining Estimate: 0h > > When querying a large Cassandra database, it's often *much* more useful to > programmatically generate the queries needed to be run rather than reading > all partitions and attempting some filtering. > As an example: > {code:java} > public class Event { >@PartitionKey(0) public UUID accountId; >@PartitionKey(1) public String yearMonthDay; >@ClusteringKey public UUID eventId; >//other data... > }{code} > If there are ten years' worth of data, you may want to only query one year's > worth. Here each token range would represent one 'token' but all events for > the day. > {code:java} > Set accounts = getRelevantAccounts(); > Set dateRange =
[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=378339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378339 ] ASF GitHub Bot logged work on BEAM-9008: Author: ASF GitHub Bot Created on: 28/Jan/20 15:58 Start Date: 28/Jan/20 15:58 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10546: [BEAM-9008] Add CassandraIO readAll method URL: https://github.com/apache/beam/pull/10546#issuecomment-579318866 Forgot to mention in the above comment that in the Split function you have to split in every case except when the user provides a specific RingRange to read from. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378339) Time Spent: 4h 10m (was: 4h) > Add readAll() method to CassandraIO > --- > > Key: BEAM-9008 > URL: https://issues.apache.org/jira/browse/BEAM-9008 > Project: Beam > Issue Type: New Feature > Components: io-java-cassandra >Affects Versions: 2.16.0 >Reporter: vincent marquez >Assignee: vincent marquez >Priority: Minor > Time Spent: 4h 10m > Remaining Estimate: 0h > > When querying a large Cassandra database, it's often *much* more useful to > programmatically generate the queries needed to be run rather than reading > all partitions and attempting some filtering. > As an example: > {code:java} > public class Event { >@PartitionKey(0) public UUID accountId; >@PartitionKey(1) public String yearMonthDay; >@ClusteringKey public UUID eventId; >//other data... > }{code} > If there are ten years' worth of data, you may want to only query one year's > worth. Here each token range would represent one 'token' but all events for > the day. 
> {code:java} > Set accounts = getRelevantAccounts(); > Set dateRange = generateDateRange("2018-01-01", "2019-01-01"); > PCollection tokens = generateTokens(accounts, dateRange); > {code} > > I propose an additional _readAll()_ PTransform that can take a PCollection > of token ranges and can return a PCollection of what the query would > return. > *Question: How much code should be in common between both methods?* > Currently the read connector already groups all partitions into a List of > Token Ranges, so it would be simple to refactor the current read() based > method to a 'ParDo' based one and have them both share the same function. > Reasons against sharing code between read and readAll > * Not having the read based method return a BoundedSource connector would > mean losing the ability to know the size of the data returned > * Currently the CassandraReader executes all the grouped TokenRange queries > *asynchronously* which is (maybe?) fine when all that's happening is > splitting up all the partition ranges but terrible for executing potentially > millions of queries. > Reasons _for_ sharing code would be a simplified code base and that both of > the above issues would most likely have a negligible performance impact. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
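The Create -> Split -> Reshuffle -> Read pattern discussed in the review thread above hinges on a Split step that turns one ring range into many smaller contiguous ones before Reshuffle redistributes them across workers. A minimal, Beam-free sketch of that splitting step over a simple numeric token interval (the `RingRange` below is a plain stand-in for illustration, not the CassandraIO class):

```java
import java.util.ArrayList;
import java.util.List;

public class RingRangeSplit {
    /** Plain stand-in for a token range [start, end); not the CassandraIO type. */
    public static final class RingRange {
        public final long start; // inclusive
        public final long end;   // exclusive
        public RingRange(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Splits a range into roughly {@code desiredSplits} contiguous sub-ranges.
     * A real Split step would emit one Read per sub-range, skipping the split
     * when the user supplied an explicit RingRange, as noted in the thread.
     */
    public static List<RingRange> split(RingRange range, int desiredSplits) {
        List<RingRange> out = new ArrayList<>();
        long span = range.end - range.start;
        if (span <= 0 || desiredSplits <= 0) {
            return out;
        }
        long step = Math.max(1, span / desiredSplits);
        for (long s = range.start; s < range.end; s += step) {
            // Clamp the last sub-range so the pieces exactly cover [start, end).
            out.add(new RingRange(s, Math.min(s + step, range.end)));
        }
        return out;
    }
}
```

After a split like this, reshuffling the sub-ranges is what prevents one worker from executing the potentially millions of queries mentioned above.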
[jira] [Work logged] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode
[ https://issues.apache.org/jira/browse/BEAM-8972?focusedWorklogId=378151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378151 ] ASF GitHub Bot logged work on BEAM-8972: Author: ASF GitHub Bot Created on: 28/Jan/20 10:59 Start Date: 28/Jan/20 10:59 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10386: [BEAM-8972] Add Jenkins job with Combine test for portable Java URL: https://github.com/apache/beam/pull/10386#issuecomment-579191281 Run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378151) Time Spent: 4.5h (was: 4h 20m) > Add a Jenkins job running Combine load test on Java with Flink in Portability > mode > -- > > Key: BEAM-8972 > URL: https://issues.apache.org/jira/browse/BEAM-8972 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Minor > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9063) Migrate docker images to apache namespace.
[ https://issues.apache.org/jira/browse/BEAM-9063?focusedWorklogId=378168=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378168 ] ASF GitHub Bot logged work on BEAM-9063: Author: ASF GitHub Bot Created on: 28/Jan/20 11:44 Start Date: 28/Jan/20 11:44 Worklog Time Spent: 10m Work Description: mxm commented on pull request #10612: [DO NOT MERGE][BEAM-9063] migrate docker images to apache URL: https://github.com/apache/beam/pull/10612#discussion_r371752615 ## File path: release/src/main/scripts/publish_docker_images.sh ## @@ -24,6 +24,9 @@ set -e +DOCKER_IMAGE_DEFAULT_REPO_ROOT=apache +DOCKER_IMAGE_DEFAULT_REPO_PREFIX=beam_ Review comment: Would it be worth sourcing these from a script? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378168) Time Spent: 5h 40m (was: 5.5h) > Migrate docker images to apache namespace. > -- > > Key: BEAM-9063 > URL: https://issues.apache.org/jira/browse/BEAM-9063 > Project: Beam > Issue Type: Task > Components: beam-community >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: Not applicable > > Time Spent: 5h 40m > Remaining Estimate: 0h > > https://hub.docker.com/u/apache -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9063) Migrate docker images to apache namespace.
[ https://issues.apache.org/jira/browse/BEAM-9063?focusedWorklogId=378167=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378167 ] ASF GitHub Bot logged work on BEAM-9063: Author: ASF GitHub Bot Created on: 28/Jan/20 11:44 Start Date: 28/Jan/20 11:44 Worklog Time Spent: 10m Work Description: mxm commented on pull request #10612: [DO NOT MERGE][BEAM-9063] migrate docker images to apache URL: https://github.com/apache/beam/pull/10612#discussion_r371752564 ## File path: release/src/main/scripts/build_release_candidate.sh ## @@ -45,6 +45,9 @@ PYTHON_ARTIFACTS_DIR=python BEAM_ROOT_DIR=beam WEBSITE_ROOT_DIR=beam-site +DOCKER_IMAGE_DEFAULT_REPO_ROOT=apache +DOCKER_IMAGE_DEFAULT_REPO_PREFIX=beam_ Review comment: Would it be worth sourcing these from a script? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378167) Time Spent: 5h 40m (was: 5.5h) > Migrate docker images to apache namespace. > -- > > Key: BEAM-9063 > URL: https://issues.apache.org/jira/browse/BEAM-9063 > Project: Beam > Issue Type: Task > Components: beam-community >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: Not applicable > > Time Spent: 5h 40m > Remaining Estimate: 0h > > https://hub.docker.com/u/apache -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode
[ https://issues.apache.org/jira/browse/BEAM-8972?focusedWorklogId=378184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378184 ] ASF GitHub Bot logged work on BEAM-8972: Author: ASF GitHub Bot Created on: 28/Jan/20 12:30 Start Date: 28/Jan/20 12:30 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10386: [BEAM-8972] Add Jenkins job with Combine test for portable Java URL: https://github.com/apache/beam/pull/10386#issuecomment-579223188 run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378184) Time Spent: 5h 10m (was: 5h) > Add a Jenkins job running Combine load test on Java with Flink in Portability > mode > -- > > Key: BEAM-8972 > URL: https://issues.apache.org/jira/browse/BEAM-8972 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Minor > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode
[ https://issues.apache.org/jira/browse/BEAM-8972?focusedWorklogId=378212=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378212 ] ASF GitHub Bot logged work on BEAM-8972: Author: ASF GitHub Bot Created on: 28/Jan/20 13:06 Start Date: 28/Jan/20 13:06 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10386: [BEAM-8972] Add Jenkins job with Combine test for portable Java URL: https://github.com/apache/beam/pull/10386#issuecomment-579236550 Run Load Tests Java Combine Portable Flink Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378212) Time Spent: 5h 20m (was: 5h 10m) > Add a Jenkins job running Combine load test on Java with Flink in Portability > mode > -- > > Key: BEAM-8972 > URL: https://issues.apache.org/jira/browse/BEAM-8972 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Minor > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-7427: -- Assignee: Jean-Baptiste Onofré (was: Mourad) > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.(Schema.java:403) > at org.apache.avro.Schema$Field.(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. 
> JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the > events broker", JmsIO.readMessage() > .withConnectionFactory(jmsConnectionFactory) > .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
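The `Illegal character in: this$0` failure above happens when Avro's `ReflectData` meets a compiler-generated field: a non-static inner class carries a hidden `this$0` reference to its enclosing instance, and `$` is not a legal character in an Avro schema name. A minimal stdlib-only sketch (class names made up for illustration, not taken from JmsIO) showing where such a field comes from:

```java
import java.lang.reflect.Field;

public class SyntheticFieldDemo {
    private final String broker = "solace";

    // Hypothetical non-static inner class; any such class in the object
    // graph that Avro introspects exposes the same synthetic field.
    class Checkpoint {
        long offset;
        String source() { return broker; } // forces the outer-instance reference
    }

    public static void main(String[] args) {
        for (Field f : Checkpoint.class.getDeclaredFields()) {
            // The declared fields include the compiler-generated "this$0",
            // whose '$' Avro's Schema.validateName rejects.
            System.out.println(f.getName());
        }
    }
}
```

Running this lists `offset` alongside `this$0`; reflection-based schema generation trips over the latter exactly as in the stack trace above.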
[jira] [Updated] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7427: --- Fix Version/s: 2.20.0 > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, > 2.19.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.20.0 > > Time Spent: 8.5h > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.(Schema.java:403) > at org.apache.avro.Schema$Field.(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. 
> JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the > events broker", JmsIO.readMessage() > .withConnectionFactory(jmsConnectionFactory) > .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7427: --- Affects Version/s: 2.19.0 2.13.0 2.14.0 2.15.0 2.16.0 2.17.0 2.18.0 > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, > 2.19.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.(Schema.java:403) > at org.apache.avro.Schema$Field.(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. 
> JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the > events broker", JmsIO.readMessage() > .withConnectionFactory(jmsConnectionFactory) > .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9162) Upgrade Jackson to version 2.10.2
[ https://issues.apache.org/jira/browse/BEAM-9162?focusedWorklogId=378269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378269 ] ASF GitHub Bot logged work on BEAM-9162: Author: ASF GitHub Bot Created on: 28/Jan/20 14:26 Start Date: 28/Jan/20 14:26 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10643: [BEAM-9162] Upgrade Jackson to version 2.10.2 URL: https://github.com/apache/beam/pull/10643#issuecomment-579270595 I want that job too! The challenge is that because of the many existing linkage errors, I'd have to compare - linkage errors in a PR, and - linkage errors in origin/master Like a code coverage report. As I don't know how to do that, I'm still doing it `diff` command in my local environment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378269) Time Spent: 1h 50m (was: 1h 40m) > Upgrade Jackson to version 2.10.2 > - > > Key: BEAM-9162 > URL: https://issues.apache.org/jira/browse/BEAM-9162 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Jackson has a new way to deal with [deserialization security > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10] in > 2.10.x so worth the upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9162) Upgrade Jackson to version 2.10.2
[ https://issues.apache.org/jira/browse/BEAM-9162?focusedWorklogId=378278=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378278 ] ASF GitHub Bot logged work on BEAM-9162: Author: ASF GitHub Bot Created on: 28/Jan/20 14:38 Start Date: 28/Jan/20 14:38 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10643: [BEAM-9162] Upgrade Jackson to version 2.10.2 URL: https://github.com/apache/beam/pull/10643#issuecomment-579276082 Well the manual comparison is not ideal but we can cope with that for the moment, what I don't want is to type the command for the 31 modules of this PR and then have to change it for other dependency upgrade. I just want some sort of `./gradlew :checkJavaLinkage` that works for the whole set of modules of the project. Is this 'feasible' with gradlew + Beam? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378278) Time Spent: 2h (was: 1h 50m) > Upgrade Jackson to version 2.10.2 > - > > Key: BEAM-9162 > URL: https://issues.apache.org/jira/browse/BEAM-9162 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > Jackson has a new way to deal with [deserialization security > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10] in > 2.10.x so worth the upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=378281=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378281 ] ASF GitHub Bot logged work on BEAM-7427: Author: ASF GitHub Bot Created on: 28/Jan/20 14:50 Start Date: 28/Jan/20 14:50 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10644: [BEAM-7427] Refactor JmsCheckpointMark to use SerializableCoder URL: https://github.com/apache/beam/pull/10644 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378281) Time Spent: 9h (was: 8h 50m) > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, > 2.19.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.20.0 > > Time Spent: 9h > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.(Schema.java:403) > at org.apache.avro.Schema$Field.(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at 
org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. > JmsIO config : > > {code:java} > PCollection messages = pipeline.apply("read messages from the > events broker", JmsIO.readMessage() > .withConnectionFactory(jmsConnectionFactory) .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-7427. Resolution: Fixed > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, > 2.19.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.20.0 > > Time Spent: 9h 10m > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.(Schema.java:403) > at org.apache.avro.Schema$Field.(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. 
> JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the > events broker", JmsIO.readMessage() > .withConnectionFactory(jmsConnectionFactory) > .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=378282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378282 ] ASF GitHub Bot logged work on BEAM-7427: Author: ASF GitHub Bot Created on: 28/Jan/20 14:50 Start Date: 28/Jan/20 14:50 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10644: [BEAM-7427] Refactor JmsCheckpointMark to use SerializableCoder URL: https://github.com/apache/beam/pull/10644#issuecomment-579281715 Merged manually, thanks again JB! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378282) Time Spent: 9h 10m (was: 9h) > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, > 2.19.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.20.0 > > Time Spent: 9h 10m > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.(Schema.java:403) > at org.apache.avro.Schema$Field.(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at 
org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. > JmsIO config : > > {code:java} > PCollection messages = pipeline.apply("read messages from the > events broker", JmsIO.readMessage() > .withConnectionFactory(jmsConnectionFactory) .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
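The merged change replaces Avro reflection with `SerializableCoder`, which encodes the checkpoint through plain Java serialization, where field names such as `this$0` are irrelevant. A rough stdlib-only sketch of that encode/decode round trip — the `CheckpointMark` class below is a made-up stand-in, not the real `JmsCheckpointMark`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableRoundTrip {
    // Made-up stand-in for a checkpoint mark; what matters for
    // SerializableCoder is only that the class implements Serializable.
    static class CheckpointMark implements Serializable {
        private static final long serialVersionUID = 1L;
        final long lastOffset;
        CheckpointMark(long lastOffset) { this.lastOffset = lastOffset; }
    }

    // Encode then decode via standard Java serialization, which is
    // essentially what SerializableCoder does under the hood.
    static CheckpointMark roundTrip(CheckpointMark mark) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(mark);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (CheckpointMark) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new CheckpointMark(42L)).lastOffset); // 42
    }
}
```

The trade-off versus a schema-based coder is that Java serialization offers no schema evolution, which is usually acceptable for short-lived checkpoint state.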
[jira] [Updated] (BEAM-7691) Improve checkpoints documentation
[ https://issues.apache.org/jira/browse/BEAM-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7691: --- Description: UnboundedSource.CheckpointMark should mention that implementations should be encodable (have an associated Coder). It may also be a good idea to explain the checkpointing semantics of Beam in more detail. > Improve checkpoints documentation > - > > Key: BEAM-7691 > URL: https://issues.apache.org/jira/browse/BEAM-7691 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, website >Reporter: Ismaël Mejía >Priority: Minor > Labels: documentation, javadoc, newbie, starter > > UnboundedSource.CheckpointMark should mention that implementations should > be encodable (have an associated Coder). It may also be a good idea to > explain the checkpointing semantics of Beam in more detail. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9132) State request handler is removed prematurely when closing ActiveBundle
[ https://issues.apache.org/jira/browse/BEAM-9132?focusedWorklogId=378292=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378292 ] ASF GitHub Bot logged work on BEAM-9132: Author: ASF GitHub Bot Created on: 28/Jan/20 15:19 Start Date: 28/Jan/20 15:19 Worklog Time Spent: 10m Work Description: mxm commented on issue #10694: [BEAM-9132] Use a unique bundle id across all SDK workers bound to the same environment URL: https://github.com/apache/beam/pull/10694#issuecomment-579296135 This does not fix the issue because the id generation scheme is effectively the same. I'm still seeing the same errors, but I have a new trace. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378292) Time Spent: 1h 20m (was: 1h 10m) > State request handler is removed prematurely when closing ActiveBundle > -- > > Key: BEAM-9132 > URL: https://issues.apache.org/jira/browse/BEAM-9132 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > We have observed these errors in a state-intense application: > {noformat} > Error processing instruction 107. 
Original traceback is > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 780, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 587, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > File "apache_beam/runners/common.py", line 659, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window > File "apache_beam/runners/common.py", line 880, in > apache_beam.runners.common._OutputProcessor.process_outputs > File "apache_beam/runners/common.py", line 895, in > apache_beam.runners.common._OutputProcessor.process_outputs > File "redacted.py", line 56, in process > recent_events_map = load_recent_events_map(recent_events_state) > File "redacted.py", line 128, in _load_recent_events_map > items_in_recent_events_bag = list(recent_events_state.read()) > File "apache_beam/runners/worker/bundle_processor.py", line 335, in __iter__ > for elem in self.first: > File "apache_beam/runners/worker/bundle_processor.py", line 214, in __iter__ > self._state_key, self._coder_impl, is_cached=self._is_cached) > File "apache_beam/runners/worker/sdk_worker.py", line 692, in blocking_get > self._materialize_iter(state_key, coder)) > File "apache_beam/runners/worker/sdk_worker.py", line 723, in > _materialize_iter > self._underlying.get_raw(state_key, continuation_token) > File "apache_beam/runners/worker/sdk_worker.py", line 603, in get_raw > continuation_token=continuation_token))) > File "apache_beam/runners/worker/sdk_worker.py", line 637, in > _blocking_request > raise RuntimeError(response.error) > RuntimeError: Unknown process bundle instruction id '107' > {noformat} > Notice that the error is thrown on the Runner side. It seems to originate > from the {{ActiveBundle}} de-registering the state request handler too early > when the processing may still be going on in the SDK Harness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
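The PR under discussion aims to derive bundle ids from a counter shared by all SDK workers bound to one environment, so that no two concurrent bundles reuse an instruction id. A generic stdlib sketch of that scheme — illustrative only, not Beam's actual code, and per the comment above the real fix required more than a shared counter:

```java
import java.util.concurrent.atomic.AtomicLong;

public class BundleIdGenerator {
    // One process-wide counter shared by every generator instance, so ids
    // never collide across workers bound to the same environment.
    private static final AtomicLong COUNTER = new AtomicLong();

    private final String environmentId;

    BundleIdGenerator(String environmentId) {
        this.environmentId = environmentId;
    }

    String nextBundleId() {
        // Prefix with the environment id so ids are also readable in logs.
        return environmentId + "-" + COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        BundleIdGenerator workerA = new BundleIdGenerator("env1");
        BundleIdGenerator workerB = new BundleIdGenerator("env1");
        // Distinct ids even though the two handles share one environment.
        System.out.println(workerA.nextBundleId());
        System.out.println(workerB.nextBundleId());
    }
}
```

As the error trace shows, uniqueness alone does not help if the runner deregisters the state handler for an id while the SDK harness is still issuing state requests against it; the lifetime of the handler has to cover the whole bundle, not just id assignment.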
[jira] [Updated] (BEAM-7427) JmsCheckpointMark can not be correctly encoded
[ https://issues.apache.org/jira/browse/BEAM-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7427: --- Summary: JmsCheckpointMark can not be correctly encoded (was: JmsCheckpointMark Avro Serialization issue with UnboundedSource) > JmsCheckpointMark can not be correctly encoded > -- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, > 2.19.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.20.0 > > Time Spent: 9h 20m > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.(Schema.java:403) > at org.apache.avro.Schema$Field.(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when 
introspecting {{JmsCheckpointMark}} to > generate schema. > JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the > events broker", JmsIO.readMessage() > .withConnectionFactory(jmsConnectionFactory) > .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9175) Introduce an autoformatting tool to Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9175?focusedWorklogId=378293=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378293 ] ASF GitHub Bot logged work on BEAM-9175: Author: ASF GitHub Bot Created on: 28/Jan/20 15:20 Start Date: 28/Jan/20 15:20 Worklog Time Spent: 10m Work Description: kamilwu commented on pull request #10684: [BEAM-9175] Introduce an autoformatting tool to Python SDK URL: https://github.com/apache/beam/pull/10684#discussion_r371867218 ## File path: sdks/python/apache_beam/typehints/trivial_inference.py ## @@ -68,24 +68,23 @@ def instance_to_type(o): return typehints.Tuple[[instance_to_type(item) for item in o]] elif t == list: if len(o) > 0: - return typehints.List[ - typehints.Union[[instance_to_type(item) for item in o]] - ] + return typehints.List[typehints.Union[[ + instance_to_type(item) for item in o + ]]] else: return typehints.List[typehints.Any] elif t == set: if len(o) > 0: - return typehints.Set[ - typehints.Union[[instance_to_type(item) for item in o]] - ] + return typehints.Set[typehints.Union[[ + instance_to_type(item) for item in o + ]]] else: return typehints.Set[typehints.Any] elif t == dict: if len(o) > 0: return typehints.Dict[ typehints.Union[[instance_to_type(k) for k, v in o.items()]], - typehints.Union[[instance_to_type(v) for k, v in o.items()]], - ] + typehints.Union[[instance_to_type(v) for k, v in o.items()]], ] Review comment: It's OK now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378293) Time Spent: 2h 50m (was: 2h 40m) > Introduce an autoformatting tool to Python SDK > -- > > Key: BEAM-9175 > URL: https://issues.apache.org/jira/browse/BEAM-9175 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Michał Walenia >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > It seems there are three main options: > * black - very simple, but not configurable at all (except for line length), > would drastically change code style > * yapf - more options to tweak, can omit parts of code > * autopep8 - more similar to spotless - only touches code that breaks > formatting guidelines, can use pycodestyle and flake8 as configuration > The rigidity of Black makes it unusable for Beam. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9175) Introduce an autoformatting tool to Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9175?focusedWorklogId=378294=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378294 ] ASF GitHub Bot logged work on BEAM-9175: Author: ASF GitHub Bot Created on: 28/Jan/20 15:20 Start Date: 28/Jan/20 15:20 Worklog Time Spent: 10m Work Description: kamilwu commented on pull request #10684: [BEAM-9175] Introduce an autoformatting tool to Python SDK URL: https://github.com/apache/beam/pull/10684#discussion_r371867339 ## File path: sdks/python/apache_beam/typehints/trivial_inference.py ## @@ -303,10 +302,8 @@ def infer_return_type(c, input_types, debug=False, depth=5): elif inspect.isclass(c): if c in typehints.DISALLOWED_PRIMITIVE_TYPES: return { -list: typehints.List[Any], -set: typehints.Set[Any], -tuple: typehints.Tuple[Any, ...], -dict: typehints.Dict[Any, Any] +list: typehints.List[Any], set: typehints.Set[Any], tuple: Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378294) Time Spent: 3h (was: 2h 50m) > Introduce an autoformatting tool to Python SDK > -- > > Key: BEAM-9175 > URL: https://issues.apache.org/jira/browse/BEAM-9175 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Michał Walenia >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > It seems there are three main options: > * black - very simple, but not configurable at all (except for line length), > would drastically change code style > * yapf - more options to tweak, can omit parts of code > * autopep8 - more similar to spotless - only touches code that breaks > formatting guidelines, can use pycodestyle and flake8 as configuration > The rigidity of Black makes it unusable for Beam. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9175) Introduce an autoformatting tool to Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9175?focusedWorklogId=378326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378326 ] ASF GitHub Bot logged work on BEAM-9175: Author: ASF GitHub Bot Created on: 28/Jan/20 15:33 Start Date: 28/Jan/20 15:33 Worklog Time Spent: 10m Work Description: kamilwu commented on issue #10684: [BEAM-9175] Introduce an autoformatting tool to Python SDK URL: https://github.com/apache/beam/pull/10684#issuecomment-579303250 There is still a number of pylint issues `Wrong continued indentation`. Most of them appear because of how lambda is formatted. For example: https://github.com/apache/beam/blob/7db61fbf2dd6eac4ffb542e684260edf0d892fea/sdks/python/apache_beam/io/gcp/bigquery_test.py#L908-L910 I don't know yet how to deal with it. Unless there is a knob for this, I'll just put a # yapf: disable comments in these places. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378326) Time Spent: 3h 20m (was: 3h 10m) > Introduce an autoformatting tool to Python SDK > -- > > Key: BEAM-9175 > URL: https://issues.apache.org/jira/browse/BEAM-9175 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Michał Walenia >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > It seems there are three main options: > * black - very simple, but not configurable at all (except for line length), > would drastically change code style > * yapf - more options to tweak, can omit parts of code > * autopep8 - more similar to spotless - only touches code that breaks > formatting guidelines, can use pycodestyle and flake8 as configuration > The rigidity of Black makes it unusable for Beam. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9162) Upgrade Jackson to version 2.10.2
[ https://issues.apache.org/jira/browse/BEAM-9162?focusedWorklogId=378340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378340 ] ASF GitHub Bot logged work on BEAM-9162: Author: ASF GitHub Bot Created on: 28/Jan/20 16:00 Start Date: 28/Jan/20 16:00 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10643: [BEAM-9162] Upgrade Jackson to version 2.10.2 URL: https://github.com/apache/beam/pull/10643#issuecomment-579320038 Hehe so the 31 that I mentioned above, mmm not an easy to sell proposition. On the other hand I can help with the jenkins part if you get to do an incantation that works locally for all modules. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378340) Time Spent: 2h 20m (was: 2h 10m) > Upgrade Jackson to version 2.10.2 > - > > Key: BEAM-9162 > URL: https://issues.apache.org/jira/browse/BEAM-9162 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Jackson has a new way to deal with [deserialization security > issues|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10] in > 2.10.x so worth the upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9206) Easy way to run checkJavaLinkage?
[ https://issues.apache.org/jira/browse/BEAM-9206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9206: --- Status: Open (was: Triage Needed) > Easy way to run checkJavaLinkage? > - > > Key: BEAM-9206 > URL: https://issues.apache.org/jira/browse/BEAM-9206 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > > Follow up of iemejia's comment: > https://github.com/apache/beam/pull/10643#issuecomment-579276082 > bq. I just want some sort of ./gradlew :checkJavaLinkage that works for the > whole set of modules of the project. Is this 'feasible' with gradlew + Beam? > h1. Considerations > * Something that can run on Jenkins > * Comparison with the result of origin/master > * Simple way to run checkJavaLinkage for all modules -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8298) Implement state caching for side inputs
[ https://issues.apache.org/jira/browse/BEAM-8298?focusedWorklogId=378345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378345 ] ASF GitHub Bot logged work on BEAM-8298: Author: ASF GitHub Bot Created on: 28/Jan/20 16:10 Start Date: 28/Jan/20 16:10 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10679: [BEAM-8298] Fully specify the necessary details to support side input caching. URL: https://github.com/apache/beam/pull/10679#issuecomment-579326907 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378345) Time Spent: 1h 10m (was: 1h) > Implement state caching for side inputs > --- > > Key: BEAM-8298 > URL: https://issues.apache.org/jira/browse/BEAM-8298 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-py-harness >Reporter: Maximilian Michels >Assignee: Jing Chen >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Caching is currently only implemented for user state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=378258=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378258 ] ASF GitHub Bot logged work on BEAM-7427: Author: ASF GitHub Bot Created on: 28/Jan/20 14:15 Start Date: 28/Jan/20 14:15 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #10644: [BEAM-7427] Refactore JmsCheckpointMark to be usage via Coder URL: https://github.com/apache/beam/pull/10644#issuecomment-579265247 @iemejia thanks ! I will switch to other IOs improvements ;) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378258) Time Spent: 8h 40m (was: 8.5h) > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, > 2.19.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.20.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.<init>(Schema.java:403) > at org.apache.avro.Schema$Field.<init>(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at 
org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. > JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the > events broker", JmsIO.<DFAMessage>readMessage() > .withConnectionFactory(jmsConnectionFactory) .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
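The `Illegal character in: this$0` failure quoted above comes from Avro's `ReflectData` walking a non-static inner class: the Java compiler adds a synthetic `this$0` field pointing at the enclosing instance, and `$` is not a legal character in an Avro record field name. A minimal stdlib-only sketch (not Beam code; class names here are illustrative) showing where that field comes from:

```java
// Demonstrates the synthetic this$0 field that javac adds to a non-static
// inner class; Avro's ReflectData trips over it during schema generation.
class InnerFieldDemo {
    class Inner {}          // non-static: carries a hidden this$0 reference

    static class Nested {}  // static: no enclosing-instance field at all

    public static void main(String[] args) {
        for (java.lang.reflect.Field f : Inner.class.getDeclaredFields()) {
            System.out.println(f.getName());  // the synthetic field shows up here
        }
        System.out.println(Nested.class.getDeclaredFields().length);
    }
}
```

This is why making such state static (or, as the PR above does, sidestepping Avro reflection with a dedicated `Coder`) removes the error.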
[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=378284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378284 ] ASF GitHub Bot logged work on BEAM-7427: Author: ASF GitHub Bot Created on: 28/Jan/20 14:52 Start Date: 28/Jan/20 14:52 Worklog Time Spent: 10m Work Description: iemejia commented on issue #8757: [BEAM-7427] Fix JmsCheckpointMark Avro Encoding URL: https://github.com/apache/beam/pull/8757#issuecomment-579282755 For info #10644 was merged today. The fix will be part of Beam 2.20.0 since the vote for 2.19.0 has already started. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378284) Time Spent: 9h 20m (was: 9h 10m) > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, > 2.19.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: 2.20.0 > > Time Spent: 9h 20m > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.<init>(Schema.java:403) > at org.apache.avro.Schema$Field.<init>(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at 
org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. > JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the > events broker", JmsIO.<DFAMessage>readMessage() > .withConnectionFactory(jmsConnectionFactory) .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9188) Improving speed of splitting for Custom Sources
[ https://issues.apache.org/jira/browse/BEAM-9188?focusedWorklogId=378322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378322 ] ASF GitHub Bot logged work on BEAM-9188: Author: ASF GitHub Bot Created on: 28/Jan/20 15:24 Start Date: 28/Jan/20 15:24 Worklog Time Spent: 10m Work Description: stankiewicz commented on pull request #10701: [BEAM-9188] CassandraIO split performance improvement - cache size of the table URL: https://github.com/apache/beam/pull/10701 Splitting a CassandraIO source into multiple sources is fast because it uses one connection pool to the Cassandra cluster, but after that dataflow.worker.WorkerCustomSources calls CassandraSource.getEstimatedSizeBytes for each source, which sets up and tears down a connection to the Cassandra cluster to calculate the same table size each time. This optimization caches the size internally to avoid those additional queries. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): [table of per-runner build-status badges (SDK, Apex, Dataflow, Flink, Gearpump, Samza, Spark) for the Go and Java SDKs omitted; truncated in the archive]
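The caching described in the PR above boils down to memoizing an expensive estimate. A hypothetical sketch of the idea (names are illustrative, not the actual CassandraIO code): the first `getEstimatedSizeBytes` call pays for the query, and later calls reuse the cached value instead of reopening a cluster connection.

```java
import java.util.function.LongSupplier;

// Illustrative memoization of an expensive size estimate. In CassandraIO's
// case the supplier would open a connection and read a system table; here it
// is injected so the caching behavior is visible on its own.
class CachedSizeEstimate {
    private Long cachedSizeBytes;  // null until first computed
    private final LongSupplier expensiveEstimate;

    CachedSizeEstimate(LongSupplier expensiveEstimate) {
        this.expensiveEstimate = expensiveEstimate;
    }

    long getEstimatedSizeBytes() {
        if (cachedSizeBytes == null) {
            cachedSizeBytes = expensiveEstimate.getAsLong();  // runs only once
        }
        return cachedSizeBytes;
    }
}
```

Calling `getEstimatedSizeBytes()` repeatedly invokes the supplier exactly once, which is the whole point of the optimization when the split produces many sources over the same table.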
[jira] [Resolved] (BEAM-4409) NoSuchMethodException reading from JmsIO
[ https://issues.apache.org/jira/browse/BEAM-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-4409. Fix Version/s: 2.20.0 Resolution: Fixed > NoSuchMethodException reading from JmsIO > > > Key: BEAM-4409 > URL: https://issues.apache.org/jira/browse/BEAM-4409 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.4.0 > Environment: Linux, Java 1.8, Beam 2.4, Direct Runner, ActiveMQ >Reporter: Edward Pricer >Priority: Major > Fix For: 2.20.0 > > > Running with the DirectRunner, and reading from a queue with JmsIO as an > unbounded source will produce a NoSuchMethodException. This occurs as the > UnboundedReadEvaluatorFactory.UnboundedReadEvaluator attempts to clone the > JmsCheckpointMark with the default (Avro) coder. > The following trivial code on the reader side reproduces the error > (DirectRunner must be in path). The messages on the queue for this test case > were simple TextMessages. I found this exception is triggered more readily > when messages are published rapidly (~200/second) > {code:java} > Pipeline p = > Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create()); > // read from the queue > ConnectionFactory factory = new > ActiveMQConnectionFactory("tcp://localhost:61616"); > PCollection<String> inputStrings = p.apply("Read from queue", > JmsIO.<String>readMessage() .withConnectionFactory(factory) > .withQueue("somequeue") .withCoder(StringUtf8Coder.of()) > .withMessageMapper((JmsIO.MessageMapper<String>) message -> > ((TextMessage) message).getText())); > // decode > PCollection<String> asStrings = inputStrings.apply("Decode Message", > ParDo.of(new DoFn<String, String>() { @ProcessElement public > void processElement(ProcessContext context) { > System.out.println(context.element()); > context.output(context.element()); } })); > p.run(); > {code} > Stack trace: > {code:java} > Exception in thread "main" java.lang.RuntimeException: > java.lang.NoSuchMethodException: javax.jms.Message.<init>() at > 
org.apache.avro.specific.SpecificData.newInstance(SpecificData.java:353) at > org.apache.avro.specific.SpecificData.newRecord(SpecificData.java:369) at > org.apache.avro.reflect.ReflectData.newRecord(ReflectData.java:901) at > org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:212) > at > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) > at > org.apache.avro.reflect.ReflectDatumReader.readCollection(ReflectDatumReader.java:219) > at > org.apache.avro.reflect.ReflectDatumReader.readArray(ReflectDatumReader.java:137) > at > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:177) > at > org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:302) > at > org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) > at > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) > at org.apache.beam.sdk.coders.AvroCoder.decode(AvroCoder.java:318) at > org.apache.beam.sdk.coders.Coder.decode(Coder.java:170) at > org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:122) > at > org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:105) > at > org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:99) > at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:148) at > org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.getReader(UnboundedReadEvaluatorFactory.java:194) > at > org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement(UnboundedReadEvaluatorFactory.java:124) > at > org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:161) > at > 
org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:125) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) Caused by: > java.lang.NoSuchMethodException: javax.jms.Message.<init>() at > java.lang.Class.getConstructor0(Class.java:3082) at > java.lang.Class.getDeclaredConstructor(Class.java:2178) at > org.apache.avro.specific.SpecificData.newInstance(SpecificData.java:347) > {code} > > And a more contrived
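The `Caused by` frame above is a reflective no-arg constructor lookup: Avro's `SpecificData.newInstance` calls `Class.getDeclaredConstructor()`, and an interface like `javax.jms.Message` cannot have any constructor. A stdlib-only sketch of that lookup, using a stand-in interface rather than the real javax.jms type:

```java
// Stand-in for javax.jms.Message: interfaces declare no constructors, so the
// reflective no-arg constructor lookup that Avro performs must fail.
interface FakeMessage {}

class NoArgCtorDemo {
    // Mirrors the lookup done inside SpecificData.newInstance.
    static String lookup(Class<?> clazz) {
        try {
            clazz.getDeclaredConstructor();  // throws for interfaces
            return "found";
        } catch (NoSuchMethodException e) {
            return "NoSuchMethodException";
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(FakeMessage.class));
    }
}
```

This is why any checkpoint mark that holds live `Message` references cannot be round-tripped through the default Avro coder, and why the eventual fix encodes `JmsCheckpointMark` with a dedicated `Coder` instead.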
[jira] [Created] (BEAM-9206) Easy way to run checkJavaLinkage?
Tomo Suzuki created BEAM-9206: - Summary: Easy way to run checkJavaLinkage? Key: BEAM-9206 URL: https://issues.apache.org/jira/browse/BEAM-9206 Project: Beam Issue Type: Bug Components: build-system Reporter: Tomo Suzuki Assignee: Tomo Suzuki Follow up of iemejia's comment: https://github.com/apache/beam/pull/10643#issuecomment-579276082 bq. I just want some sort of ./gradlew :checkJavaLinkage that works for the whole set of modules of the project. Is this 'feasible' with gradlew + Beam? h1. Considerations * Something that can run on Jenkins * Comparison with the result of origin/master * Simple way to run checkJavaLinkage for all modules -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9205) Regression in validates runner tests configuration in spark module
[ https://issues.apache.org/jira/browse/BEAM-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-9205. -- Fix Version/s: 2.20.0 Resolution: Fixed > Regression in validates runner tests configuration in spark module > -- > > Key: BEAM-9205 > URL: https://issues.apache.org/jira/browse/BEAM-9205 > Project: Beam > Issue Type: Sub-task > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: 2.20.0 > > > Not all the metrics tests are run: at least MetricsPusher is no longer run -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9205) Regression in validates runner tests configuration in spark module
[ https://issues.apache.org/jira/browse/BEAM-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-9205: --- Parent: BEAM-3310 Issue Type: Sub-task (was: Test) > Regression in validates runner tests configuration in spark module > -- > > Key: BEAM-9205 > URL: https://issues.apache.org/jira/browse/BEAM-9205 > Project: Beam > Issue Type: Sub-task > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > > Not all the metrics tests are run: at least MetricsPusher is no longer run -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9205) Regression in validates runner tests configuration in spark module
Etienne Chauchot created BEAM-9205: -- Summary: Regression in validates runner tests configuration in spark module Key: BEAM-9205 URL: https://issues.apache.org/jira/browse/BEAM-9205 Project: Beam Issue Type: Test Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot Not all the metrics tests are run: at least MetricsPusher is no longer run -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8925) Beam Dependency Update Request: org.apache.tika:tika-core
[ https://issues.apache.org/jira/browse/BEAM-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colm O hEigeartaigh reassigned BEAM-8925: - Assignee: Colm O hEigeartaigh > Beam Dependency Update Request: org.apache.tika:tika-core > - > > Key: BEAM-8925 > URL: https://issues.apache.org/jira/browse/BEAM-8925 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Colm O hEigeartaigh >Priority: Major > > - 2019-12-09 12:20:22.212496 > - > Please consider upgrading the dependency org.apache.tika:tika-core. > The current version is 1.20. The latest version is 1.23 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-23 12:20:53.356760 > - > Please consider upgrading the dependency org.apache.tika:tika-core. > The current version is 1.20. The latest version is 1.23 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-30 14:15:58.081400 > - > Please consider upgrading the dependency org.apache.tika:tika-core. > The current version is 1.20. The latest version is 1.23 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-06 12:19:33.456649 > - > Please consider upgrading the dependency org.apache.tika:tika-core. > The current version is 1.20. The latest version is 1.23 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-13 12:18:38.940974 > - > Please consider upgrading the dependency org.apache.tika:tika-core. > The current version is 1.20. 
The latest version is 1.23 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-20 12:16:03.428169 > - > Please consider upgrading the dependency org.apache.tika:tika-core. > The current version is 1.20. The latest version is 1.23 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-27 12:17:01.302466 > - > Please consider upgrading the dependency org.apache.tika:tika-core. > The current version is 1.20. The latest version is 1.23 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=378223=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378223 ] ASF GitHub Bot logged work on BEAM-7427: Author: ASF GitHub Bot Created on: 28/Jan/20 13:25 Start Date: 28/Jan/20 13:25 Worklog Time Spent: 10m Work Description: jbonofre commented on issue #10644: [BEAM-7427] Refactore JmsCheckpointMark to be usage via Coder URL: https://github.com/apache/beam/pull/10644#issuecomment-579243908 As discussed, I've kept `JmsCheckpointMark` in a dedicated class. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378223) Time Spent: 8h 10m (was: 8h) > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Mourad >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.<init>(Schema.java:403) > at org.apache.avro.Schema$Field.<init>(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at 
org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. > JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the > events broker", JmsIO.<DFAMessage>readMessage() > .withConnectionFactory(jmsConnectionFactory) .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2535) Allow explicit output time independent of firing specification for all timers
[ https://issues.apache.org/jira/browse/BEAM-2535?focusedWorklogId=378225=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378225 ] ASF GitHub Bot logged work on BEAM-2535: Author: ASF GitHub Bot Created on: 28/Jan/20 13:26 Start Date: 28/Jan/20 13:26 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10627: [BEAM-2535] Support outputTimestamp and watermark holds in processing timers. URL: https://github.com/apache/beam/pull/10627#issuecomment-579244488 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378225) Time Spent: 13.5h (was: 13h 20m) > Allow explicit output time independent of firing specification for all timers > - > > Key: BEAM-2535 > URL: https://issues.apache.org/jira/browse/BEAM-2535 > Project: Beam > Issue Type: New Feature > Components: beam-model, sdk-java-core >Reporter: Kenneth Knowles >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 13.5h > Remaining Estimate: 0h > > Today, we have insufficient control over the event time timestamp of elements > output from a timer callback. > 1. For an event time timer, it is the timestamp of the timer itself. > 2. For a processing time timer, it is the current input watermark at the > time of processing. > But for both of these, we may want to reserve the right to output a > particular time, aka set a "watermark hold". > A naive implementation of a {{TimerWithWatermarkHold}} would work for making > sure output is not droppable, but does not fully explain window expiration > and late data/timer dropping. 
> In the natural interpretation of a timer as a feedback loop on a transform, > timers should be viewed as another channel of input, with a watermark, and > items on that channel _all need event time timestamps even if they are > delivered according to a different time domain_. > I propose that the specification for when a timer should fire should be > separated (with nice defaults) from the specification of the event time of > resulting outputs. These timestamps will determine a side channel with a new > "timer watermark" that constrains the output watermark. > - We still need to fire event time timers according to the input watermark, > so that event time timers fire. > - Late data dropping and window expiration will be in terms of the minimum > of the input watermark and the timer watermark. In this way, whenever a timer > is set, the window is not going to be garbage collected. > - We will need to make sure we have a way to "wake up" a window once it is > expired; this may be as simple as exhausting the timer channel as soon as the > input watermark indicates expiration of a window > This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It > seems reasonable to use timers as an implementation detail (e.g. in > runners-core utilities) without wanting any of this additional machinery. For > example, if there is no possibility of output from the timer callback. -- This message was sent by Atlassian Jira (v8.3.4#803005)
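One consequence of the proposal above can be stated concretely: late-data dropping and window expiration are decided against the minimum of the input watermark and the timer ("hold") watermark, so a pending timer keeps its window alive. An illustrative helper, with millis-since-epoch longs standing in for Beam's Instant type (this is a sketch of the semantics, not SDK code):

```java
// Illustrative only: the effective watermark used for dropping/expiry is the
// minimum of the input watermark and the hold set by pending timers.
class WatermarkHolds {
    static long effectiveWatermark(long inputWatermarkMillis, long timerHoldMillis) {
        return Math.min(inputWatermarkMillis, timerHoldMillis);
    }

    // A window is expired only once the effective watermark passes its end,
    // so a timer hold before the window end prevents garbage collection.
    static boolean windowExpired(long windowEndMillis, long inputWm, long timerHold) {
        return effectiveWatermark(inputWm, timerHold) > windowEndMillis;
    }
}
```

With an input watermark of 100 and a timer hold at 50, a window ending at 60 is not expired; once the hold is released (advances past 60), expiry follows the input watermark again.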
[jira] [Work logged] (BEAM-2535) Allow explicit output time independent of firing specification for all timers
[ https://issues.apache.org/jira/browse/BEAM-2535?focusedWorklogId=378227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378227 ] ASF GitHub Bot logged work on BEAM-2535: Author: ASF GitHub Bot Created on: 28/Jan/20 13:26 Start Date: 28/Jan/20 13:26 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10627: [BEAM-2535] Support outputTimestamp and watermark holds in processing timers. URL: https://github.com/apache/beam/pull/10627#issuecomment-579232967 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378227) Time Spent: 13h 40m (was: 13.5h) > Allow explicit output time independent of firing specification for all timers > - > > Key: BEAM-2535 > URL: https://issues.apache.org/jira/browse/BEAM-2535 > Project: Beam > Issue Type: New Feature > Components: beam-model, sdk-java-core >Reporter: Kenneth Knowles >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 13h 40m > Remaining Estimate: 0h > > Today, we have insufficient control over the event time timestamp of elements > output from a timer callback. > 1. For an event time timer, it is the timestamp of the timer itself. > 2. For a processing time timer, it is the current input watermark at the > time of processing. > But for both of these, we may want to reserve the right to output a > particular time, aka set a "watermark hold". > A naive implementation of a {{TimerWithWatermarkHold}} would work for making > sure output is not droppable, but does not fully explain window expiration > and late data/timer dropping. 
[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=378246=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378246 ] ASF GitHub Bot logged work on BEAM-7427: Author: ASF GitHub Bot Created on: 28/Jan/20 13:50 Start Date: 28/Jan/20 13:50 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10644: [BEAM-7427] Refactore JmsCheckpointMark to be usage via Coder URL: https://github.com/apache/beam/pull/10644#issuecomment-579254727 Sorry for the delay; the extra commit with the fixes looks good. Since the stored messages are not needed to restore the progress of the reads in `UnboundedJmsReader`, maybe the simplest fix is just to leave them transient as you proposed. As for the State changes, let's do those in a subsequent PR so we can get this fix out more quickly. WDYT? If you agree, just leave the class as it was before and I will merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378246) Time Spent: 8h 20m (was: 8h 10m) > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0" >Reporter: Mourad >Assignee: Mourad >Priority: Major > Time Spent: 8h 20m > Remaining Estimate: 0h > > I get the following exception when reading from unbounded JMS Source: > > {code:java} > Caused by: org.apache.avro.SchemaParseException: Illegal character in: this$0 > at org.apache.avro.Schema.validateName(Schema.java:1151) > at org.apache.avro.Schema.access$200(Schema.java:81) > at org.apache.avro.Schema$Field.<init>(Schema.java:403) > at org.apache.avro.Schema$Field.<init>(Schema.java:396) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:622) > at org.apache.avro.reflect.ReflectData.createFieldSchema(ReflectData.java:740) > at org.apache.avro.reflect.ReflectData.createSchema(ReflectData.java:604) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:218) > at org.apache.avro.specific.SpecificData$2.load(SpecificData.java:215) > at > avro.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) > at > avro.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) > {code} > > The exception is thrown by Avro when introspecting {{JmsCheckpointMark}} to > generate schema. 
> JmsIO config : > > {code:java} > PCollection<DFAMessage> messages = pipeline.apply("read messages from the events broker", > JmsIO.readMessage() > .withConnectionFactory(jmsConnectionFactory) > .withTopic(options.getTopic()) > .withMessageMapper(new DFAMessageMapper()) > .withCoder(AvroCoder.of(DFAMessage.class))); > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=378254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378254 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 28/Jan/20 14:06 Start Date: 28/Jan/20 14:06 Worklog Time Spent: 10m Work Description: mxm commented on pull request #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655#discussion_r371820931 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -315,18 +319,47 @@ def release(self, instruction_id): """ descriptor_id, processor = self.active_bundle_processors.pop(instruction_id) processor.reset() +self.last_access_time[descriptor_id] = time.time() self.cached_bundle_processors[descriptor_id].append(processor) def shutdown(self): """ Shutdown all ``BundleProcessor``s in the cache. """ +if self.periodic_shutdown: + self.periodic_shutdown.cancel() + self.periodic_shutdown.join() + self.periodic_shutdown = None + for instruction_id in self.active_bundle_processors: self.active_bundle_processors[instruction_id][1].shutdown() del self.active_bundle_processors[instruction_id] for cached_bundle_processors in self.cached_bundle_processors.values(): - while len(cached_bundle_processors) > 0: -cached_bundle_processors.pop().shutdown() + BundleProcessorCache._shutdown_cached_bundle_processors( + cached_bundle_processors) + + def _schedule_periodic_shutdown(self): +def shutdown_inactive_bundle_processors(): + for descriptor_id, last_access_time in self.last_access_time.items(): +if time.time() - last_access_time > 60: + BundleProcessorCache._shutdown_cached_bundle_processors( + self.cached_bundle_processors[descriptor_id]) Review comment: Don't we have to remove the bundle processor list from the dictionary? Otherwise we may access a cached shutdown bundle processor. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378254) Time Spent: 50m (was: 40m) > Tear down unused DoFns periodically in Python SDK harness > - > > Key: BEAM-8618 > URL: https://issues.apache.org/jira/browse/BEAM-8618 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Per the discussion in the ML, detail can be found [1], the teardown of DoFns > should be supported in the portability framework. It happens at two places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for tear down the unused DoFns > periodically in Python SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=378256=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378256 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 28/Jan/20 14:06 Start Date: 28/Jan/20 14:06 Worklog Time Spent: 10m Work Description: mxm commented on pull request #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655#discussion_r371818162 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -315,18 +319,47 @@ def release(self, instruction_id): """ descriptor_id, processor = self.active_bundle_processors.pop(instruction_id) processor.reset() +self.last_access_time[descriptor_id] = time.time() self.cached_bundle_processors[descriptor_id].append(processor) def shutdown(self): """ Shutdown all ``BundleProcessor``s in the cache. """ +if self.periodic_shutdown: + self.periodic_shutdown.cancel() + self.periodic_shutdown.join() + self.periodic_shutdown = None + for instruction_id in self.active_bundle_processors: self.active_bundle_processors[instruction_id][1].shutdown() del self.active_bundle_processors[instruction_id] for cached_bundle_processors in self.cached_bundle_processors.values(): - while len(cached_bundle_processors) > 0: -cached_bundle_processors.pop().shutdown() + BundleProcessorCache._shutdown_cached_bundle_processors( + cached_bundle_processors) + + def _schedule_periodic_shutdown(self): +def shutdown_inactive_bundle_processors(): + for descriptor_id, last_access_time in self.last_access_time.items(): +if time.time() - last_access_time > 60: + BundleProcessorCache._shutdown_cached_bundle_processors( + self.cached_bundle_processors[descriptor_id]) + +from apache_beam.runners.worker.data_plane import PeriodicThread +self.periodic_shutdown = PeriodicThread( +60, shutdown_inactive_bundle_processors) Review comment: Same here. Should be configurable or at least be extracted to a variable. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378256) Time Spent: 1h (was: 50m) > Tear down unused DoFns periodically in Python SDK harness > - > > Key: BEAM-8618 > URL: https://issues.apache.org/jira/browse/BEAM-8618 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=378255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378255 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 28/Jan/20 14:06 Start Date: 28/Jan/20 14:06 Worklog Time Spent: 10m Work Description: mxm commented on pull request #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655#discussion_r371817978 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -315,18 +319,47 @@ def release(self, instruction_id): """ descriptor_id, processor = self.active_bundle_processors.pop(instruction_id) processor.reset() +self.last_access_time[descriptor_id] = time.time() self.cached_bundle_processors[descriptor_id].append(processor) def shutdown(self): """ Shutdown all ``BundleProcessor``s in the cache. """ +if self.periodic_shutdown: + self.periodic_shutdown.cancel() + self.periodic_shutdown.join() + self.periodic_shutdown = None + for instruction_id in self.active_bundle_processors: self.active_bundle_processors[instruction_id][1].shutdown() del self.active_bundle_processors[instruction_id] for cached_bundle_processors in self.cached_bundle_processors.values(): - while len(cached_bundle_processors) > 0: -cached_bundle_processors.pop().shutdown() + BundleProcessorCache._shutdown_cached_bundle_processors( + cached_bundle_processors) + + def _schedule_periodic_shutdown(self): +def shutdown_inactive_bundle_processors(): + for descriptor_id, last_access_time in self.last_access_time.items(): +if time.time() - last_access_time > 60: Review comment: We may want to make this configurable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378255) Time Spent: 1h (was: 50m) > Tear down unused DoFns periodically in Python SDK harness > - > > Key: BEAM-8618 > URL: https://issues.apache.org/jira/browse/BEAM-8618 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=378257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378257 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 28/Jan/20 14:06 Start Date: 28/Jan/20 14:06 Worklog Time Spent: 10m Work Description: mxm commented on pull request #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655#discussion_r371819018 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -315,18 +319,47 @@ def release(self, instruction_id): """ descriptor_id, processor = self.active_bundle_processors.pop(instruction_id) processor.reset() +self.last_access_time[descriptor_id] = time.time() self.cached_bundle_processors[descriptor_id].append(processor) def shutdown(self): """ Shutdown all ``BundleProcessor``s in the cache. """ +if self.periodic_shutdown: + self.periodic_shutdown.cancel() + self.periodic_shutdown.join() + self.periodic_shutdown = None + for instruction_id in self.active_bundle_processors: self.active_bundle_processors[instruction_id][1].shutdown() del self.active_bundle_processors[instruction_id] for cached_bundle_processors in self.cached_bundle_processors.values(): - while len(cached_bundle_processors) > 0: -cached_bundle_processors.pop().shutdown() + BundleProcessorCache._shutdown_cached_bundle_processors( + cached_bundle_processors) + + def _schedule_periodic_shutdown(self): +def shutdown_inactive_bundle_processors(): + for descriptor_id, last_access_time in self.last_access_time.items(): +if time.time() - last_access_time > 60: + BundleProcessorCache._shutdown_cached_bundle_processors( + self.cached_bundle_processors[descriptor_id]) + +from apache_beam.runners.worker.data_plane import PeriodicThread Review comment: I think we should move this to the import section. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378257) Time Spent: 1h 10m (was: 1h) > Tear down unused DoFns periodically in Python SDK harness > - > > Key: BEAM-8618 > URL: https://issues.apache.org/jira/browse/BEAM-8618 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8618) Tear down unused DoFns periodically in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=378253=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378253 ] ASF GitHub Bot logged work on BEAM-8618: Author: ASF GitHub Bot Created on: 28/Jan/20 14:06 Start Date: 28/Jan/20 14:06 Worklog Time Spent: 10m Work Description: mxm commented on pull request #10655: [BEAM-8618] Tear down unused DoFns periodically in Python SDK harness. URL: https://github.com/apache/beam/pull/10655#discussion_r371817547 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -280,6 +283,7 @@ def get(self, instruction_id, bundle_descriptor_id): try: # pop() is threadsafe processor = self.cached_bundle_processors[bundle_descriptor_id].pop() + self.last_access_time[bundle_descriptor_id] = time.time() except IndexError: Review comment: This won't update the access time when we first create the processor in the except block. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378253) Time Spent: 40m (was: 0.5h) > Tear down unused DoFns periodically in Python SDK harness > - > > Key: BEAM-8618 > URL: https://issues.apache.org/jira/browse/BEAM-8618 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)
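Taken together, the review comments on this PR (record the access time when a processor is first created, not only when it is reused; remove the cached list from the dictionary on shutdown so a shut-down processor is never handed out again; make the 60-second threshold configurable) suggest a cache shaped roughly like the following. This is an illustrative sketch under those assumptions, not the actual `sdk_worker.py` code:

```python
import threading
import time


class ProcessorCache:
    """Sketch of a cache that tears down entries unused for `timeout` seconds.

    Names and structure are illustrative; a periodic thread would call
    shutdown_inactive() on the configured interval.
    """

    def __init__(self, factory, timeout=60.0):
        self._factory = factory
        self._timeout = timeout  # review: make the 60s threshold configurable
        self._cached = {}        # descriptor_id -> list of idle processors
        self._last_access = {}
        self._lock = threading.Lock()

    def get(self, descriptor_id):
        with self._lock:
            # review: record the access time on creation too, not only reuse
            self._last_access[descriptor_id] = time.time()
            idle = self._cached.get(descriptor_id, [])
            if idle:
                return idle.pop()
        return self._factory(descriptor_id)

    def release(self, descriptor_id, processor):
        with self._lock:
            self._last_access[descriptor_id] = time.time()
            self._cached.setdefault(descriptor_id, []).append(processor)

    def shutdown_inactive(self, now=None):
        now = time.time() if now is None else now
        with self._lock:
            for descriptor_id in list(self._cached):
                if now - self._last_access.get(descriptor_id, 0) > self._timeout:
                    # review: pop the list from the dict, otherwise get()
                    # could serve a cached-but-shut-down processor.
                    for proc in self._cached.pop(descriptor_id):
                        proc.shutdown()
                    del self._last_access[descriptor_id]
```

After a descriptor's idle list is evicted, the next `get()` for that descriptor falls back to the factory, so callers never observe a shut-down processor.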
[jira] [Work logged] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode
[ https://issues.apache.org/jira/browse/BEAM-8972?focusedWorklogId=378154=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378154 ] ASF GitHub Bot logged work on BEAM-8972: Author: ASF GitHub Bot Created on: 28/Jan/20 11:02 Start Date: 28/Jan/20 11:02 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10386: [BEAM-8972] Add Jenkins job with Combine test for portable Java URL: https://github.com/apache/beam/pull/10386#issuecomment-57919 Run Load Tests Java Combine Portable Flink Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378154) Time Spent: 5h (was: 4h 50m) > Add a Jenkins job running Combine load test on Java with Flink in Portability > mode > -- > > Key: BEAM-8972 > URL: https://issues.apache.org/jira/browse/BEAM-8972 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Minor > Time Spent: 5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode
[ https://issues.apache.org/jira/browse/BEAM-8972?focusedWorklogId=378153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378153 ] ASF GitHub Bot logged work on BEAM-8972: Author: ASF GitHub Bot Created on: 28/Jan/20 11:02 Start Date: 28/Jan/20 11:02 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10386: [BEAM-8972] Add Jenkins job with Combine test for portable Java URL: https://github.com/apache/beam/pull/10386#issuecomment-579192080 Run Load Tests Java Combine Portable Flink Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378153) Time Spent: 4h 50m (was: 4h 40m) > Add a Jenkins job running Combine load test on Java with Flink in Portability > mode > -- > > Key: BEAM-8972 > URL: https://issues.apache.org/jira/browse/BEAM-8972 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Minor > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8972) Add a Jenkins job running Combine load test on Java with Flink in Portability mode
[ https://issues.apache.org/jira/browse/BEAM-8972?focusedWorklogId=378152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378152 ] ASF GitHub Bot logged work on BEAM-8972: Author: ASF GitHub Bot Created on: 28/Jan/20 11:01 Start Date: 28/Jan/20 11:01 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10386: [BEAM-8972] Add Jenkins job with Combine test for portable Java URL: https://github.com/apache/beam/pull/10386#issuecomment-579192080 Run Load Tests Java Combine Portable Flink Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378152) Time Spent: 4h 40m (was: 4.5h) > Add a Jenkins job running Combine load test on Java with Flink in Portability > mode > -- > > Key: BEAM-8972 > URL: https://issues.apache.org/jira/browse/BEAM-8972 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Minor > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8550) @RequiresTimeSortedInput DoFn annotation
[ https://issues.apache.org/jira/browse/BEAM-8550?focusedWorklogId=378199=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378199 ] ASF GitHub Bot logged work on BEAM-8550: Author: ASF GitHub Bot Created on: 28/Jan/20 12:51 Start Date: 28/Jan/20 12:51 Worklog Time Spent: 10m Work Description: je-ik commented on issue #8774: [BEAM-8550] Requires time sorted input URL: https://github.com/apache/beam/pull/8774#issuecomment-579230688 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378199) Time Spent: 9h 20m (was: 9h 10m) > @RequiresTimeSortedInput DoFn annotation > > > Key: BEAM-8550 > URL: https://issues.apache.org/jira/browse/BEAM-8550 > Project: Beam > Issue Type: New Feature > Components: beam-model, sdk-java-core >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Time Spent: 9h 20m > Remaining Estimate: 0h > > Implement new annotation {{@RequiresTimeSortedInput}} for stateful DoFn as > described in [design > document|https://docs.google.com/document/d/1ObLVUFsf1NcG8ZuIZE4aVy2RYKx2FfyMhkZYWPnI9-c/edit?usp=sharing]. > First implementation will assume that: > - time is defined by timestamp in associated WindowedValue > - allowed lateness is explicitly zero and all late elements are dropped > (due to being out of order) > The above properties are considered temporary and will be resolved by > subsequent extensions (backwards compatible). -- This message was sent by Atlassian Jira (v8.3.4#803005)
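The contract the BEAM-8550 ticket describes (elements delivered to the stateful DoFn sorted by timestamp, with explicitly zero allowed lateness so out-of-order elements behind the watermark are dropped) can be sketched as a small buffering simulation. The names below are illustrative, not the Beam implementation:

```python
import heapq


class TimeSortedBuffer:
    """Sketch of @RequiresTimeSortedInput semantics: buffer incoming elements
    and release them in timestamp order as the watermark advances; with zero
    allowed lateness, elements behind the watermark are dropped."""

    def __init__(self):
        self._heap = []  # (timestamp, element), ordered by timestamp
        self._watermark = float("-inf")
        self.dropped = []

    def add(self, timestamp, element):
        if timestamp < self._watermark:
            # Late relative to the watermark: out of order, so dropped.
            self.dropped.append(element)
        else:
            heapq.heappush(self._heap, (timestamp, element))

    def advance_watermark(self, watermark):
        """Emit, in timestamp order, everything at or before the new watermark."""
        self._watermark = watermark
        out = []
        while self._heap and self._heap[0][0] <= watermark:
            out.append(heapq.heappop(self._heap)[1])
        return out


buf = TimeSortedBuffer()
buf.add(5, "b")
buf.add(3, "a")
print(buf.advance_watermark(4))   # ['a']
buf.add(2, "late")                # behind watermark 4: dropped
print(buf.advance_watermark(10))  # ['b']
print(buf.dropped)                # ['late']
```

The ticket's planned extensions (non-zero allowed lateness, timestamps other than the WindowedValue's) would relax the `timestamp < self._watermark` check rather than change the sorting itself.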
[jira] [Work logged] (BEAM-2535) Allow explicit output time independent of firing specification for all timers
[ https://issues.apache.org/jira/browse/BEAM-2535?focusedWorklogId=378228=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378228 ] ASF GitHub Bot logged work on BEAM-2535: Author: ASF GitHub Bot Created on: 28/Jan/20 13:28 Start Date: 28/Jan/20 13:28 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10627: [BEAM-2535] Support outputTimestamp and watermark holds in processing timers. URL: https://github.com/apache/beam/pull/10627#issuecomment-579245093 Run Direct ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378228) Time Spent: 13h 50m (was: 13h 40m) > Allow explicit output time independent of firing specification for all timers > - > > Key: BEAM-2535 > URL: https://issues.apache.org/jira/browse/BEAM-2535 > Project: Beam > Issue Type: New Feature > Components: beam-model, sdk-java-core >Reporter: Kenneth Knowles >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 13h 50m > Remaining Estimate: 0h > > Today, we have insufficient control over the event time timestamp of elements > output from a timer callback. > 1. For an event time timer, it is the timestamp of the timer itself. > 2. For a processing time timer, it is the current input watermark at the > time of processing. > But for both of these, we may want to reserve the right to output a > particular time, aka set a "watermark hold". > A naive implementation of a {{TimerWithWatermarkHold}} would work for making > sure output is not droppable, but does not fully explain window expiration > and late data/timer dropping. 
[jira] [Work logged] (BEAM-7427) JmsCheckpointMark Avro Serialization issue with UnboundedSource
[ https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=378249=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378249 ] ASF GitHub Bot logged work on BEAM-7427: Author: ASF GitHub Bot Created on: 28/Jan/20 13:52 Start Date: 28/Jan/20 13:52 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10644: [BEAM-7427] Refactore JmsCheckpointMark to be usage via Coder URL: https://github.com/apache/beam/pull/10644#issuecomment-579255395 Oh you already get rid of state hehe, my bad ok looking again. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 378249) Time Spent: 8.5h (was: 8h 20m) > JmsCheckpointMark Avro Serialization issue with UnboundedSource > --- > > Key: BEAM-7427 > URL: https://issues.apache.org/jira/browse/BEAM-7427 > Project: Beam > Issue Type: Bug > Components: io-java-jms >Affects Versions: 2.12.0 > Environment: Message Broker : solace > JMS Client (Over AMQP) : "org.apache.qpid:qpid-jms-client:0.42.0 >Reporter: Mourad >Assignee: Mourad >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.3.4#803005)