[jira] [Closed] (BEAM-9461) CLONE - To use ByteArrayOutput/InputStream without synchronization

2020-03-05 Thread Kyoungha Min (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyoungha Min closed BEAM-9461.
--
Fix Version/s: Not applicable
   Resolution: Abandoned

> CLONE - To use ByteArrayOutput/InputStream without synchronization
> --
>
> Key: BEAM-9461
> URL: https://issues.apache.org/jira/browse/BEAM-9461
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-java-core
>Reporter: Kyoungha Min
>Priority: Minor
> Fix For: Not applicable
>
>
> It would be nice to see Beam use custom ByteArrayInput/OutputStream
> implementations without synchronization. It currently uses `ThreadLocal`, so
> thread-safe streams seem unnecessary. And no stream should ever be accessed by
> more than one thread in the first place.
> Simply getting rid of the synchronized keyword speeds up single-byte access
> by roughly 500 times. Something like org.apache.beam.sdk.util.VarInt would
> benefit significantly from it.
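
To make the single-byte access pattern concrete, here is a minimal sketch of a variable-length integer encoder. It is written in Python for brevity and is not Beam's VarInt code: the function name `encode_varint` and the 7-bits-per-byte layout with a continuation bit are assumptions made for illustration. The point is that every encoded byte costs one write call on the stream, so any per-call overhead (such as taking a lock) is paid once per byte.

```python
import io


def encode_varint(value, stream):
    """Write a non-negative int as a base-128 varint; return the number of write calls.

    Hypothetical helper for illustration only, not the Beam implementation.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    writes = 0
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow: set the continuation bit.
            stream.write(bytes([low_bits | 0x80]))
            writes += 1
        else:
            # Last byte: continuation bit stays clear.
            stream.write(bytes([low_bits]))
            writes += 1
            return writes


buf = io.BytesIO()
write_calls = encode_varint(300, buf)
```

Encoding many small integers this way issues a stream call for every byte produced, which is why the cost of each individual call matters for this kind of code.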



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9461) CLONE - To use ByteArrayOutput/InputStream without synchronization

2020-03-05 Thread Kyoungha Min (Jira)
Kyoungha Min created BEAM-9461:
--

 Summary: CLONE - To use ByteArrayOutput/InputStream without 
synchronization
 Key: BEAM-9461
 URL: https://issues.apache.org/jira/browse/BEAM-9461
 Project: Beam
  Issue Type: Wish
  Components: sdk-java-core
Reporter: Kyoungha Min


It would be nice to see Beam use custom ByteArrayInput/OutputStream
implementations without synchronization. It currently uses `ThreadLocal`, so
thread-safe streams seem unnecessary. And no stream should ever be accessed by
more than one thread in the first place.

Simply getting rid of the synchronized keyword speeds up single-byte access
by roughly 500 times. Something like org.apache.beam.sdk.util.VarInt would
benefit significantly from it.





[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=398962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398962
 ]

ASF GitHub Bot logged work on BEAM-9035:


Author: ASF GitHub Bot
Created on: 06/Mar/20 07:04
Start Date: 06/Mar/20 07:04
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #10413: [BEAM-9035] 
Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#issuecomment-595630682
 
 
   @reuvenlax can you have a look at the changes?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398962)
Time Spent: 5h 50m  (was: 5h 40m)

> BIP-1: Typed options for Row Schema and Fields
> --
>
> Key: BEAM-9035
> URL: https://issues.apache.org/jira/browse/BEAM-9035
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> This is the first issue of a multipart commit: this ticket implements the
> basic infrastructure of options on rows and fields.
> Full explanation:
> Introduce the concept of Options in Beam Schemas to add extra context to
> fields and schemas. In contrast to metadata, options would be added to
> fields, logical types and rows. Schema converters can use options to carry
> over options/annotations/decorators that were in the original schema; this
> context can be used in the rest of the pipeline for specific transformations
> or to augment the end schema in the target output.
> Examples of options are:
>  * informational: like the source of the data, ...
>  * drive decisions further in the pipeline: flatten a row into another,
> rename a field, ...
>  * influence something in the output: like cluster index, primary key, ...
>  * logical type information
> An option is a key/typed value combination. The advantages of having
> typed values are:
>  * Strongly typed options would give Logical Types a *portable way*
> to carry structured information that could be shared across different languages.
>  * This could keep the type intact when mapping from formats that have
> strongly typed options (example: Protobuf).
> This is part of a multi-ticket implementation. The following tickets are
> related:
>  # Typed options for Row Schema and Fields
>  # Convert Proto Options to Beam Schema options
>  # Convert Avro extra information for Beam string options
>  # Replace meta data with Logical Type options
>  # Extract meta data in Calcite SQL to Beam options
>  # Extract meta data in Zeta SQL to Beam options
>  # Add java example of using option in a transform
> This feature was discussed with Reuven Lax and Brian Hulette.





[jira] [Updated] (BEAM-9325) UnownedOutputStream not overriding Array write method.

2020-03-05 Thread Kyoungha Min (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyoungha Min updated BEAM-9325:
---
Description: 
org.apache.beam.sdk.util.UnownedOutputStream does not override the method

`public void write(byte b[], int off, int len) throws IOException`

resulting in extremely slow writes.

This is because the implementation inherited from `java.io.FilterOutputStream` writes the array one byte at a time.

 

 

  was:
org.apache.beam.sdk.util.UnownedOutputStream does not override the method

`public void write(byte b[], int off, int len) throws IOException`

resulting in extremely slow writes.

This is because the implementation inherited from `java.io.FilterOutputStream` writes the array one byte at a time.

 

The throughput degradation is significant enough to classify this as a bug.

 

Anything that uses `UnownedInputStream`, including
`CoderUtils.decodeFromByteArray`, `CoderUtils.decodeFromSafeStream` etc., is
extremely slow.

 Issue Type: Improvement  (was: Bug)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Kyoungha Min
>Priority: Major
> Fix For: Not applicable
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`
> resulting in extremely slow writes.
> This is because the implementation inherited from `java.io.FilterOutputStream`
> writes the array one byte at a time.
>  
>  
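
The cost described above can be modelled in a few lines. This is a sketch in Python rather than Java, and every class and function name in it is hypothetical. It relies on one fact about the JDK: the `write(byte[], int, int)` inherited from `java.io.FilterOutputStream` calls the single-byte `write` once per element. `ByteAtATimeWrapper` imitates that inherited behaviour, and `BulkWrapper` imitates a wrapper that overrides the array write and forwards the whole range in one call.

```python
class CountingSink:
    """Stands in for the underlying stream and counts how often it is called."""

    def __init__(self):
        self.data = bytearray()
        self.calls = 0

    def write_byte(self, b):
        self.calls += 1
        self.data.append(b)

    def write_block(self, buf, off, length):
        self.calls += 1
        self.data += buf[off:off + length]


class ByteAtATimeWrapper:
    """Models a wrapper that keeps the inherited array write: one call per byte."""

    def __init__(self, sink):
        self.sink = sink

    def write(self, buf, off, length):
        for i in range(off, off + length):
            self.sink.write_byte(buf[i])


class BulkWrapper(ByteAtATimeWrapper):
    """Models a wrapper that overrides the array write and forwards the range once."""

    def write(self, buf, off, length):
        self.sink.write_block(buf, off, length)


def calls_needed(wrapper_cls, payload):
    """Return (number of calls on the sink, bytes that reached the sink)."""
    sink = CountingSink()
    wrapper_cls(sink).write(payload, 0, len(payload))
    return sink.calls, bytes(sink.data)
```

Both wrappers deliver the same bytes; the difference is that the first one makes as many calls on the underlying stream as there are bytes in the array, while the second makes exactly one.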





[jira] [Updated] (BEAM-9325) UnownedOutputStream not overriding Array write method.

2020-03-05 Thread Kyoungha Min (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyoungha Min updated BEAM-9325:
---
Description: 
org.apache.beam.sdk.util.UnownedOutputStream does not override the method

`public void write(byte b[], int off, int len) throws IOException`

resulting in extremely slow writes.

This is because the implementation inherited from `java.io.FilterOutputStream` writes the array one byte at a time.

 

The throughput degradation is significant enough to classify this as a bug.

 

Anything that uses `UnownedInputStream`, including
`CoderUtils.decodeFromByteArray`, `CoderUtils.decodeFromSafeStream` etc., is
extremely slow.

  was:
org.apache.beam.sdk.util.UnownedOutputStream does not override the method

`public void write(byte b[], int off, int len) throws IOException`

resulting in extremely slow writes.

This is because the implementation inherited from `java.io.FilterOutputStream` writes the array one byte at a time.

 Issue Type: Bug  (was: Improvement)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Kyoungha Min
>Priority: Major
> Fix For: Not applicable
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`
> resulting in extremely slow writes.
> This is because the implementation inherited from `java.io.FilterOutputStream`
> writes the array one byte at a time.
>  
> The throughput degradation is significant enough to classify this as a bug.
>  
> Anything that uses `UnownedInputStream`, including
> `CoderUtils.decodeFromByteArray`, `CoderUtils.decodeFromSafeStream` etc., is
> extremely slow.





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398951
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Mar/20 05:55
Start Date: 06/Mar/20 05:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595612913
 
 
   thanks Ning!
 



Issue Time Tracking
---

Worklog Id: (was: 398951)
Time Spent: 99h 20m  (was: 99h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 99h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398950
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Mar/20 05:55
Start Date: 06/Mar/20 05:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11032: [BEAM-8335] 
Display rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 398950)
Time Spent: 99h 10m  (was: 99h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 99h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398934
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Mar/20 05:14
Start Date: 06/Mar/20 05:14
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595604106
 
 
   Run PythonFormatter PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398934)
Time Spent: 99h  (was: 98h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 99h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398933
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Mar/20 05:13
Start Date: 06/Mar/20 05:13
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595603969
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398933)
Time Spent: 98h 50m  (was: 98h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 98h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398932
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Mar/20 05:13
Start Date: 06/Mar/20 05:13
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595603902
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398932)
Time Spent: 98h 40m  (was: 98.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 98h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=398908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398908
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Mar/20 04:29
Start Date: 06/Mar/20 04:29
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#issuecomment-595594678
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398908)
Time Spent: 71h 50m  (was: 71h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 71h 50m
>  Remaining Estimate: 0h
>
> As a developer of the Beam source code, I would like the code to use PEP 484
> type hints so that I can clearly see what types are required, get completion
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  
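
As a small illustration of what PEP 484 annotations look like in practice, here is a sketch of an annotated helper. The function `count_words` is hypothetical and is not taken from the Beam code base; it only shows the annotation style that a checker such as mypy can verify and that an IDE can use for completion.

```python
from typing import Dict, Iterable, List, Optional


def count_words(lines: Iterable[str],
                stop_words: Optional[List[str]] = None) -> Dict[str, int]:
    """Count whitespace-separated words, skipping any listed stop words."""
    skip = set(stop_words or [])
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            if word not in skip:
                counts[word] = counts.get(word, 0) + 1
    return counts


word_counts = count_words(["a b a", "b c"], stop_words=["c"])
```

With the annotations in place, a call such as `count_words(42)` is something a static analyzer can flag before the code runs, since an `int` is not an `Iterable[str]`.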





[jira] [Updated] (BEAM-9460) Unable to Start DataFlow Runner in latest version 2.19

2020-03-05 Thread karthik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

karthik updated BEAM-9460:
--
Component/s: (was: beam-community)
 runner-dataflow
 dependencies

> Unable to Start DataFlow Runner in latest version 2.19
> --
>
> Key: BEAM-9460
> URL: https://issues.apache.org/jira/browse/BEAM-9460
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies, runner-dataflow
>Affects Versions: 2.19.0
>Reporter: karthik
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Major
>
> *Unable to start the Dataflow runner. It worked in the old version 2.18.
> Exception trace from the latest version:*
> INFO: No stagingLocation provided, falling back to gcpTempLocation
> [WARNING]
> java.lang.RuntimeException: Failed to construct instance from factory method 
> DataflowRunner#fromOptions(interface 
> org.apache.beam.sdk.options.PipelineOptions)
>  at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod 
> (InstanceBuilder.java:224)
>  at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
>  at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
>  at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
>  at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run 
> (ActiveUsersCube.java:84)
>  at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main 
> (ActiveUsersCube.java:109)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke (Method.java:498)
>  at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
>  at java.lang.Thread.run (Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke (Method.java:498)
>  at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod 
> (InstanceBuilder.java:214)
>  at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
>  at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
>  at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
>  at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run 
> (ActiveUsersCube.java:84)
>  at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main 
> (ActiveUsersCube.java:109)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke (Method.java:498)
>  at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
>  at java.lang.Thread.run (Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: No files to stage has been 
> found.
>  at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions 
> (DataflowRunner.java:281)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke (Method.java:498)
>  at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod 
> (InstanceBuilder.java:214)
>  at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
>  at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
>  at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
>  at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run 
> (ActiveUsersCube.java:84)
>  at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main 
> (ActiveUsersCube.java:109)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke 
> (NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke (Method.java:498)
>  at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
>  at java.lang.Thread.run (Thread.java:745)





[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=398872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398872
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 06/Mar/20 02:37
Start Date: 06/Mar/20 02:37
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #11020: [BEAM-7926] Update 
Data Visualization
URL: https://github.com/apache/beam/pull/11020#issuecomment-595568670
 
 
   Run PythonLint PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398872)
Time Spent: 57h 40m  (was: 57.5h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 57h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
>  
> {code:python}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The user can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:python}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only the
> transforms necessary to produce the desired pcolls (pcoll and pcoll2), and
> that fragment is executed.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].





[jira] [Created] (BEAM-9460) Unable to Start DataFlow Runner in latest version 2.19

2020-03-05 Thread karthik (Jira)
karthik created BEAM-9460:
-

 Summary: Unable to Start DataFlow Runner in latest version 2.19
 Key: BEAM-9460
 URL: https://issues.apache.org/jira/browse/BEAM-9460
 Project: Beam
  Issue Type: Bug
  Components: beam-community
Affects Versions: 2.19.0
Reporter: karthik
Assignee: Aizhamal Nurmamat kyzy


*Unable to start the Dataflow runner. It worked in the old version 2.18.
Exception trace from the latest version:*

INFO: No stagingLocation provided, falling back to gcpTempLocation
[WARNING]
java.lang.RuntimeException: Failed to construct instance from factory method 
DataflowRunner#fromOptions(interface 
org.apache.beam.sdk.options.PipelineOptions)
 at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod 
(InstanceBuilder.java:224)
 at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
 at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
 at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
 at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run 
(ActiveUsersCube.java:84)
 at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main 
(ActiveUsersCube.java:109)
 at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke 
(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke (Method.java:498)
 at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
 at java.lang.Thread.run (Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke 
(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke (Method.java:498)
 at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod 
(InstanceBuilder.java:214)
 at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
 at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
 at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
 at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run 
(ActiveUsersCube.java:84)
 at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main 
(ActiveUsersCube.java:109)
 at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke 
(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke (Method.java:498)
 at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
 at java.lang.Thread.run (Thread.java:745)
Caused by: java.lang.IllegalArgumentException: No files to stage has been found.
 at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions 
(DataflowRunner.java:281)
 at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke 
(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke (Method.java:498)
 at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod 
(InstanceBuilder.java:214)
 at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
 at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
 at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
 at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run 
(ActiveUsersCube.java:84)
 at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main 
(ActiveUsersCube.java:109)
 at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke 
(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke (Method.java:498)
 at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
 at java.lang.Thread.run (Thread.java:745)





[jira] [Resolved] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread Chun Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Yang resolved BEAM-8841.
-
Resolution: Fixed

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].
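
The NaN/infinity limitation mentioned in the description can be reproduced with Python's standard json module alone. The snippet below is only a minimal sketch: it does not use any Beam or BigQuery API, and the row contents and the helper name to_json_strict are hypothetical, chosen for illustration.

```python
import json


def to_json_strict(row):
    """Serialize a row as strict JSON, rejecting NaN and infinity."""
    # allow_nan=False makes json.dumps raise ValueError instead of
    # emitting the non-standard tokens NaN / Infinity / -Infinity.
    return json.dumps(row, allow_nan=False)


# Hypothetical row with a float value that JSON cannot represent.
row = {"id": 1, "score": float("nan")}

# By default json.dumps emits a bare NaN token, which is not valid JSON.
lenient = json.dumps(row)

try:
    to_json_strict(row)
    strict_ok = True
except ValueError:
    strict_ok = False
```

A binary format with native float and double types, such as Avro, does not run into this particular problem, which is one of the motivations given in the issue.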





[jira] [Work started] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread Chun Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8841 started by Chun Yang.
---
> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Chun Yang
> Assignee: Chun Yang
> Priority: Minor
> Fix For: 2.21.0
>
> Time Spent: 8h
> Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].





[jira] [Updated] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread Chun Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Yang updated BEAM-8841:

Fix Version/s: 2.21.0

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Chun Yang
> Assignee: Chun Yang
> Priority: Minor
> Fix For: 2.21.0
>
> Time Spent: 8h
> Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].





[jira] [Work logged] (BEAM-9459) Go Postcommit failing at GBK

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9459?focusedWorklogId=398830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398830
 ]

ASF GitHub Bot logged work on BEAM-9459:


Author: ASF GitHub Bot
Created on: 06/Mar/20 01:15
Start Date: 06/Mar/20 01:15
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11061: [BEAM-9459] 
Revert "[BEAM-6374] Emit PCollection metrics from GoSDK"
URL: https://github.com/apache/beam/pull/11061
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398830)
Remaining Estimate: 0h
Time Spent: 10m

> Go Postcommit failing at GBK
> 
>
> Key: BEAM-9459
> URL: https://issues.apache.org/jira/browse/BEAM-9459
> Project: Beam
> Issue Type: Bug
> Components: sdk-go, test-failures
> Reporter: Daniel Oliveira
> Assignee: Robert Burke
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/]
> [https://scans.gradle.com/s/es67rfaomu26m]
>  
> {noformat}
> 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782
> 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 
> 2020/03/06 00:47:41 Console: 
> https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing
> 2020/03/06 00:47:41 Logs: 
> https://console.cloud.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782
> ...
> 2020/03/06 00:50:41 Test cogbk:cogbk failed: job 
> 2020-03-05_16_47_40-13139296997856231782 failed{noformat}
> And then in the console logs: 
> [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782=500=false=2020-03-06T01:01:14.21000Z==true=2020-03-06T00:01:14.460Z=2020-03-06T01:01:14.460Z=PT1H=2020-03-06T00:49:14.413355915Z]
>  
> {code:java}
> exception: "java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> -165: process bundle failed for instruction -165 using plan -122 : panic: 
> Unexpected coder: CoGBK
> goroutine 81 [running]:
> runtime/debug.Stack(0xc001103970, 0xd2c5e0, 0xc000bd7f40)
>   /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc001103b90)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40
>  +0x60
> panic(0xd2c5e0, 0xc000bd7f40)
>   /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc000b99cc0,
>  0xc000aa4930, 0xc000b64a00)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91
>  +0x479
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000af3dd0,
>  0x10018e0, 0xc000b57f80, 0x0, 0xc000346b50)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59
>  +0xfe
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0,
>  0xc000b57f80, 0xc000346c28, 0x0, 0x0)
>   
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43
>  +0x6c
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0002623f0,
>  0x10018e0, 0xc000b57f80, 0xc0002365a0, 0x4, 0xff0340, 0xc000aa4750, 
> 0xff0380, 0xc000b57fc0, 0xc000346de0, ...)
>   
> 

[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=398828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398828
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 06/Mar/20 01:08
Start Date: 06/Mar/20 01:08
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10717: [BEAM-8280] Enable type 
hint annotations
URL: https://github.com/apache/beam/pull/10717#issuecomment-595524860
 
 
   R: @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 398828)
Time Spent: 7h 20m  (was: 7h 10m)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Major
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279





[jira] [Updated] (BEAM-9459) Go Postcommit failing at GBK

2020-03-05 Thread Daniel Oliveira (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira updated BEAM-9459:
--
Description: 
Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/]

[https://scans.gradle.com/s/es67rfaomu26m]

 
{noformat}
2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782
2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 
2020/03/06 00:47:41 Console: 
https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing
2020/03/06 00:47:41 Logs: 
https://console.cloud.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782
...
2020/03/06 00:50:41 Test cogbk:cogbk failed: job 
2020-03-05_16_47_40-13139296997856231782 failed{noformat}
And then in the console logs: 
[https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782=500=false=2020-03-06T01:01:14.21000Z==true=2020-03-06T00:01:14.460Z=2020-03-06T01:01:14.460Z=PT1H=2020-03-06T00:49:14.413355915Z]

 
{code:java}
exception: "java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: Error received from SDK harness for instruction 
-165: process bundle failed for instruction -165 using plan -122 : panic: 
Unexpected coder: CoGBK
goroutine 81 [running]:
runtime/debug.Stack(0xc001103970, 0xd2c5e0, 0xc000bd7f40)
/usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc001103b90)

/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40
 +0x60
panic(0xd2c5e0, 0xc000bd7f40)
/usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc000b99cc0,
 0xc000aa4930, 0xc000b64a00)

/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91
 +0x479
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000af3dd0,
 0x10018e0, 0xc000b57f80, 0x0, 0xc000346b50)

/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59
 +0xfe
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0,
 0xc000b57f80, 0xc000346c28, 0x0, 0x0)

/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43
 +0x6c
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0002623f0,
 0x10018e0, 0xc000b57f80, 0xc0002365a0, 0x4, 0xff0340, 0xc000aa4750, 0xff0380, 
0xc000b57fc0, 0xc000346de0, ...)

/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93
 +0xdf
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680,
 0x10017a0, 0xc0001bafc0, 0xc000b57dc0, 0xc0001bafc0)

/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211
 +0xa34
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0,
 0xc0001bafc0, 0xc000b57dc0)

/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:118
 +0x1cf
created by 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main

/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:131
 +0x6e8

at 

[jira] [Updated] (BEAM-9459) Go Postcommit failing at GBK

2020-03-05 Thread Daniel Oliveira (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira updated BEAM-9459:
--
Description: 
Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/]

[https://scans.gradle.com/s/es67rfaomu26m]

 
{noformat}
2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782
2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 
2020/03/06 00:47:41 Console: 
https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing
2020/03/06 00:47:41 Logs: 
https://console.cloud.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782
...
2020/03/06 00:50:41 Test cogbk:cogbk failed: job 
2020-03-05_16_47_40-13139296997856231782 failed{noformat}
And then in the console logs: 
[https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782=500=false=2020-03-06T01:01:14.21000Z==true=2020-03-06T00:01:14.460Z=2020-03-06T01:01:14.460Z=PT1H=2020-03-06T00:49:14.413355915Z]

 
{code:java}
Error message from worker: java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: Error received from SDK harness for instruction 
-489: process bundle failed for instruction -489 using plan -446 : panic: 
Unexpected coder: CoGBK
goroutine 87 [running]:
runtime/debug.Stack(0xc0010ff970, 0xd2c5e0, 0xc00022e3d0)
/usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc0010ffb90)
 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40
 +0x60 panic(0xd2c5e0, 0xc00022e3d0) /usr/lib/go-1.12/src/runtime/panic.go:522 
+0x1b5 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc0013dc460,
 0xc0002466c0, 0xc000166000) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91
 +0x479 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc001313dd0,
 0x10018e0, 0xc000268080, 0x0, 0xc0013f3b50) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59
 +0xfe 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0,
 0xc000268080, 0xc0013f3c28, 0x0, 0x0) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43
 +0x6c 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0013e8000,
 0x10018e0, 0xc000268080, 0xc000d14008, 0x4, 0xff0340, 0xc0002461e0, 0xff0380, 
0xc0002680c0, 0xc0013f3de0, ...) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93
 +0xdf 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680,
 0x10017a0, 0xc0001bafc0, 0xc00136d9c0, 0xc0001bafc0) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211
 +0xa34 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0,
 0xc0001bafc0, 0xc00136d9c0) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:118
 +0x1cf created by 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main
 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:131
 +0x6e8 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) 

[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398822
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 06/Mar/20 01:03
Start Date: 06/Mar/20 01:03
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #10991: [BEAM-3301] 
Refactor DoFn validation & allow specifying main inputs.
URL: https://github.com/apache/beam/pull/10991#issuecomment-595523525
 
 
   Done: https://jira.apache.org/jira/browse/BEAM-9459
 



Issue Time Tracking
---

Worklog Id: (was: 398822)
Time Spent: 3h 50m  (was: 3h 40m)

> Go SplittableDoFn support
> -
>
> Key: BEAM-3301
> URL: https://issues.apache.org/jira/browse/BEAM-3301
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Henning Rohde
> Assignee: Daniel Oliveira
> Priority: Major
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> SDFs will be the only way to add streaming and liquid sharded IO for Go.
> Design doc: https://s.apache.org/splittable-do-fn





[jira] [Created] (BEAM-9459) Go Postcommit failing at GBK

2020-03-05 Thread Daniel Oliveira (Jira)
Daniel Oliveira created BEAM-9459:
-

Summary: Go Postcommit failing at GBK
Key: BEAM-9459
URL: https://issues.apache.org/jira/browse/BEAM-9459
Project: Beam
Issue Type: Bug
Components: sdk-go, test-failures
Reporter: Daniel Oliveira
Assignee: Robert Burke


Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/]

[https://scans.gradle.com/s/es67rfaomu26m]

 
{noformat}
2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 
2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 
2020/03/06 00:47:41 Console: 
https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing
 2020/03/06 00:47:41 Logs: 
https://console.cloud.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782
...
2020/03/06 00:50:41 Test cogbk:cogbk failed: job 
2020-03-05_16_47_40-13139296997856231782 failed{noformat}
And then in the console logs: 
[https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782=500=false=2020-03-06T01:01:14.21000Z==true=2020-03-06T00:01:14.460Z=2020-03-06T01:01:14.460Z=PT1H=2020-03-06T00:49:14.413355915Z]

 
{noformat}
Error message from worker: java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: Error received from SDK harness for instruction 
-489: process bundle failed for instruction -489 using plan -446 : panic: 
Unexpected coder: CoGBK
goroutine 87 [running]:
runtime/debug.Stack(0xc0010ff970, 0xd2c5e0, 0xc00022e3d0)
/usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc0010ffb90)
 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40
 +0x60 panic(0xd2c5e0, 0xc00022e3d0) /usr/lib/go-1.12/src/runtime/panic.go:522 
+0x1b5 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc0013dc460,
 0xc0002466c0, 0xc000166000) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91
 +0x479 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc001313dd0,
 0x10018e0, 0xc000268080, 0x0, 0xc0013f3b50) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59
 +0xfe 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0,
 0xc000268080, 0xc0013f3c28, 0x0, 0x0) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43
 +0x6c 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0013e8000,
 0x10018e0, 0xc000268080, 0xc000d14008, 0x4, 0xff0340, 0xc0002461e0, 0xff0380, 
0xc0002680c0, 0xc0013f3de0, ...) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93
 +0xdf 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680,
 0x10017a0, 0xc0001bafc0, 0xc00136d9c0, 0xc0001bafc0) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211
 +0xa34 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0,
 0xc0001bafc0, 0xc00136d9c0) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:118
 +0x1cf created by 
github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main
 

[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398819
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 06/Mar/20 01:00
Start Date: 06/Mar/20 01:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595522865
 
 
   exciting. thanks @chunyang 
 



Issue Time Tracking
---

Worklog Id: (was: 398819)
Time Spent: 8h  (was: 7h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Chun Yang
> Assignee: Chun Yang
> Priority: Minor
> Time Spent: 8h
> Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398820
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 06/Mar/20 01:01
Start Date: 06/Mar/20 01:01
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595523014
 
 
   Run PythonFormatter PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398820)
Time Spent: 98.5h  (was: 98h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Sam Rohde
> Assignee: Sam Rohde
> Priority: Major
> Time Spent: 98.5h
> Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398818
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 06/Mar/20 01:00
Start Date: 06/Mar/20 01:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 398818)
Time Spent: 7h 50m  (was: 7h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Chun Yang
> Assignee: Chun Yang
> Priority: Minor
> Time Spent: 7h 50m
> Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].





[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398806
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:50
Start Date: 06/Mar/20 00:50
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #10991: [BEAM-3301] 
Refactor DoFn validation & allow specifying main inputs.
URL: https://github.com/apache/beam/pull/10991#issuecomment-595520053
 
 
   Could you file a JIRA with the trace and assign it to me please? I'm in the 
middle of packing.
   https://github.com/apache/beam/pull/11061 is the revert.
 



Issue Time Tracking
---

Worklog Id: (was: 398806)
Time Spent: 3h 40m  (was: 3.5h)

> Go SplittableDoFn support
> -
>
> Key: BEAM-3301
> URL: https://issues.apache.org/jira/browse/BEAM-3301
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> SDFs will be the only way to add streaming and liquid sharded IO for Go.
> Design doc: https://s.apache.org/splittable-do-fn





[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=398804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398804
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:49
Start Date: 06/Mar/20 00:49
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #11061: Revert "[BEAM-6374] 
Emit PCollection metrics from GoSDK"
URL: https://github.com/apache/beam/pull/11061#issuecomment-595519768
 
 
   Run Go Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 398804)
Time Spent: 1h 20m  (was: 1h 10m)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The fields for "Elements added" and "Estimated size" are always blank when 
> running a Go binary on Dataflow. For example, when running the word count 
> example: https://pasteboard.co/HVf80BU.png





[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=398801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398801
 ]

ASF GitHub Bot logged work on BEAM-6374:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:48
Start Date: 06/Mar/20 00:48
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11061: Revert 
"[BEAM-6374] Emit PCollection metrics from GoSDK"
URL: https://github.com/apache/beam/pull/11061
 
 
   Reverts apache/beam#10942 
   
   Seems to be breaking the post commit. Since I'm going on vacation tonight, 
I'm rolling it back, and will look into it when I get back.
 



Issue Time Tracking
---

Worklog Id: (was: 398801)
Time Spent: 1h 10m  (was: 1h)

> "elements added" for input and output collections is always empty
> -
>
> Key: BEAM-6374
> URL: https://issues.apache.org/jira/browse/BEAM-6374
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-go
>Reporter: Andrew Brampton
>Assignee: Robert Burke
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The fields for "Elements added" and "Estimated size" are always blank when 
> running a Go binary on Dataflow. For example, when running the word count 
> example: https://pasteboard.co/HVf80BU.png





[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398800
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:47
Start Date: 06/Mar/20 00:47
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #10991: [BEAM-3301] 
Refactor DoFn validation & allow specifying main inputs.
URL: https://github.com/apache/beam/pull/10991#issuecomment-595519138
 
 
   No, but it looks like it's somehow related to mine. I'm going to roll it 
back.
   
 



Issue Time Tracking
---

Worklog Id: (was: 398800)
Time Spent: 3.5h  (was: 3h 20m)

> Go SplittableDoFn support
> -
>
> Key: BEAM-3301
> URL: https://issues.apache.org/jira/browse/BEAM-3301
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> SDFs will be the only way to add streaming and liquid sharded IO for Go.
> Design doc: https://s.apache.org/splittable-do-fn





[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=398787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398787
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:30
Start Date: 06/Mar/20 00:30
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10717: [BEAM-8280] Enable type 
hint annotations
URL: https://github.com/apache/beam/pull/10717#issuecomment-595514536
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398787)
Time Spent: 7h 10m  (was: 7h)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279





[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=398786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398786
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:30
Start Date: 06/Mar/20 00:30
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10717: [BEAM-8280] Enable type 
hint annotations
URL: https://github.com/apache/beam/pull/10717#issuecomment-595514519
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398786)
Time Spent: 7h  (was: 6h 50m)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279





[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398784
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:25
Start Date: 06/Mar/20 00:25
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #10991: [BEAM-3301] 
Refactor DoFn validation & allow specifying main inputs.
URL: https://github.com/apache/beam/pull/10991#issuecomment-595512896
 
 
   Run Go PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398784)
Time Spent: 3h 20m  (was: 3h 10m)

> Go SplittableDoFn support
> -
>
> Key: BEAM-3301
> URL: https://issues.apache.org/jira/browse/BEAM-3301
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> SDFs will be the only way to add streaming and liquid sharded IO for Go.
> Design doc: https://s.apache.org/splittable-do-fn





[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398783
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:24
Start Date: 06/Mar/20 00:24
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #10991: [BEAM-3301] 
Refactor DoFn validation & allow specifying main inputs.
URL: https://github.com/apache/beam/pull/10991#issuecomment-595512813
 
 
   The Postcommit error doesn't seem to be directly related to my change from 
what I can tell:
   
   > Error message from worker: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -488: process bundle failed for instruction -488 using plan -445 : panic: Unexpected coder: CoGBK goroutine 87 [running]:
   > runtime/debug.Stack(0xc00109d970, 0xd2c5e0, 0xc00113cb00)
   >     /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d
   > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc00109db90)
   >     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40 +0x60
   > panic(0xd2c5e0, 0xc00113cb00)
   >     /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5
   > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc9bdb0, 0xc00114b620, 0xc000822000)
   >     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91 +0x479
   > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000c20fc0, 0x10018e0, 0xc000c40f00, 0x0, 0xc0010b7b50)
   >     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59 +0xfe
   > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0, 0xc000c40f00, 0xc0010b7c28, 0x0, 0x0)
   >     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43 +0x6c
   > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc001222ee0, 0x10018e0, 0xc000c40f00, 0xc000d1a490, 0x4, 0xff0340, 0xc00114b440, 0xff0380, 0xc000c40f40, 0xc0010b7de0, ...)
   >     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93 +0xdf
   > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4480, 0x10017a0, 0xc0001bafc0, 0xc000c40d40, 0xc0001bafc0)
   >     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211 +0xa34
   > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0, 0xc0001bafc0, 0xc000c40d40)
   >     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:118 +0x1cf
   > created by github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main
   >     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:131 +0x6e8
   >
   > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
   > ...
 



Issue Time Tracking
---

Worklog Id: (was: 

[jira] [Work logged] (BEAM-9448) Misleading log line: says "downloading" when using cache

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9448?focusedWorklogId=398779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398779
 ]

ASF GitHub Bot logged work on BEAM-9448:


Author: ASF GitHub Bot
Created on: 06/Mar/20 00:14
Start Date: 06/Mar/20 00:14
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #11051: [BEAM-9448] Fix log 
message for job server cache.
URL: https://github.com/apache/beam/pull/11051#issuecomment-595509792
 
 
   Run RAT PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398779)
Time Spent: 40m  (was: 0.5h)

> Misleading log line: says "downloading" when using cache
> 
>
> Key: BEAM-9448
> URL: https://issues.apache.org/jira/browse/BEAM-9448
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Trivial
>  Labels: portability-flink
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/8d253ac99d78ef5345245ed71c7cf34328c55d9f/sdks/python/apache_beam/utils/subprocess_server.py#L197





[jira] [Updated] (BEAM-9458) Make Dataflow executed UnboundedSources using SDF as the default

2020-03-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9458:
---
Status: Open  (was: Triage Needed)

> Make Dataflow executed UnboundedSources using SDF as the default
> 
>
> Key: BEAM-9458
> URL: https://issues.apache.org/jira/browse/BEAM-9458
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>






[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398776
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:56
Start Date: 05/Mar/20 23:56
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595504593
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398776)
Time Spent: 98h 20m  (was: 98h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 98h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398775
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:56
Start Date: 05/Mar/20 23:56
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595504547
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398775)
Time Spent: 98h 10m  (was: 98h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 98h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398774
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:55
Start Date: 05/Mar/20 23:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595504514
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398774)
Time Spent: 98h  (was: 97h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 98h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398773
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:50
Start Date: 05/Mar/20 23:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595503135
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398773)
Time Spent: 97h 50m  (was: 97h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 97h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398771
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:50
Start Date: 05/Mar/20 23:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11032: [BEAM-8335] Display 
rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595503103
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398771)
Time Spent: 97h 40m  (was: 97.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 97h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=398766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398766
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:42
Start Date: 05/Mar/20 23:42
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #11020: [BEAM-7926] Update 
Data Visualization
URL: https://github.com/apache/beam/pull/11020#issuecomment-595500738
 
 
   Run PythonLint PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398766)
Time Spent: 57.5h  (was: 57h 20m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 57.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The user can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built that includes only the 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2), and 
> that fragment is executed.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].





[jira] [Created] (BEAM-9458) Make Dataflow executed UnboundedSources using SDF as the default

2020-03-05 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-9458:
---

 Summary: Make Dataflow executed UnboundedSources using SDF as the 
default
 Key: BEAM-9458
 URL: https://issues.apache.org/jira/browse/BEAM-9458
 Project: Beam
  Issue Type: Sub-task
  Components: runner-dataflow
Reporter: Luke Cwik
Assignee: Luke Cwik








[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=398763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398763
 ]

ASF GitHub Bot logged work on BEAM-8932:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:28
Start Date: 05/Mar/20 23:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10478: 
[BEAM-8932][Cleanup] Extract PubsubBoundedWriter from PubsubIO
URL: https://github.com/apache/beam/pull/10478#issuecomment-595496877
 
 
   Retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398763)
Time Spent: 16h 50m  (was: 16h 40m)

> Expose complete Cloud Pub/Sub messages through PubsubIO API
> ---
>
> Key: BEAM-8932
> URL: https://issues.apache.org/jira/browse/BEAM-8932
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Daniel Collins
>Assignee: Daniel Collins
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> The PubsubIO API only exposes a subset of the fields in the underlying 
> PubsubMessage protocol buffer. To accommodate future feature changes as well 
> as for greater compatibility with code using the Cloud Pub/Sub APIs, a method 
> to read and write these protocol messages should be exposed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=398765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398765
 ]

ASF GitHub Bot logged work on BEAM-8932:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:28
Start Date: 05/Mar/20 23:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10478: 
[BEAM-8932][Cleanup] Extract PubsubBoundedWriter from PubsubIO
URL: https://github.com/apache/beam/pull/10478#issuecomment-595496977
 
 
   Run Java PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398765)
Time Spent: 17h 10m  (was: 17h)

> Expose complete Cloud Pub/Sub messages through PubsubIO API
> ---
>
> Key: BEAM-8932
> URL: https://issues.apache.org/jira/browse/BEAM-8932
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Daniel Collins
>Assignee: Daniel Collins
>Priority: Major
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> The PubsubIO API only exposes a subset of the fields in the underlying 
> PubsubMessage protocol buffer. To accommodate future feature changes as well 
> as for greater compatibility with code using the Cloud Pub/Sub APIs, a method 
> to read and write these protocol messages should be exposed.





[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=398764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398764
 ]

ASF GitHub Bot logged work on BEAM-8932:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:28
Start Date: 05/Mar/20 23:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10478: 
[BEAM-8932][Cleanup] Extract PubsubBoundedWriter from PubsubIO
URL: https://github.com/apache/beam/pull/10478#issuecomment-595496951
 
 
   Run Dataflow ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 398764)
Time Spent: 17h  (was: 16h 50m)

> Expose complete Cloud Pub/Sub messages through PubsubIO API
> ---
>
> Key: BEAM-8932
> URL: https://issues.apache.org/jira/browse/BEAM-8932
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Daniel Collins
>Assignee: Daniel Collins
>Priority: Major
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> The PubsubIO API only exposes a subset of the fields in the underlying 
> PubsubMessage protocol buffer. To accommodate future feature changes as well 
> as for greater compatibility with code using the Cloud Pub/Sub APIs, a method 
> to read and write these protocol messages should be exposed.





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=398761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398761
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:15
Start Date: 05/Mar/20 23:15
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #11038: [BEAM-7746] More 
typing fixes
URL: https://github.com/apache/beam/pull/11038#issuecomment-595493314
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398761)
Time Spent: 71h 40m  (was: 71.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 71h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  
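
For readers unfamiliar with PEP 484, here is a minimal sketch of the kind of annotation the issue asks for. The function below is a made-up illustration and not code from the Beam repository; a checker such as mypy can use the annotations to flag callers that pass the wrong types.

```python
from typing import Dict, Iterable


def count_words(lines: Iterable[str]) -> Dict[str, int]:
    """Count whitespace-separated words across all lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

The annotations do not change runtime behaviour; they only document the expected types and give static analyzers and IDEs something to check against.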





[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398757
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:02
Start Date: 05/Mar/20 23:02
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595489779
 
 
   yup it seems like flaky/unrelated
 



Issue Time Tracking
---

Worklog Id: (was: 398757)
Time Spent: 7h 40m  (was: 7.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].
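
The NaN/infinity limitation mentioned above can be seen with Python's standard json module alone. This is a small sketch, independent of Beam and BigQuery; the helper name to_strict_json is hypothetical.

```python
import json


def to_strict_json(row):
    """Serialize a row to standards-compliant JSON.

    Raises ValueError if the row contains NaN or +/-Infinity, because
    the JSON grammar has no representation for those float values.
    """
    return json.dumps(row, allow_nan=False)
```

By default json.dumps emits the non-standard tokens NaN and Infinity instead of raising, which a strict JSON consumer may reject. Avro's float and double types are IEEE 754 values, so they do not have this gap.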





[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398756
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:02
Start Date: 05/Mar/20 23:02
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595489733
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398756)
Time Spent: 7.5h  (was: 7h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].





[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398750
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:00
Start Date: 05/Mar/20 23:00
Worklog Time Spent: 10m 
  Work Description: chunyang commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595489197
 
 
   Flaky/unrelated tests? I can't seem to reproduce locally.
 



Issue Time Tracking
---

Worklog Id: (was: 398750)
Time Spent: 7h 20m  (was: 7h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].





[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398747
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:55
Start Date: 05/Mar/20 22:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595487536
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398747)
Time Spent: 7h 10m  (was: 7h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].





[jira] [Updated] (BEAM-9457) Allow WriteToBigQuery with external data resource

2020-03-05 Thread Wenbing Bai (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Bai updated BEAM-9457:
--
Status: Open  (was: Triage Needed)

> Allow WriteToBigQuery with external data resource
> -
>
> Key: BEAM-9457
> URL: https://issues.apache.org/jira/browse/BEAM-9457
> Project: Beam
>  Issue Type: New Feature
>  Components: io-py-gcp
>Reporter: Wenbing Bai
>Priority: Major
>
> Add another WriteToBigQuery.Method that lets users write to BigQuery via an 
> external data source such as GCS, instead of loading the data into BigQuery.





[jira] [Created] (BEAM-9457) Allow WriteToBigQuery with external data resource

2020-03-05 Thread Wenbing Bai (Jira)
Wenbing Bai created BEAM-9457:
-

 Summary: Allow WriteToBigQuery with external data resource
 Key: BEAM-9457
 URL: https://issues.apache.org/jira/browse/BEAM-9457
 Project: Beam
  Issue Type: New Feature
  Components: io-py-gcp
Reporter: Wenbing Bai


Add another WriteToBigQuery.Method that lets users write to BigQuery via an 
external data source such as GCS, instead of loading the data into BigQuery.





[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398741
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:44
Start Date: 05/Mar/20 22:44
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #10991: [BEAM-3301] 
Refactor DoFn validation & allow specifying main inputs.
URL: https://github.com/apache/beam/pull/10991#issuecomment-595484373
 
 
   Run Go PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398741)
Time Spent: 3h  (was: 2h 50m)

> Go SplittableDoFn support
> -
>
> Key: BEAM-3301
> URL: https://issues.apache.org/jira/browse/BEAM-3301
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> SDFs will be the only way to add streaming and liquid sharded IO for Go.
> Design doc: https://s.apache.org/splittable-do-fn





[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398740
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:43
Start Date: 05/Mar/20 22:43
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #10991: [BEAM-3301] 
Refactor DoFn validation & allow specifying main inputs.
URL: https://github.com/apache/beam/pull/10991#discussion_r388609914
 
 

 ##
 File path: sdks/go/pkg/beam/core/graph/fn.go
 ##
 @@ -209,21 +209,74 @@ func (f *DoFn) RestrictionT() *reflect.Type {
 // a KV or not based on the other signatures (unless we're more loose about which
 // sideinputs are present). Bind should respect that.
 
+// The following constants prefixed with "Main" represent possible numbers of
+// DoFn main inputs for DoFn construction and validation. Any value not defined
+// here is an invalid number of main inputs.
+const (
+   MainUnknown = -1 // The number of main inputs is unknown for DoFn validation.
 
 Review comment:
   I'm leaving it exported only because AsDoFn is currently exported and takes 
one of these constants as an input. Making this unexported would make it 
impossible to call AsDoFn with the existing behavior (unknown num. of inputs).
 



Issue Time Tracking
---

Worklog Id: (was: 398740)
Time Spent: 2h 50m  (was: 2h 40m)

> Go SplittableDoFn support
> -
>
> Key: BEAM-3301
> URL: https://issues.apache.org/jira/browse/BEAM-3301
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> SDFs will be the only way to add streaming and liquid sharded IO for Go.
> Design doc: https://s.apache.org/splittable-do-fn





[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398736=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398736
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:40
Start Date: 05/Mar/20 22:40
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #10991: [BEAM-3301] 
Refactor DoFn validation & allow specifying main inputs.
URL: https://github.com/apache/beam/pull/10991#discussion_r388608923
 
 

 ##
 File path: sdks/go/pkg/beam/core/graph/fn.go
 ##
 @@ -209,21 +209,74 @@ func (f *DoFn) RestrictionT() *reflect.Type {
 // a KV or not based on the other signatures (unless we're more loose about which
 // sideinputs are present). Bind should respect that.
 
+// The following constants prefixed with "Main" represent possible numbers of
 
 Review comment:
   I definitely like those options better. Went with the unexported constant 
type, since it makes the code more self-documenting as opposed to raw numbers. 
Also removed the validation check on that parameter, like you suggested.
 



Issue Time Tracking
---

Worklog Id: (was: 398736)
Time Spent: 2h 40m  (was: 2.5h)

> Go SplittableDoFn support
> -
>
> Key: BEAM-3301
> URL: https://issues.apache.org/jira/browse/BEAM-3301
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> SDFs will be the only way to add streaming and liquid sharded IO for Go.
> Design doc: https://s.apache.org/splittable-do-fn





[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398731=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398731
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:24
Start Date: 05/Mar/20 22:24
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388602678
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
 
 Review comment:
   I agree, though as discussed earlier we might have difficulties parsing 
non-string options. I'll try it and see how it goes.
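
As a rough illustration of how the two options under review behave once registered, here is a standalone argparse sketch. It mirrors the option names and defaults shown in the diffs for this PR, but build_flink_parser is a hypothetical helper and this is not the actual Beam options class.

```python
import argparse


def build_flink_parser():
    """Build a parser with the two options discussed in this review."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--parallelism',
        default=-1,  # -1 stands for "not set"
        type=int,    # converts the command-line string to an int
        help='Degree of parallelism used when distributing operations.')
    parser.add_argument(
        '--execution_mode_for_batch',
        default='PIPELINED',
        help='Flink mode for data exchange of batch pipelines.')
    return parser
```

For example, build_flink_parser().parse_args(['--parallelism', '4']) yields parallelism == 4 as an int, while execution_mode_for_batch stays a plain string.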
 



Issue Time Tracking
---

Worklog Id: (was: 398731)
Time Spent: 1h  (was: 50m)

> FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
> --
>
> Key: BEAM-9446
> URL: https://issues.apache.org/jira/browse/BEAM-9446
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I need these options for TFX, but they're being discarded (I believe they are 
> normally supplied by the job server).





[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398730=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398730
 ]

ASF GitHub Bot logged work on BEAM-9446:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:22
Start Date: 05/Mar/20 22:22
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add 
missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388601783
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
+        default='PIPELINED',
+        help='Flink mode for data exchange of batch pipelines. '
 
 Review comment:
   I think that's what experiment(s) are for: 
https://github.com/apache/beam/blob/35beffc5775636eb96e33eb57c6e5f213cfe033a/sdks/python/apache_beam/options/pipeline_options.py#L803-L811
 



Issue Time Tracking
---

Worklog Id: (was: 398730)
Time Spent: 50m  (was: 40m)

> FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
> --
>
> Key: BEAM-9446
> URL: https://issues.apache.org/jira/browse/BEAM-9446
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I need these options for TFX, but they're being discarded (I believe they are 
> normally supplied by the job server).





[jira] [Work logged] (BEAM-9448) Misleading log line: says "downloading" when using cache

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9448?focusedWorklogId=398728=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398728
 ]

ASF GitHub Bot logged work on BEAM-9448:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:18
Start Date: 05/Mar/20 22:18
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11051: [BEAM-9448] Fix 
log message for job server cache.
URL: https://github.com/apache/beam/pull/11051#discussion_r388600537
 
 

 ##
 File path: sdks/python/apache_beam/utils/subprocess_server.py
 ##
 @@ -194,9 +194,11 @@ def local_jar(cls, url):
     if os.path.exists(url):
       return url
     else:
-      _LOGGER.warning('Downloading job server jar from %s' % url)
       cached_jar = os.path.join(cls.JAR_CACHE, os.path.basename(url))
-      if not os.path.exists(cached_jar):
+      if os.path.exists(cached_jar):
+        _LOGGER.warning('Using cached job server jar from %s' % url)
+      else:
+        _LOGGER.warning('Downloading job server jar from %s' % url)
 
 Review comment:
   Changed it to `info`.
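
The branch logic in the diff above can be sketched as a standalone function. This is a simplified illustration, not the Beam implementation: resolve_jar and its cache_dir parameter are hypothetical, and the actual download step is left out.

```python
import logging
import os

_LOGGER = logging.getLogger(__name__)


def resolve_jar(url, cache_dir):
    """Return (source, path) for a job server jar.

    source is 'local' if url is an existing path, 'cached' if a file
    with the same basename is already in cache_dir, else 'download'.
    """
    if os.path.exists(url):
        return 'local', url
    cached_jar = os.path.join(cache_dir, os.path.basename(url))
    if os.path.exists(cached_jar):
        _LOGGER.info('Using cached job server jar from %s', url)
        return 'cached', cached_jar
    # A real implementation would fetch the jar into cached_jar here.
    _LOGGER.info('Downloading job server jar from %s', url)
    return 'download', cached_jar
```

Logging at info level follows the review comment; the message now depends on which branch is taken, so a cache hit is no longer reported as a download.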
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398728)
Time Spent: 0.5h  (was: 20m)

> Misleading log line: says "downloading" when using cache
> 
>
> Key: BEAM-9448
> URL: https://issues.apache.org/jira/browse/BEAM-9448
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Trivial
>  Labels: portability-flink
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/8d253ac99d78ef5345245ed71c7cf34328c55d9f/sdks/python/apache_beam/utils/subprocess_server.py#L197





[jira] [Updated] (BEAM-9456) Upgrade to gradle 6.2

2020-03-05 Thread Alex Van Boxel (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Van Boxel updated BEAM-9456:
-
Status: Open  (was: Triage Needed)

> Upgrade to gradle 6.2
> -
>
> Key: BEAM-9456
> URL: https://issues.apache.org/jira/browse/BEAM-9456
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
>






[jira] [Created] (BEAM-9456) Upgrade to gradle 6.2

2020-03-05 Thread Alex Van Boxel (Jira)
Alex Van Boxel created BEAM-9456:


 Summary: Upgrade to gradle 6.2
 Key: BEAM-9456
 URL: https://issues.apache.org/jira/browse/BEAM-9456
 Project: Beam
  Issue Type: Task
  Components: build-system
Reporter: Alex Van Boxel
Assignee: Alex Van Boxel








[jira] [Work logged] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9434?focusedWorklogId=398717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398717
 ]

ASF GitHub Bot logged work on BEAM-9434:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:44
Start Date: 05/Mar/20 21:44
Worklog Time Spent: 10m 
  Work Description: ecapoccia commented on issue #11037: [BEAM-9434] 
performance improvements reading many Avro files in S3
URL: https://github.com/apache/beam/pull/11037#issuecomment-595462888
 
 
   R: @lukecwik do you mind having a look and giving me feedback on this PR? 
Thanks I look forward to hearing from you
 



Issue Time Tracking
---

Worklog Id: (was: 398717)
Time Spent: 40m  (was: 0.5h)

> Performance improvements processing a large number of Avro files in S3+Spark
> 
>
> Key: BEAM-9434
> URL: https://issues.apache.org/jira/browse/BEAM-9434
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Emiliano Capoccia
>Assignee: Emiliano Capoccia
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There is a performance issue when processing a large number of small Avro 
> files in Spark on K8S (tens of thousands or more).
> The recommended way of reading a pattern of Avro files in Beam is by means of:
>  
> {code:java}
> PCollection<AvroGenClass> records = p.apply(AvroIO.read(AvroGenClass.class)
>     .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles());
> {code}
> However, in the case of many small files, the above results in the entire 
> reading taking place in a single task/node, which is considerably slow and 
> has scalability issues.
> Omitting the hint is not viable, as it results in too many tasks being 
> spawned, leaving the cluster busy coordinating tiny tasks with high 
> overhead.
> There are a few workarounds on the internet which mainly revolve around 
> compacting the input files before processing, so that a reduced number of 
> bulky files is processed in parallel.
>  





[jira] [Work logged] (BEAM-9442) Schema Select does not properly handle nested nullable fields

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9442?focusedWorklogId=398716=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398716
 ]

ASF GitHub Bot logged work on BEAM-9442:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:44
Start Date: 05/Mar/20 21:44
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #11046: [BEAM-9442] 
Properly handle nullable fields in Select
URL: https://github.com/apache/beam/pull/11046#issuecomment-595462750
 
 
   > @alexvanboxel are you talking about the RabbitMQ failure?
   
   yes (rookie, mistake of me)
 



Issue Time Tracking
---

Worklog Id: (was: 398716)
Time Spent: 1.5h  (was: 1h 20m)

> Schema Select does not properly handle nested nullable fields
> -
>
> Key: BEAM-9442
> URL: https://issues.apache.org/jira/browse/BEAM-9442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> A select of a nested field should be nullable if any of its parents are 
> nullable. So for example, a select of "a.b" should return a field named b 
> that is nullable if _either_ of a or b is nullable. Today we only examine b 
> to see if the selected fields should be nullable.
> Also the Select transform itself does not properly check for null values, and 
> throws NullPointerExceptions when some row values are null.
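
The nullability rule described above can be sketched in a few lines. This is an illustrative model only: the Field class and selected_nullable function are hypothetical and do not correspond to Beam's Schema API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Field:
    nullable: bool = False
    children: Dict[str, "Field"] = field(default_factory=dict)


def selected_nullable(schema: Dict[str, Field], path: str) -> bool:
    """Return True if selecting the dotted path can yield null.

    The result is nullable if any field along the path is nullable,
    not only the final (leaf) field.
    """
    nullable = False
    fields = schema
    for name in path.split("."):
        current = fields[name]
        nullable = nullable or current.nullable
        fields = current.children
    return nullable
```

With a nullable a containing a non-nullable b, selected_nullable(schema, "a.b") is True, whereas a check that looks only at b would report False.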





[jira] [Comment Edited] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark

2020-03-05 Thread Emiliano Capoccia (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050688#comment-17050688
 ] 

Emiliano Capoccia edited comment on BEAM-9434 at 3/5/20, 9:41 PM:
--

In the case outlined, with a large number of very small (KB-sized) Avro files, 
the idea is to expose a new hint in the AvroIO class that reads the input files 
with a predetermined number of parallel tasks.

Both extremes, a very high or a very low number of tasks, should be avoided 
because they hurt performance: too many tasks incur very high overhead, whereas 
too few (or a single one) serialise the reading on a few nodes and leave the 
cluster underutilised. 

In my tests I read 6578 Avro files from S3, each containing a single record.

With the proposed pull request #11037 and 10 partitions, the time to read the 
files dropped from 16 minutes to 2.3 minutes.

Even more importantly, the memory used by each node is roughly 1/10th of that 
in the single-node case.

*Reference run*, 6578 files, 1 task/executor, shuffle read 164kb, 6578 records, 
shuffle write 58Mb, 16 minutes execution time.

*PR #11037*, 10 tasks/executors, 660 files per task average, totalling 6578; 
23kb average shuffle read per task, 6 Mb average shuffle write per task, 2.3 
minutes execution time per executor in parallel.
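
The idea of reading with a fixed number of parallel tasks can be sketched as a simple bucketing step. This is a minimal illustration in plain Java, not the implementation in PR #11037: the class name, the `assignToBuckets` helper, and the generated file names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FileBucketing {
  /**
   * Distributes file names round-robin over a fixed number of buckets, so
   * that each bucket could be read by one task.
   */
  static List<List<String>> assignToBuckets(List<String> files, int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    List<List<String>> buckets = new ArrayList<>();
    for (int i = 0; i < numBuckets; i++) {
      buckets.add(new ArrayList<>());
    }
    for (int i = 0; i < files.size(); i++) {
      buckets.get(i % numBuckets).add(files.get(i));
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<String> files = new ArrayList<>();
    for (int i = 0; i < 6578; i++) {
      // Hypothetical file names, used only to have something to distribute.
      files.add("s3://my-bucket/path-to/file-" + i + ".avro");
    }
    List<List<String>> buckets = assignToBuckets(files, 10);
    for (int i = 0; i < buckets.size(); i++) {
      System.out.println("bucket " + i + ": " + buckets.get(i).size() + " files");
    }
  }
}
```

With 6578 files and 10 buckets, each bucket receives 657 or 658 files, which is in line with the roughly 660 files per task reported above.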


was (Author: ecapoccia):
In the case outlined of a large number of very small (kb) avro files, the idea 
is to expose a new hint in the AvroIO class that can handle the reading of the 
input files with a pre determined number of parallel tasks.

Both extremes of having a very high or a very low number of tasks should be 
avoided, as they are suboptimal in terms of performance: too many tasks yield 
to very high overhead whereas a too few tasks (or a single one) result in an 
unacceptable serialisation of reading on too little node, with the cluster 
being under utilised. 

In the tests that I carried out, I was reading 6578 Avro files from S3, each 
containing a single record.

The performance of the reading using the proposed pull request #11037 improved 
using 10 partitions, from 16 minutes to 2.3 minutes for performing the same 
exact work.

Even more importantly, the memory used by every node is 1/10th roughly of the 
case with a single node.

*Reference run*, 6578 files, 1 task/executor, shuffle read 164kb, 6578 records, 
shuffle write 58Mb, 16 minutes execution time.

*PR #11037*, 10 tasks/executors, 660 files per task average, totalling 6578; 
23kb average shuffle read per task, 6 Mb average shuffle write per task, 2.3 
minutes execution time per executor in parallel.

> Performance improvements processing a large number of Avro files in S3+Spark
> 
>
> Key: BEAM-9434
> URL: https://issues.apache.org/jira/browse/BEAM-9434
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Emiliano Capoccia
>Assignee: Emiliano Capoccia
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is a performance issue when processing a large number of small Avro 
> files in Spark on K8S (tens of thousands or more).
> The recommended way of reading a pattern of Avro files in Beam is by means of:
>  
> {code:java}
> PCollection<AvroGenClass> records = p.apply(AvroIO.read(AvroGenClass.class)
>     .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles());
> {code}
> However, in the case of many small files, the above results in the entire 
> reading taking place in a single task/node, which is considerably slow and 
> has scalability issues.
> The option of omitting the hint is not viable, as it results in too many 
> tasks being spawned, and the cluster being busy coordinating tiny 
> tasks with high overhead.
> There are a few workarounds on the internet which mainly revolve around 
> compacting the input files before processing, so that a reduced number of 
> bulky files is processed in parallel.
>  





[jira] [Updated] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark

2020-03-05 Thread Emiliano Capoccia (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emiliano Capoccia updated BEAM-9434:

Description: 
There is a performance issue when processing a large number of small Avro files 
in Spark on K8S (tens of thousands or more).

The recommended way of reading a pattern of Avro files in Beam is by means of:

 
{code:java}
PCollection<AvroGenClass> records = p.apply(AvroIO.read(AvroGenClass.class)
    .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles());
{code}
However, in the case of many small files, the above results in the entire 
reading taking place in a single task/node, which is considerably slow and has 
scalability issues.

The option of omitting the hint is not viable, as it results in too many tasks 
being spawned, and the cluster being busy coordinating tiny tasks with 
high overhead.

There are a few workarounds on the internet which mainly revolve around 
compacting the input files before processing, so that a reduced number of bulky 
files is processed in parallel.

 

  was:
There is a performance issue when processing in Spark on K8S a large number of 
small Avro files (tens of thousands or more).

The recommended way of reading a pattern of Avro files in Beam is by means of:

 
{code:java}
PCollection records = p.apply(AvroIO.read(AvroGenClass.class)
.from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles())
{code}
However, in the case of many small files the above results in the entire 
reading taking place in a single task/node, which is considerably slow and has 
scalability issues.

The option of omitting the hint is not viable, as it results in too many tasks 
being spawn and the cluster busy doing coordination of tiny tasks with high 
overhead.

There are a few workarounds on the internet which mainly revolve around 
compacting the input files before processing, so that a reduced number of bulky 
files is processed in parallel.

 


> Performance improvements processing a large number of Avro files in S3+Spark
> 
>
> Key: BEAM-9434
> URL: https://issues.apache.org/jira/browse/BEAM-9434
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Emiliano Capoccia
>Assignee: Emiliano Capoccia
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is a performance issue when processing a large number of small Avro 
> files in Spark on K8S (tens of thousands or more).
> The recommended way of reading a pattern of Avro files in Beam is by means of:
>  
> {code:java}
> PCollection<AvroGenClass> records = p.apply(AvroIO.read(AvroGenClass.class)
>     .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles());
> {code}
> However, in the case of many small files, the above results in the entire 
> reading taking place in a single task/node, which is considerably slow and 
> has scalability issues.
> The option of omitting the hint is not viable, as it results in too many 
> tasks being spawned, and the cluster being busy coordinating tiny 
> tasks with high overhead.
> There are a few workarounds on the internet which mainly revolve around 
> compacting the input files before processing, so that a reduced number of 
> bulky files is processed in parallel.
>  





[jira] [Resolved] (BEAM-9450) Update www.apache.org/dist/ links to point to downloads.apache.org

2020-03-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9450.

Fix Version/s: Not applicable
   Resolution: Fixed

> Update www.apache.org/dist/ links to point to downloads.apache.org
> --
>
> Key: BEAM-9450
> URL: https://issues.apache.org/jira/browse/BEAM-9450
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Infra is deprecating /dist for downloads, for ref 
> [https://blogs.apache.org/infra/entry/more-secure-and-robust-downloads]
> {quote}As of March 2020, we are deprecating www.apache.org/dist/ in favor of
> [https://downloads.apache.org/]
> for backup downloads as well as signature and checksum verification. The 
> primary driver has been splitting up web site visits and downloads to gain 
> better control and offer a better service for both downloads and web site 
> visits.
> {quote}
>  





[jira] [Work logged] (BEAM-9450) Update www.apache.org/dist/ links to point to downloads.apache.org

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9450?focusedWorklogId=398715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398715
 ]

ASF GitHub Bot logged work on BEAM-9450:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:37
Start Date: 05/Mar/20 21:37
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #11054: [BEAM-9450] 
Update www.apache.org/dist/ links to downloads.apache.org
URL: https://github.com/apache/beam/pull/11054
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398715)
Time Spent: 20m  (was: 10m)

> Update www.apache.org/dist/ links to point to downloads.apache.org
> --
>
> Key: BEAM-9450
> URL: https://issues.apache.org/jira/browse/BEAM-9450
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Infra is deprecating /dist for downloads, for ref 
> [https://blogs.apache.org/infra/entry/more-secure-and-robust-downloads]
> {quote}As of March 2020, we are deprecating www.apache.org/dist/ in favor of
> [https://downloads.apache.org/]
> for backup downloads as well as signature and checksum verification. The 
> primary driver has been splitting up web site visits and downloads to gain 
> better control and offer a better service for both downloads and web site 
> visits.
> {quote}
>  





[jira] [Updated] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark

2020-03-05 Thread Emiliano Capoccia (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emiliano Capoccia updated BEAM-9434:

Summary: Performance improvements processing a large number of Avro files 
in S3+Spark  (was: Performance improvements processiong a large number of Avro 
files in S3+Spark)

> Performance improvements processing a large number of Avro files in S3+Spark
> 
>
> Key: BEAM-9434
> URL: https://issues.apache.org/jira/browse/BEAM-9434
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Emiliano Capoccia
>Assignee: Emiliano Capoccia
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is a performance issue when processing in Spark on K8S a large number 
> of small Avro files (tens of thousands or more).
> The recommended way of reading a pattern of Avro files in Beam is by means of:
>  
> {code:java}
> PCollection<AvroGenClass> records = p.apply(AvroIO.read(AvroGenClass.class)
>     .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles());
> {code}
> However, in the case of many small files the above results in the entire 
> reading taking place in a single task/node, which is considerably slow and 
> has scalability issues.
> The option of omitting the hint is not viable, as it results in too many 
> tasks being spawned and the cluster being busy coordinating tiny tasks with 
> high overhead.
> There are a few workarounds on the internet which mainly revolve around 
> compacting the input files before processing, so that a reduced number of 
> bulky files is processed in parallel.
>  





[jira] [Work logged] (BEAM-9442) Schema Select does not properly handle nested nullable fields

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9442?focusedWorklogId=398714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398714
 ]

ASF GitHub Bot logged work on BEAM-9442:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:35
Start Date: 05/Mar/20 21:35
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11046: [BEAM-9442] 
Properly handle nullable fields in Select
URL: https://github.com/apache/beam/pull/11046#issuecomment-595459415
 
 
   @alexvanboxel are you talking about the RabbitMQ failure?
 



Issue Time Tracking
---

Worklog Id: (was: 398714)
Time Spent: 1h 20m  (was: 1h 10m)

> Schema Select does not properly handle nested nullable fields
> -
>
> Key: BEAM-9442
> URL: https://issues.apache.org/jira/browse/BEAM-9442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> A select of a nested field should be nullable if any of its parents are 
> nullable. So for example, a select of "a.b" should return a field named b 
> that is nullable if _either_ of a or b is nullable. Today we only examine b 
> to see if the selected fields should be nullable.
> Also the Select transform itself does not properly check for null values, and 
> throws NullPointerExceptions when some row values are null.





[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398712
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:35
Start Date: 05/Mar/20 21:35
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595459219
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398712)
Time Spent: 7h  (was: 6h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398710
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:33
Start Date: 05/Mar/20 21:33
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-595458238
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398710)
Time Spent: 97.5h  (was: 97h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 97.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-9442) Schema Select does not properly handle nested nullable fields

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9442?focusedWorklogId=398703=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398703
 ]

ASF GitHub Bot logged work on BEAM-9442:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:25
Start Date: 05/Mar/20 21:25
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #11046: [BEAM-9442] 
Properly handle nullable fields in Select
URL: https://github.com/apache/beam/pull/11046#issuecomment-595455457
 
 
   The error in the test is my fault: I approved a fix without checking that the 
tests had run. The PR just needs to be rebased.
 



Issue Time Tracking
---

Worklog Id: (was: 398703)
Time Spent: 1h 10m  (was: 1h)

> Schema Select does not properly handle nested nullable fields
> -
>
> Key: BEAM-9442
> URL: https://issues.apache.org/jira/browse/BEAM-9442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A select of a nested field should be nullable if any of its parents are 
> nullable. So for example, a select of "a.b" should return a field named b 
> that is nullable if _either_ of a or b is nullable. Today we only examine b 
> to see if the selected fields should be nullable.
> Also the Select transform itself does not properly check for null values, and 
> throws NullPointerExceptions when some row values are null.





[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=398697=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398697
 ]

ASF GitHub Bot logged work on BEAM-9250:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:22
Start Date: 05/Mar/20 21:22
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #10791: [BEAM-9250] Update 
release guide with more instructions.
URL: https://github.com/apache/beam/pull/10791#issuecomment-595453999
 
 
   LGTM
 



Issue Time Tracking
---

Worklog Id: (was: 398697)
Time Spent: 2h 20m  (was: 2h 10m)

> Improve beam release script based on 2.19.0 release experience
> --
>
> Key: BEAM-9250
> URL: https://issues.apache.org/jira/browse/BEAM-9250
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-7556) Enable to upgrade proxy generation independently of beam for java support

2020-03-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-7556.

Fix Version/s: Not applicable
   Resolution: Duplicate

> Enable to upgrade proxy generation independently of beam for java support
> -
>
> Key: BEAM-7556
> URL: https://issues.apache.org/jira/browse/BEAM-7556
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Affects Versions: 2.13.0
>Reporter: Romain Manni-Bucau
>Priority: Major
> Fix For: Not applicable
>
>
> Beam now uses a custom shaded version of bytebuddy, which makes it impossible 
> - unless you reshade - to upgrade bytebuddy without a new Beam 
> release.
> However, with the fast release rate of the JVM, it is important to be able to 
> upgrade bytebuddy - at least while Beam is using it, which is technically not 
> a strong requirement - in order to run on new JVMs.
> For example, the latest Beam release does not support recent Java versions:
> {code}
> Caused by: java.lang.UnsupportedOperationException: Cannot define class using 
> reflection: Cannot define nest member class 
> java.lang.reflect.AccessibleObject$Cache + within different package then 
> class 
> org.apache.beam.repackaged.beam_sdks_java_core.net.bytebuddy.mirror.AccessibleObject
> {code}
> My preferred fix would be to relax the proxying definition to 
> just use a "proxy classloader" in which the proxy would be defined, but that 
> requires being able to attach it to an execution, an area where Beam is not 
> yet super clean.
> The alternative is to have an SPI for the asm usage and let the user replace 
> the bytebuddy implementation with either an unshaded version or even a pure 
> asm one, so that they control the dependencies.
> Romain





[jira] [Work logged] (BEAM-9453) Fix potential UnsupportedEncodingException

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9453?focusedWorklogId=398694=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398694
 ]

ASF GitHub Bot logged work on BEAM-9453:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:17
Start Date: 05/Mar/20 21:17
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #11017: [BEAM-9453]  
Fix potential UnsupportedEncodingException
URL: https://github.com/apache/beam/pull/11017#issuecomment-595452062
 
 
   > @alexvanboxel it causes broken jenkins test in spotless check on master 
branch
   > 
   > https://builds.apache.org/job/beam_PreCommit_Spotless_Commit/7888/console
   
   sorry, should have seen that it didn't have tests attached.
 



Issue Time Tracking
---

Worklog Id: (was: 398694)
Time Spent: 40m  (was: 0.5h)

> Fix potential UnsupportedEncodingException
> --
>
> Key: BEAM-9453
> URL: https://issues.apache.org/jira/browse/BEAM-9453
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
>Affects Versions: 2.16.0
>Reporter: Henry Tang
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: Not applicable
>
>   Original Estimate: 0h
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the code assigns a new string with
> {code:java}
> String s = new String(bytes, "UTF-8");
> {code}
> This has the possibility of throwing an UnsupportedEncodingException.
>  
> Using
> {code:java}
> new String(bytes, StandardCharsets.UTF_8){code}
> avoids the possibility of throwing an UnsupportedEncodingException
>  
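
The difference between the two constructors can be seen in a small standalone sketch. The class and method names below are hypothetical and are not taken from the RabbitMQ IO code; only the two `String` constructors and `StandardCharsets.UTF_8` come from the issue.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CharsetDecoding {
  /** Decodes using a charset name; the checked exception must be handled. */
  static String decodeByName(byte[] bytes) {
    try {
      return new String(bytes, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required on every Java platform, yet the compiler still
      // forces this handler because the charset is looked up by name.
      throw new IllegalStateException(e);
    }
  }

  /** Decodes using a Charset constant; no checked exception is declared. */
  static String decodeByConstant(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
    System.out.println(decodeByName(bytes).equals(decodeByConstant(bytes)));
  }
}
```

Both methods produce the same string; the constant-based form simply removes the try/catch and the charset lookup by name.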





[jira] [Resolved] (BEAM-8421) Job API relies on org.apache.beam.vendor.

2020-03-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-8421.

Fix Version/s: Not applicable
   Resolution: Won't Fix

> Job API relies on org.apache.beam.vendor.
> -
>
> Key: BEAM-8421
> URL: https://issues.apache.org/jira/browse/BEAM-8421
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Affects Versions: 2.16.0
>Reporter: Romain Manni-Bucau
>Priority: Major
> Fix For: Not applicable
>
>
> API shouldn't rely on any internal





[jira] [Updated] (BEAM-8421) Job API relies on org.apache.beam.vendor.

2020-03-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8421:
---
Status: Open  (was: Triage Needed)

> Job API relies on org.apache.beam.vendor.
> -
>
> Key: BEAM-8421
> URL: https://issues.apache.org/jira/browse/BEAM-8421
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Affects Versions: 2.16.0
>Reporter: Romain Manni-Bucau
>Priority: Major
>
> API shouldn't rely on any internal





[jira] [Assigned] (BEAM-7891) gRPC vendoring contains overlapping classes

2020-03-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-7891:
--

Assignee: Luke Cwik  (was: Ismaël Mejía)

> gRPC vendoring contains overlapping classes
> ---
>
> Key: BEAM-7891
> URL: https://issues.apache.org/jira/browse/BEAM-7891
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Romain Manni-Bucau
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.15.0
>
>
> In 2.14 the overlapping bug between modules is still not fixed. It still 
> prevents using Beam with some JVMs, heavily pollutes shadowing/uber jar 
> creation, and can prevent Beam from running under some classloading setups 
> (potentially in an engine/runner). Here is one example:
>  
> {code:java}
> [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, 
> beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping 
> classes:
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code}
> This task is about fixing the overlaps, but also about ensuring they cannot 
> recur in 2.15, since all versions have been affected since vendoring was set 
> up and it has never been cleanly fixed across the whole build.
>  
> Thanks





[jira] [Assigned] (BEAM-7891) gRPC vendoring contains overlapping classes

2020-03-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-7891:
--

Assignee: Ismaël Mejía

> gRPC vendoring contains overlapping classes
> ---
>
> Key: BEAM-7891
> URL: https://issues.apache.org/jira/browse/BEAM-7891
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Romain Manni-Bucau
>Assignee: Ismaël Mejía
>Priority: Major
> Fix For: 2.15.0
>
>
> In 2.14 the overlapping bug between modules is still not fixed. It still 
> prevents using Beam with some JVMs, heavily pollutes shadowing/uber jar 
> creation, and can prevent Beam from running under some classloading setups 
> (potentially in an engine/runner). Here is one example:
>  
> {code:java}
> [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, 
> beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping 
> classes:
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code}
> This task is about fixing the overlaps, but also about ensuring they cannot 
> recur in 2.15, since all versions have been affected since vendoring was set 
> up and it has never been cleanly fixed across the whole build.
>  
> Thanks





[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398687
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:04
Start Date: 05/Mar/20 21:04
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595446366
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398687)
Time Spent: 6h 50m  (was: 6h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages, including the size of the serialized data and 
> the inability to represent NaN and infinity float values.
> BigQuery supports loading files in Avro format, which overcomes these 
> disadvantages. The Java SDK already supports loading files in Avro format 
> (BEAM-2879), so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].





[jira] [Work logged] (BEAM-9056) Staging artifacts from environment

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=398680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398680
 ]

ASF GitHub Bot logged work on BEAM-9056:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:53
Start Date: 05/Mar/20 20:53
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10621: 
[BEAM-9056] Staging artifacts from environment
URL: https://github.com/apache/beam/pull/10621#discussion_r388558761
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java
 ##
 @@ -261,14 +263,20 @@ public String registerCoder(Coder coder) throws 
IOException {
* return the same unique ID.
*/
   public String registerEnvironment(Environment env) {
+String environmentId;
 String existing = environmentIds.get(env);
 if (existing != null) {
-  return existing;
+  environmentId = existing;
+} else {
+  String name = uniqify(env.getUrn(), environmentIds.values());
+  environmentIds.put(env, name);
+  componentsBuilder.putEnvironments(name, env);
+  environmentId = name;
 }
-String name = uniqify(env.getUrn(), environmentIds.values());
-environmentIds.put(env, name);
-componentsBuilder.putEnvironments(name, env);
-return name;
+if (defaultEnvironmentId == null) {
 
 Review comment:
   Ok. Let's still do this immediately after this one though so that we do not 
forget about it.
 



Issue Time Tracking
---

Worklog Id: (was: 398680)
Time Spent: 5h 40m  (was: 5.5h)

> Staging artifacts from environment
> --
>
> Key: BEAM-9056
> URL: https://issues.apache.org/jira/browse/BEAM-9056
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Staging artifacts from the artifact information embedded in the Environment proto.
> Details: 
> https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog





[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=398679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398679
 ]

ASF GitHub Bot logged work on BEAM-8932:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:51
Start Date: 05/Mar/20 20:51
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10477: 
[BEAM-8932][Cleanup] Cleanup pubsubio by removing optionality and adding 
defaults to builders.
URL: https://github.com/apache/beam/pull/10477
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 398679)
Time Spent: 16h 40m  (was: 16.5h)

> Expose complete Cloud Pub/Sub messages through PubsubIO API
> ---
>
> Key: BEAM-8932
> URL: https://issues.apache.org/jira/browse/BEAM-8932
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Daniel Collins
>Assignee: Daniel Collins
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> The PubsubIO API only exposes a subset of the fields in the underlying 
> PubsubMessage protocol buffer. To accommodate future feature changes and 
> to improve compatibility with code using the Cloud Pub/Sub APIs, a method 
> to read and write these protocol messages should be exposed.





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398678
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:49
Start Date: 05/Mar/20 20:49
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-595439800
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398678)
Time Spent: 97h 20m  (was: 97h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 97h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398674
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:42
Start Date: 05/Mar/20 20:42
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-595436740
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398674)
Time Spent: 97h 10m  (was: 97h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 97h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Updated] (BEAM-7891) gRPC vendoring contains overlapping classes

2020-03-05 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-7891:

Fix Version/s: 2.15.0

> gRPC vendoring contains overlapping classes
> ---
>
> Key: BEAM-7891
> URL: https://issues.apache.org/jira/browse/BEAM-7891
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Romain Manni-Bucau
>Priority: Major
> Fix For: 2.15.0
>
>
> In 2.14 the overlapping-classes bug between modules is still not fixed. It still 
> prevents using Beam with some JVMs, heavily pollutes shading/uber-jar 
> creation, and can prevent Beam from running under some classloading setups 
> (potentially in an engine/runner). Here is one example:
>  
> {code:java}
> [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, 
> beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping 
> classes:
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code}
> This task is about fixing the overlapping classes and also ensuring they do not 
> reappear in 2.15: every version since vendoring was set up is affected, and the 
> problem has never been cleanly fixed across the whole build.
>  
> Thanks
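
A minimal sketch, not taken from the Beam build, of how overlapping class entries like the ones in the warning above could be listed for two jars. The class name OverlappingClasses and its helper methods are hypothetical; only java.util.jar.JarFile from the JDK is used, and the jar paths are whatever the caller passes in.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Beam: lists class files defined by two jars at once.
class OverlappingClasses {
  /** Returns the names of all .class entries in the jar at the given path. */
  public static Set<String> classEntries(String jarPath) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      return jar.stream()
          .map(entry -> entry.getName())
          .filter(name -> name.endsWith(".class"))
          .collect(Collectors.toSet());
    }
  }

  /** Returns, in sorted order, the entry names that appear in both sets. */
  public static Set<String> overlap(Set<String> first, Set<String> second) {
    Set<String> common = new TreeSet<>(first);
    common.retainAll(second);
    return common;
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.err.println("usage: OverlappingClasses <first.jar> <second.jar>");
      return;
    }
    Set<String> common = overlap(classEntries(args[0]), classEntries(args[1]));
    System.out.println(common.size() + " overlapping classes");
    common.forEach(name -> System.out.println(" - " + name));
  }
}
```

Running it against the two jars named in the warning should, under these assumptions, list the same entries the shade plugin reports; an empty result would indicate the overlap is gone.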





[jira] [Resolved] (BEAM-7891) gRPC vendoring contains overlapping classes

2020-03-05 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-7891.
-
Resolution: Fixed

> gRPC vendoring contains overlapping classes
> ---
>
> Key: BEAM-7891
> URL: https://issues.apache.org/jira/browse/BEAM-7891
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Romain Manni-Bucau
>Priority: Major
> Fix For: 2.15.0
>
>
> In 2.14 the overlapping-classes bug between modules is still not fixed. It still 
> prevents using Beam with some JVMs, heavily pollutes shading/uber-jar 
> creation, and can prevent Beam from running under some classloading setups 
> (potentially in an engine/runner). Here is one example:
>  
> {code:java}
> [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, 
> beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping 
> classes:
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code}
> This task is about fixing the overlapping classes and also ensuring they do not 
> reappear in 2.15: every version since vendoring was set up is affected, and the 
> problem has never been cleanly fixed across the whole build.
>  
> Thanks





[jira] [Commented] (BEAM-7891) gRPC vendoring contains overlapping classes

2020-03-05 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052492#comment-17052492
 ] 

Luke Cwik commented on BEAM-7891:
-

This was fixed in 2.15. The jar dropped from ~3mb to ~30kb.

> gRPC vendoring contains overlapping classes
> ---
>
> Key: BEAM-7891
> URL: https://issues.apache.org/jira/browse/BEAM-7891
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Romain Manni-Bucau
>Priority: Major
>
> In 2.14 the overlapping-classes bug between modules is still not fixed. It still 
> prevents using Beam with some JVMs, heavily pollutes shading/uber-jar 
> creation, and can prevent Beam from running under some classloading setups 
> (potentially in an engine/runner). Here is one example:
>  
> {code:java}
> [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, 
> beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping 
> classes:
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph
> [INFO] [WARNING] - 
> org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code}
> This task is about fixing the overlapping classes and also ensuring they do not 
> reappear in 2.15: every version since vendoring was set up is affected, and the 
> problem has never been cleanly fixed across the whole build.
>  
> Thanks





[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=398662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398662
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:23
Start Date: 05/Mar/20 20:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #11020: [BEAM-7926] Update 
Data Visualization
URL: https://github.com/apache/beam/pull/11020#issuecomment-595429224
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398662)
Time Spent: 57h 20m  (was: 57h 10m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 57h 20m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
>  
> {code:python}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The user can call a single function and get auto-magical charting of the data,
> e.g.,
> {code:python}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built that includes only the 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2), and 
> that fragment is executed.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=398663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398663
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:23
Start Date: 05/Mar/20 20:23
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #9056: [BEAM-7746] Add 
python type hints
URL: https://github.com/apache/beam/pull/9056#issuecomment-595429286
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 



Issue Time Tracking
---

Worklog Id: (was: 398663)
Time Spent: 71.5h  (was: 71h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 71.5h
>  Remaining Estimate: 0h
>
> As a developer of the Beam source code, I would like the code to use PEP 484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398659
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:18
Start Date: 05/Mar/20 20:18
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-595427003
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398659)
Time Spent: 96h 50m  (was: 96h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 96h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398660
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:18
Start Date: 05/Mar/20 20:18
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-595427106
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 398660)
Time Spent: 97h  (was: 96h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 97h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-9442) Schema Select does not properly handle nested nullable fields

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9442?focusedWorklogId=398637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398637
 ]

ASF GitHub Bot logged work on BEAM-9442:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:01
Start Date: 05/Mar/20 20:01
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11046: [BEAM-9442] 
Properly handle nullable fields in Select
URL: https://github.com/apache/beam/pull/11046#issuecomment-595420264
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 398637)
Time Spent: 1h  (was: 50m)

> Schema Select does not properly handle nested nullable fields
> -
>
> Key: BEAM-9442
> URL: https://issues.apache.org/jira/browse/BEAM-9442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> A select of a nested field should be nullable if any of its parents are 
> nullable. So for example, a select of "a.b" should return a field named b 
> that is nullable if _either_ of a or b is nullable. Today we only examine b 
> to see if the selected fields should be nullable.
> Also the Select transform itself does not properly check for null values, and 
> throws NullPointerExceptions when some row values are null.
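
The nullability rule described above can be sketched as follows. This is an illustration only, under the assumption that a selected path such as "a.b" is available as a list of fields from the outermost parent to the selected leaf; the Field class here is a hypothetical stand-in, not Beam's Schema.Field, and nothing below reflects how the Select transform is actually implemented.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the rule; not Beam code.
class NestedNullability {
  /** Minimal stand-in for a schema field: a name plus a nullable flag. */
  public static final class Field {
    final String name;
    final boolean nullable;

    public Field(String name, boolean nullable) {
      this.name = name;
      this.nullable = nullable;
    }
  }

  /**
   * The selected field is nullable if any field on the path is nullable,
   * including the selected leaf itself.
   */
  public static boolean selectedNullable(List<Field> path) {
    for (Field field : path) {
      if (field.nullable) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Selecting "a.b" where a is nullable and b is not: the result must be nullable.
    List<Field> path = Arrays.asList(new Field("a", true), new Field("b", false));
    System.out.println(selectedNullable(path));
  }
}
```

Examining only the last element of the path, which is the behavior the issue reports, would return false for this input.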





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398636
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 05/Mar/20 20:01
Start Date: 05/Mar/20 20:01
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10994: [BEAM-8335] 
TeststreamService integration with DirectRunner
URL: https://github.com/apache/beam/pull/10994#issuecomment-595420007
 
 
   there are some test failures
 



Issue Time Tracking
---

Worklog Id: (was: 398636)
Time Spent: 96h 40m  (was: 96.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 96h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-9453) Fix potential UnsupportedEncodingException

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9453?focusedWorklogId=398629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398629
 ]

ASF GitHub Bot logged work on BEAM-9453:


Author: ASF GitHub Bot
Created on: 05/Mar/20 19:44
Start Date: 05/Mar/20 19:44
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #11017: [BEAM-9453]  Fix 
potential UnsupportedEncodingException
URL: https://github.com/apache/beam/pull/11017#issuecomment-595412338
 
 
   @alexvanboxel this change breaks the Jenkins Spotless check on the master 
branch:
   
   https://builds.apache.org/job/beam_PreCommit_Spotless_Commit/7888/console
 



Issue Time Tracking
---

Worklog Id: (was: 398629)
Time Spent: 0.5h  (was: 20m)

> Fix potential UnsupportedEncodingException
> --
>
> Key: BEAM-9453
> URL: https://issues.apache.org/jira/browse/BEAM-9453
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
>Affects Versions: 2.16.0
>Reporter: Henry Tang
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: Not applicable
>
>   Original Estimate: 0h
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the code assigns a new string with
> {code:java}
> String s = new String(bytes, "UTF-8");
> {code}
> This has the possibility of throwing an UnsupportedEncodingException.
>  
> Using
> {code:java}
> new String(bytes, StandardCharsets.UTF_8){code}
> avoids the possibility of throwing an UnsupportedEncodingException
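
A small self-contained comparison of the two constructors mentioned above. The class and method names are made up for illustration; the point is only that the charset-name overload declares the checked UnsupportedEncodingException in its signature, while the Charset overload does not.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Illustrative only; not the RabbitMQ IO code from the pull request.
class DecodeUtf8 {
  /** Charset looked up by name: callers must handle a checked exception. */
  public static String decodeByName(byte[] bytes) throws UnsupportedEncodingException {
    return new String(bytes, "UTF-8");
  }

  /** Charset passed as a constant: no checked exception is declared. */
  public static String decodeByCharset(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws UnsupportedEncodingException {
    byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
    // Both overloads decode the same bytes to the same string.
    System.out.println(decodeByName(bytes).equals(decodeByCharset(bytes))); // prints true
  }
}
```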
>  





[jira] [Updated] (BEAM-9455) Environment-sensitive provisioning for Dataflow

2020-03-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9455:
--
Parent: BEAM-9238
Issue Type: Sub-task  (was: Improvement)

> Environment-sensitive provisioning for Dataflow
> ---
>
> Key: BEAM-9455
> URL: https://issues.apache.org/jira/browse/BEAM-9455
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Environment-sensitive provisioning for Dataflow.





[jira] [Created] (BEAM-9455) Environment-sensitive provisioning for Dataflow

2020-03-05 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-9455:
-

 Summary: Environment-sensitive provisioning for Dataflow
 Key: BEAM-9455
 URL: https://issues.apache.org/jira/browse/BEAM-9455
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Reporter: Heejong Lee
Assignee: Heejong Lee


Environment-sensitive provisioning for Dataflow.





[jira] [Updated] (BEAM-9455) Environment-sensitive provisioning for Dataflow

2020-03-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9455:
--
Status: Open  (was: Triage Needed)

> Environment-sensitive provisioning for Dataflow
> ---
>
> Key: BEAM-9455
> URL: https://issues.apache.org/jira/browse/BEAM-9455
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Environment-sensitive provisioning for Dataflow.





[jira] [Resolved] (BEAM-9229) Adding dependency information to Environment proto

2020-03-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee resolved BEAM-9229.
---
Fix Version/s: 2.20.0
   Resolution: Fixed

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.





[jira] [Updated] (BEAM-9383) Staging Dataflow artifacts from environment

2020-03-05 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9383:
--
Parent: BEAM-9238
Issue Type: Sub-task  (was: Improvement)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment





[jira] [Work logged] (BEAM-9056) Staging artifacts from environment

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=398618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398618
 ]

ASF GitHub Bot logged work on BEAM-9056:


Author: ASF GitHub Bot
Created on: 05/Mar/20 19:21
Start Date: 05/Mar/20 19:21
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10621: [BEAM-9056] 
Staging artifacts from environment
URL: https://github.com/apache/beam/pull/10621#discussion_r388509693
 
 

 ##
 File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java
 ##
@@ -261,14 +263,20 @@ public String registerCoder(Coder coder) throws IOException {
    * return the same unique ID.
    */
   public String registerEnvironment(Environment env) {
+    String environmentId;
     String existing = environmentIds.get(env);
     if (existing != null) {
-      return existing;
+      environmentId = existing;
+    } else {
+      String name = uniqify(env.getUrn(), environmentIds.values());
+      environmentIds.put(env, name);
+      componentsBuilder.putEnvironments(name, env);
+      environmentId = name;
     }
-    String name = uniqify(env.getUrn(), environmentIds.values());
-    environmentIds.put(env, name);
-    componentsBuilder.putEnvironments(name, env);
-    return name;
+    if (defaultEnvironmentId == null) {
 
 Review comment:
   If we change the signature of `registerEnvironment`, a number of test files 
(*TranslationTest, *RunnerTest) also need to be touched. I think it will create 
unnecessary noise in this PR.
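
 For readers following the hunk above, here is a minimal standalone sketch of the
 registration pattern it moves toward: one return point, the same ID for an equal
 environment, and a default ID recorded once. Everything in it is hypothetical
 (the `EnvironmentRegistry` and `Env` classes, the `uniqify` helper, the getter);
 it is not Beam's `SdkComponents`. The body of the `defaultEnvironmentId == null`
 branch is cut off in the quoted diff, so the assignment shown is an assumption.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the registry in the quoted diff; not Beam's SdkComponents. */
class EnvironmentRegistry {

  /** Hypothetical value type standing in for the Environment proto. */
  static final class Env {
    final String urn;
    final String payload;

    Env(String urn, String payload) {
      this.urn = urn;
      this.payload = payload;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Env)) {
        return false;
      }
      Env other = (Env) o;
      return urn.equals(other.urn) && payload.equals(other.payload);
    }

    @Override
    public int hashCode() {
      return Objects.hash(urn, payload);
    }
  }

  // Maps each registered environment to the unique ID it was given.
  private final Map<Env, String> environmentIds = new HashMap<>();

  // Assumption: the first environment ever registered becomes the default.
  private String defaultEnvironmentId;

  /** Registers env if needed; equal environments always get the same ID. */
  String registerEnvironment(Env env) {
    String environmentId = environmentIds.get(env);
    if (environmentId == null) {
      environmentId = uniqify(env.urn, environmentIds.values());
      environmentIds.put(env, environmentId);
    }
    if (defaultEnvironmentId == null) {
      // Assumed branch body; the quoted diff is truncated at this point.
      defaultEnvironmentId = environmentId;
    }
    return environmentId;
  }

  String getDefaultEnvironmentId() {
    return defaultEnvironmentId;
  }

  // Hypothetical helper: appends a counter until the name is unused.
  private static String uniqify(String prefix, Collection<String> existing) {
    String candidate = prefix;
    int suffix = 0;
    while (existing.contains(candidate)) {
      suffix++;
      candidate = prefix + suffix;
    }
    return candidate;
  }
}
```

 Keeping the return type as a plain `String` ID, as in this sketch, leaves existing
 callers of `registerEnvironment` untouched, which is the concern raised in the
 comment above.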
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398618)
Time Spent: 5.5h  (was: 5h 20m)

> Staging artifacts from environment
> --
>
> Key: BEAM-9056
> URL: https://issues.apache.org/jira/browse/BEAM-9056
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Staging artifacts from the artifact information embedded in the environment proto.
> Details: 
> https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog




