[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434791
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 19/May/20 05:13
Start Date: 19/May/20 05:13
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11039:
URL: https://github.com/apache/beam/pull/11039#issuecomment-630584744


   Heejong, can you please resolve the conflicts and push an update to rerun the tests?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434791)
Time Spent: 10h 50m  (was: 10h 40m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9984) Support BIT_OR aggregation function in Beam SQL

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9984?focusedWorklogId=434790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434790
 ]

ASF GitHub Bot logged work on BEAM-9984:


Author: ASF GitHub Bot
Created on: 19/May/20 04:47
Start Date: 19/May/20 04:47
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11737:
URL: https://github.com/apache/beam/pull/11737#issuecomment-630577035


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434790)
Time Spent: 2h 20m  (was: 2h 10m)

> Support BIT_OR aggregation function in Beam SQL
> ---
>
> Key: BEAM-9984
> URL: https://issues.apache.org/jira/browse/BEAM-9984
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Omar Ismail
>Priority: P2
>  Labels: starter
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Performs a bitwise OR operation on expression and returns the result.
> Supported Argument Types: INT64
> Returned Data Types: INT64
> Examples
> {code:sql}
> SELECT BIT_OR(c) as bit_or FROM UNNEST([0xF001, 0x00A1]) as c;
> +--------+
> | bit_or |
> +--------+
> | 61601  |
> +--------+
> {code}
> What is expected: should include both Calcite and ZetaSQL dialects.
> How to test: unit tests
> Reference: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#bit_or
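
As a point of reference, the aggregation itself reduces to a bitwise OR fold
over INT64 inputs. The sketch below uses Beam's generic Combine.CombineFn to
show that fold; it is illustrative only, is not the registration mechanism the
Calcite or ZetaSQL dialects actually use, and the class name is made up.

{code:java}
import org.apache.beam.sdk.transforms.Combine;

/** Illustrative bitwise-OR combiner over INT64 values. */
public class BitOrFn extends Combine.CombineFn<Long, Long, Long> {
  @Override
  public Long createAccumulator() {
    return 0L; // 0 is the identity element for bitwise OR.
  }

  @Override
  public Long addInput(Long accum, Long input) {
    return accum | input;
  }

  @Override
  public Long mergeAccumulators(Iterable<Long> accums) {
    long result = 0L;
    for (long a : accums) {
      result |= a;
    }
    return result;
  }

  @Override
  public Long extractOutput(Long accum) {
    return accum;
  }
}
{code}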



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=434789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434789
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 19/May/20 04:44
Start Date: 19/May/20 04:44
Worklog Time Spent: 10m 
  Work Description: amaliujia merged pull request #10946:
URL: https://github.com/apache/beam/pull/10946


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434789)
Time Spent: 6h 20m  (was: 6h 10m)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
> URL: https://issues.apache.org/jira/browse/BEAM-9363
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
>  This Jira tracks the implementation for 
> https://s.apache.org/streaming-beam-sql
> TVF stands for table-valued function, a SQL feature that produces a table as
> the function's output.
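
For orientation, the intended usage from the Java SDK might look roughly like
the sketch below. The TUMBLE syntax and the window_start column follow the
design doc, and the table and column names (orders, ts, amount) are made up
for illustration; the final API may differ from this.

{code:java}
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;

// "orders" is a hypothetical PCollection<Row> with an event-time column "ts"
// and a numeric column "amount".
static PCollection<Row> tumbleByTenMinutes(PCollection<Row> orders) {
  return PCollectionTuple.of(new TupleTag<>("orders"), orders)
      .apply(
          SqlTransform.query(
              "SELECT window_start, SUM(amount) AS total "
                  + "FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '10' MINUTE)) "
                  + "GROUP BY window_start"));
}
{code}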



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=434782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434782
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/May/20 03:52
Start Date: 19/May/20 03:52
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11740:
URL: https://github.com/apache/beam/pull/11740#issuecomment-630562942


   Tests passed. PTAL.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434782)
Time Spent: 19h 10m  (was: 19h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P1
>  Time Spent: 19h 10m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110811#comment-17110811
 ] 

Luke Cwik edited comment on BEAM-9339 at 5/19/20, 3:13 AM:
---

Java SDK for Dataflow was fixed in 2.22, while Python has worked since 2.21. Go
has also worked since 2.21.


was (Author: lcwik):
Java SDK for Dataflow was fixed in 2.22

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=434776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434776
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 19/May/20 03:12
Start Date: 19/May/20 03:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11748:
URL: https://github.com/apache/beam/pull/11748#discussion_r427005013



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
##
@@ -183,7 +183,10 @@ public JobSpecification translate(
 String workerHarnessContainerImageURL =
 
DataflowRunner.getContainerImageForJob(options.as(DataflowPipelineOptions.class));
 RunnerApi.Environment defaultEnvironmentForDataflow =
-Environments.createDockerEnvironment(workerHarnessContainerImageURL);
+Environments.createDockerEnvironment(workerHarnessContainerImageURL)
+.toBuilder()
+.addAllCapabilities(Environments.getJavaCapabilities())

Review comment:
   Requirements are on the pipeline, and that works.
   
   Since Dataflow is creating its own environment, it is difficult to share
common code because of the differences in the artifact staging and job APIs.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434776)
Time Spent: 8h 10m  (was: 8h)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-9339.
-
Resolution: Fixed

Java SDK for Dataflow was fixed in 2.22

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=434777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434777
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 19/May/20 03:12
Start Date: 19/May/20 03:12
Worklog Time Spent: 10m 
  Work Description: lukecwik merged pull request #11748:
URL: https://github.com/apache/beam/pull/11748


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434777)
Time Spent: 8h 20m  (was: 8h 10m)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9951) Create Go SDK synthetic sources.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9951?focusedWorklogId=434774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434774
 ]

ASF GitHub Bot logged work on BEAM-9951:


Author: ASF GitHub Bot
Created on: 19/May/20 03:10
Start Date: 19/May/20 03:10
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #11747:
URL: https://github.com/apache/beam/pull/11747#discussion_r427004608



##
File path: sdks/go/pkg/beam/io/synthetic/source.go
##
@@ -135,27 +155,79 @@ func (fn *sourceFn) ProcessElement(rt 
*offsetrange.Tracker, config SourceConfig,
return nil
 }
 
-// DefaultSourceConfig creates a SourceConfig with intended defaults for its
-// fields. SourceConfigs should be initialized with this method.
-func DefaultSourceConfig() SourceConfig {
-   return SourceConfig{
-   NumElements:   1, // Defaults shouldn't drop elements, so at 
least 1.
-   InitialSplits: 1, // Defaults to 1, i.e. no initial splitting.
+// SourceConfigBuilder is used to initialize SourceConfigs. See
+// SourceConfigBuilder's methods for descriptions of the fields in a
+// SourceConfig and how they can be set. The intended approach for using this
+// builder is to begin by calling the DefaultSourceConfig function, followed by
+// calling setters, followed by calling Build.
+//
+// Usage example:
+//
+//cfg := 
synthetic.DefaultSourceConfig().NumElements(5000).InitialSplits(2).Build()
+type SourceConfigBuilder struct {
+   cfg SourceConfig
+}
+
+// DefaultSourceConfig creates a SourceConfigBuilder set with intended defaults
+// for the SourceConfig fields. This function is the intended starting point 
for
+// initializing a SourceConfig and should always be used to create
+// SourceConfigBuilders.
+//
+// To see descriptions of the various SourceConfig fields and their defaults,
+// see the methods to SourceConfigBuilder.
+func DefaultSourceConfig() *SourceConfigBuilder {
+   return &SourceConfigBuilder{
+   cfg: SourceConfig{
+   numElements:   1, // 0 is invalid (drops elements).
+   initialSplits: 1, // 0 is invalid (drops elements).
+   },
+   }
+}
+
+// NumElements is the number of elements for the source to generate and emit.
+//
+// Valid values are in the range of [1, ...] and the default value is 1. Values
+// of 0 (and below) are invalid as they result in sources that emit no 
elements.
+func (b *SourceConfigBuilder) NumElements(val int) *SourceConfigBuilder {
+   b.cfg.numElements = val
+   return b
+}
+
+// InitialSplits determines the number of initial splits to perform in the
+// source's SplitRestriction method. Restrictions in synthetic sources 
represent
+// the number of elements being emitted, and this split is performed evenly
+// across that number of elements.
+//
+// Each resulting restriction will have at least 1 element in it, and each
+// element being emitted will be contained in exactly one restriction. That
+// means that if the desired number of splits is greater than the number of
+// elements N, then N initial restrictions will be created, each containing 1
+// element.
+//
+// Valid values are in the range of [1, ...] and the default value is 1. Values
+// of 0 (and below) are invalid as they would result in dropping elements that
+// are expected to be emitted.
+func (b *SourceConfigBuilder) InitialSplits(val int) *SourceConfigBuilder {
+   b.cfg.initialSplits = val
+   return b
+}
+
+// Build constructs the SourceConfig initialized by this builder. It also
+// performs error checking on the fields, and panics if any have been set to
+// invalid values.
+func (b *SourceConfigBuilder) Build() SourceConfig {
+   if b.cfg.initialSplits <= 0 {
+   panic(fmt.Sprintf("SourceConfig.InitialSplits must be >= 1. 
Got: %v", b.cfg.initialSplits))
+   }
+   if b.cfg.numElements <= 0 {
+   panic(fmt.Sprintf("SourceConfig.NumElements must be >= 1. Got: 
%v", b.cfg.numElements))
}
+   return b.cfg
 }
 
 // SourceConfig is a struct containing all the configuration options for a
-// synthetic source.
+// synthetic source. It should be created via a SourceConfigBuilder.
 type SourceConfig struct {
-   // NumElements is the number of elements for the source to generate and
-   // emit.
-   NumElements int
-
-   // InitialSplits determines the number of initial splits to perform in 
the
-   // source's SplitRestriction method. Note that in some edge cases, the
-   // number of splits performed might differ from this config value. Each
-   // restriction will always have one element in it, and at least one
-   // restriction will always be output, so the number of splits will be in
-   // the range of [1, N] where N is the size of the original restriction.
-

[jira] [Work logged] (BEAM-9951) Create Go SDK synthetic sources.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9951?focusedWorklogId=434775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434775
 ]

ASF GitHub Bot logged work on BEAM-9951:


Author: ASF GitHub Bot
Created on: 19/May/20 03:10
Start Date: 19/May/20 03:10
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #11747:
URL: https://github.com/apache/beam/pull/11747#discussion_r427004634



##
File path: sdks/go/pkg/beam/io/synthetic/step.go
##
@@ -144,49 +143,130 @@ func (fn *sdfStepFn) Setup() {
 // ProcessElement takes an input and either filters it or produces a number of
 // outputs identical to that input based on the restriction size.
 func (fn *sdfStepFn) ProcessElement(rt *offsetrange.Tracker, key, val []byte, 
emit func([]byte, []byte)) {
-   if fn.cfg.FilterRatio > 0 && fn.rng.Float64() < fn.cfg.FilterRatio {
-   return
-   }
+   filtered := fn.cfg.filterRatio > 0 && fn.rng.Float64() < 
fn.cfg.filterRatio
+
for i := rt.Rest.Start; rt.TryClaim(i) == true; i++ {
-   emit(key, val)
+   if !filtered {
+   emit(key, val)
+   }
+   }
+}
+
+// StepConfigBuilder is used to initialize StepConfigs. See StepConfigBuilder's
+// methods for descriptions of the fields in a StepConfig and how they can be
+// set. The intended approach for using this builder is to begin by calling the
+// DefaultStepConfig function, followed by calling setters, followed by calling
+// Build.
+//
+// Usage example:
+//
+//cfg := 
synthetic.DefaultStepConfig().OutputPerInput(10).FilterRatio(0.5).Build()
+type StepConfigBuilder struct {
+   cfg StepConfig
+}
+
+// DefaultSourceConfig creates a StepConfig with intended defaults for the
+// StepConfig fields. This function is the intended starting point for
+// initializing a StepConfig and should always be used to create
+// StepConfigBuilders.
+//
+// To see descriptions of the various StepConfig fields and their defaults, see
+// the methods to StepConfigBuilder.
+func DefaultStepConfig() *StepConfigBuilder {
+   return &StepConfigBuilder{
+   cfg: StepConfig{
+   outputPerInput: 1, // Defaults shouldn't drop 
elements, so at least 1.
+   filterRatio:0.0,   // Defaults shouldn't drop 
elements, so don't filter.
+   splittable: false, // Default to non-splittable, 
SDFs are situational.
+   initialSplits:  1, // Defaults to 1, i.e. no 
initial splitting.
+   },
}
 }
 
-// DefaultSourceConfig creates a SourceConfig with intended defaults for its
-// fields. SourceConfigs should be initialized with this method.
-func DefaultStepConfig() StepConfig {
-   return StepConfig{
-   OutputPerInput: 1, // Defaults shouldn't drop elements, so 
at least 1.
-   FilterRatio:0.0,   // Defaults shouldn't drop elements, so 
don't filter.
-   Splittable: false, // Default to non-splittable, SDFs are 
situational.
-   InitialSplits:  1, // Defaults to 1, i.e. no initial 
splitting.
+// OutputPerInput is the number of outputs to emit per input received. Each
+// output is identical to the original input. A value of 0 drops all inputs and
+// produces no output.
+//
+// Valid values are in the range of [0, ...] and the default value is 1. Values
+// below 0 are invalid as they have no logical meaning for this field.
+func (b *StepConfigBuilder) OutputPerInput(val int) *StepConfigBuilder {
+   b.cfg.outputPerInput = val
+   return b
+}
+
+// FilterRatio indicates the random chance that an input will be filtered
+// out, meaning that no outputs will get emitted for it. For example, a
+// FilterRatio of 0.25 means that 25% of inputs will be filtered out, a
+// FilterRatio of 0 means no elements are filtered, and a FilterRatio of 1.0
+// means every element is filtered.
+//
+// In a non-splittable step, this is performed on each input element, meaning
+// all outputs for that element would be filtered. In a splittable step, this 
is
+// performed on each input restriction instead of the entire element, meaning
+// that some outputs for an element may be filtered and others kept.
+//
+// Note that even when elements are filtered out, the work associated with
+// processing those elements is still performed, which differs from setting an
+// OutputPerInput of 0. Also note that if a
+//
+// Valid values are in the range of [0.0, 1.0], and the default value is 0. In
+// order to avoid precision errors, invalid values do not cause errors. 
Instead,
+// values below 0 are functionally equivalent to 0, and values above 1 are
+// functionally equivalent to 1.
+func (b *StepConfigBuilder) FilterRatio(val float64) *StepConfigBuilder {
+   b.cfg.filterRatio = val
+   return b
+}

[jira] [Work logged] (BEAM-9958) Linkage Checker to use exclusion file as baseline

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9958?focusedWorklogId=434772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434772
 ]

ASF GitHub Bot logged work on BEAM-9958:


Author: ASF GitHub Bot
Created on: 19/May/20 03:00
Start Date: 19/May/20 03:00
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #11674:
URL: https://github.com/apache/beam/pull/11674#issuecomment-630549911


   There are no tests for these scripts. With this PR, we can set up a Jenkins
task to run the Linkage Checker, say, "Run Java LinkageChecker".



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434772)
Time Spent: 1h 10m  (was: 1h)

> Linkage Checker to use exclusion file as baseline
> -
>
> Key: BEAM-9958
> URL: https://issues.apache.org/jira/browse/BEAM-9958
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Linkage Checker should use an exclusion file as its baseline.
> Linkage Checker 1.4.0 can take an exclusion file to filter out existing
> linkage errors. This functionality eliminates the need to run the diff
> command in beam-linkage-check.sh.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=434770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434770
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 19/May/20 02:54
Start Date: 19/May/20 02:54
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #10946:
URL: https://github.com/apache/beam/pull/10946#issuecomment-630548345


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434770)
Time Spent: 6h 10m  (was: 6h)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
> URL: https://issues.apache.org/jira/browse/BEAM-9363
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
>  This Jira tracks the implementation for 
> https://s.apache.org/streaming-beam-sql
> TVF stands for table-valued function, a SQL feature that produces a table as
> the function's output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=434768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434768
 ]

ASF GitHub Bot logged work on BEAM-9679:


Author: ASF GitHub Bot
Created on: 19/May/20 02:44
Start Date: 19/May/20 02:44
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on a change in pull request 
#11734:
URL: https://github.com/apache/beam/pull/11734#discussion_r426996114



##
File path: learning/katas/go/Core Transforms/GroupByKey/GroupByKey/task.md
##
@@ -0,0 +1,50 @@
+
+
+# GroupByKey
+
+GroupByKey is a Beam transform for processing collections of key/value pairs. 
It’s a parallel
+reduction operation, analogous to the Shuffle phase of a 
Map/Shuffle/Reduce-style algorithm. The
+input to GroupByKey is a collection of key/value pairs that represents a 
multimap, where the
+collection contains multiple pairs that have the same key, but different 
values. Given such a
+collection, you use GroupByKey to collect all of the values associated with 
each unique key.
+
+**Kata:** Implement a
+<a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+beam.GroupByKey</a> transform that groups words by their first letter.
+
+  Refer to <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+  beam.GroupByKey</a> to solve this problem.
+
+
+  Providing your ParDo a func with two return values, such as below, will
+  transform a PCollection<B> into a PCollection<KV<A,B>>.
+  
+```
+func someFunc(element string) (uint8, string) {

Review comment:
   Thank you, Henry.  May we consider leaving the first return type to be 
uint8?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434768)
Time Spent: 50m  (was: 40m)

> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A kata devoted to core Beam transform patterns, modeled after
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
> where the takeaway is an individual's ability to master the following in an
> Apache Beam pipeline using the Go SDK.
>  * Branching
>  * [CoGroupByKey|https://github.com/damondouglas/beam/tree/BEAM-9679-core-transform-groupbykey]
>  * Combine
>  * Composite Transform
>  * DoFn Additional Parameters
>  * Flatten
>  * GroupByKey
>  * [Map|https://github.com/apache/beam/pull/11564]
>  * Partition
>  * Side Input



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=434762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434762
 ]

ASF GitHub Bot logged work on BEAM-9679:


Author: ASF GitHub Bot
Created on: 19/May/20 02:36
Start Date: 19/May/20 02:36
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on a change in pull request 
#11734:
URL: https://github.com/apache/beam/pull/11734#discussion_r426996114



##
File path: learning/katas/go/Core Transforms/GroupByKey/GroupByKey/task.md
##
@@ -0,0 +1,50 @@
+
+
+# GroupByKey
+
+GroupByKey is a Beam transform for processing collections of key/value pairs. 
It’s a parallel
+reduction operation, analogous to the Shuffle phase of a 
Map/Shuffle/Reduce-style algorithm. The
+input to GroupByKey is a collection of key/value pairs that represents a 
multimap, where the
+collection contains multiple pairs that have the same key, but different 
values. Given such a
+collection, you use GroupByKey to collect all of the values associated with 
each unique key.
+
+**Kata:** Implement a
+<a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+beam.GroupByKey</a> transform that groups words by their first letter.
+
+  Refer to <a href="https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#GroupByKey">
+  beam.GroupByKey</a> to solve this problem.
+
+
+  Providing your ParDo a func with two return values, such as below, will
+  transform a PCollection<B> into a PCollection<KV<A,B>>.
+  
+```
+func someFunc(element string) (uint8, string) {

Review comment:
   Thank you, Henry.  May we consider leaving the first return type to be 
uint8?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434762)
Time Spent: 40m  (was: 0.5h)

> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> A kata devoted to core Beam transform patterns, modeled after
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
> where the takeaway is an individual's ability to master the following in an
> Apache Beam pipeline using the Go SDK.
>  * Branching
>  * [CoGroupByKey|https://github.com/damondouglas/beam/tree/BEAM-9679-core-transform-groupbykey]
>  * Combine
>  * Composite Transform
>  * DoFn Additional Parameters
>  * Flatten
>  * GroupByKey
>  * [Map|https://github.com/apache/beam/pull/11564]
>  * Partition
>  * Side Input



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9977?focusedWorklogId=434751&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434751
 ]

ASF GitHub Bot logged work on BEAM-9977:


Author: ASF GitHub Bot
Created on: 19/May/20 02:15
Start Date: 19/May/20 02:15
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11715:
URL: https://github.com/apache/beam/pull/11715#issuecomment-630536199


   Latest changes are for addressing comments and using double during 
computation. @lukecwik PTAL. Thanks for your help!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434751)
Time Spent: 1h 10m  (was: 1h)

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10030) Add CSVIO for Java SDK

2020-05-18 Thread Saurabh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Joshi updated BEAM-10030:
-
Description: 
Apache Beam has TextIO class which can read text based files line by line, 
delimited by either a carriage return, newline, or a carriage return and a 
newline. This approach does not support CSV files which have records that span 
multiple lines. This is because there could be fields where there is a newline 
inside the double quotes.

This Stackoverflow question is relevant for a feature that should be added to 
Apache Beam: 
[https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]

I can think of two libraries we could use for handling CSV files. The first one 
is using Apache Commons CSV library. Here is some example code which can use 
CSVRecord class for reading and writing CSV records:

{{{color:#172b4d}{{PipelineOptions options = PipelineOptionsFactory.create();}}
 {{Pipeline pipeline = Pipeline.create(options);}}
 {{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
CSVIO.read().from("input.csv"));}}
 records.apply("WriteCSV", CSVIO.write().to("output.csv"));{color}}}

Another library we could use is Jackson CSV, which allows users to specify 
schemas for the columns: 
[https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]

The crux of the problem is this: can we read and write large CSV files in 
parallel, by splitting the records and distributing them to many workers? If so, 
would it be good to have a feature where Apache Beam supports reading/writing 
CSV files?

  was:
Apache Beam has TextIO class which can read text based files line by line, 
delimited by either a carriage return, newline, or a carriage return and a 
newline. This approach does not support CSV files which have records that span 
multiple lines. This is because there could be fields where there is a newline 
inside the double quotes.

This Stackoverflow question is relevant for a feature that should be added to 
Apache Beam: 
[https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]

I can think of two libraries we could use for handling CSV files. The first one 
is using Apache Commons CSV library. Here is some example code which can use 
CSVRecord class for reading and writing CSV records:

{{{color:#172b4d}{{PipelineOptions options = PipelineOptionsFactory.create();}}
 {{Pipeline pipeline = Pipeline.create(options);}}
 {{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
CSVIO.read().from("input.csv"));}}
 records.apply("WriteCSV", CSVIO.write().to("output.csv"));{color}}}

Another library we could use is Jackson CSV, which allows users to specify 
schemas for the columns: 
[https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]

The crux of the problem is this: can we read and write large CSV files in 
parallel? If so, would it be good to have a feature where Apache Beam supports 
reading/writing CSV files?


> Add CSVIO for Java SDK
> --
>
> Key: BEAM-10030
> URL: https://issues.apache.org/jira/browse/BEAM-10030
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Saurabh Joshi
>Priority: P2
>
> Apache Beam has TextIO class which can read text based files line by line, 
> delimited by either a carriage return, newline, or a carriage return and a 
> newline. This approach does not support CSV files which have records that 
> span multiple lines. This is because there could be fields where there is a 
> newline inside the double quotes.
> This Stackoverflow question is relevant for a feature that should be added to 
> Apache Beam: 
> [https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]
> I can think of two libraries we could use for handling CSV files. The first 
> one is using Apache Commons CSV library. Here is some example code which can 
> use CSVRecord class for reading and writing CSV records:
> {{{color:#172b4d}{{PipelineOptions options = 
> PipelineOptionsFactory.create();}}
>  {{Pipeline pipeline = Pipeline.create(options);}}
>  {{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
> CSVIO.read().from("input.csv"));}}
>  records.apply("WriteCSV", CSVIO.write().to("output.csv"));{color}}}
> Another library we could use is Jackson CSV, which allows users to specify 
> schemas for the columns: 
> [https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]
> The crux of the problem is this: can we read and write large CSV files in 
> parallel, by splitting the records and distributing them to many workers? If so, 
> would it be good to have a feature where Apache Beam supports reading/writing 
> CSV files?
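
To make the multi-line-record point concrete, the sketch below uses plain
Apache Commons CSV (no Beam involved) to show that a quoted field containing a
newline still parses as a single record, which is exactly what a line-oriented
reader such as TextIO cannot guarantee. The sample data and class name are
made up for illustration.

{code:java}
import java.io.StringReader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class MultiLineCsvExample {
  public static void main(String[] args) throws Exception {
    // One logical record whose "comment" field spans two physical lines.
    String csv = "id,comment\n1,\"first line\nsecond line\"\n";
    try (CSVParser parser =
        CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(new StringReader(csv))) {
      for (CSVRecord record : parser) {
        // Prints a single record, not two: 1 -> first line / second line
        System.out.println(record.get("id") + " -> " + record.get("comment"));
      }
    }
  }
}
{code}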



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10030) Add CSVIO for Java SDK

2020-05-18 Thread Saurabh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Joshi updated BEAM-10030:
-
Description: 
Apache Beam has TextIO class which can read text based files line by line, 
delimited by either a carriage return, newline, or a carriage return and a 
newline. This approach does not support CSV files which have records that span 
multiple lines. This is because there could be fields where there is a newline 
inside the double quotes.

This Stackoverflow question is relevant for a feature that should be added to 
Apache Beam: 
[https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]

I can think of two libraries we could use for handling CSV files. The first one 
is using Apache Commons CSV library. Here is some example code which can use 
CSVRecord class for reading and writing CSV records:

{{{color:#172b4d}{{PipelineOptions options = PipelineOptionsFactory.create();}}
 {{Pipeline pipeline = Pipeline.create(options);}}
 {{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
CSVIO.read().from("input.csv"));}}
 records.apply("WriteCSV", CSVIO.write().to("output.csv"));{color}}}

Another library we could use is Jackson CSV, which allows users to specify 
schemas for the columns: 
[https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]

The crux of the problem is this: can we read and write large CSV files in 
parallel? If so, would it be good to have a feature where Apache Beam supports 
reading/writing CSV files?

  was:
Apache Beam has TextIO which can read text based files line by line, delimited 
by either a carriage return, newline, or a carriage return and a newline. This 
approach does not support CSV files which have records that span multiple 
lines. This is because there could be fields where there is a newline inside 
the double quotes.

This Stackoverflow question is relevant for a feature that should be added to 
Apache Beam: 
[https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]

I can think of two libraries we could use for handling CSV files. The first one 
is using Apache Commons CSV library. Here is some example code which can use 
CSVRecord class for reading and writing CSV records:

{{{color:#172b4d}{{PipelineOptions options = PipelineOptionsFactory.create();}}
 {{Pipeline pipeline = Pipeline.create(options);}}
 {{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
CSVIO.read().from("input.csv"));}}
records.apply("WriteCSV", CSVIO.write().to("output.csv"));{color}}}

Another library we could use is Jackson CSV, which allows users to specify 
schemas for the columns: 
[https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]

The crux of the problem is this: can we read and write large CSV files in 
parallel? If so, would it be good to have a feature where Apache Beam supports 
reading/writing CSV files?


> Add CSVIO for Java SDK
> --
>
> Key: BEAM-10030
> URL: https://issues.apache.org/jira/browse/BEAM-10030
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Saurabh Joshi
>Priority: P2
>
> Apache Beam has TextIO class which can read text based files line by line, 
> delimited by either a carriage return, newline, or a carriage return and a 
> newline. This approach does not support CSV files which have records that 
> span multiple lines. This is because there could be fields where there is a 
> newline inside the double quotes.
> This Stackoverflow question is relevant for a feature that should be added to 
> Apache Beam: 
> [https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]
> I can think of two libraries we could use for handling CSV files. The first 
> one is using Apache Commons CSV library. Here is some example code which can 
> use CSVRecord class for reading and writing CSV records:
> {{{color:#172b4d}{{PipelineOptions options = 
> PipelineOptionsFactory.create();}}
>  {{Pipeline pipeline = Pipeline.create(options);}}
>  {{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
> CSVIO.read().from("input.csv"));}}
>  records.apply("WriteCSV", CSVIO.write().to("output.csv"));{color}}}
> Another library we could use is Jackson CSV, which allows users to specify 
> schemas for the columns: 
> [https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]
> The crux of the problem is this: can we read and write large CSV files in 
> parallel? If so, would it be good to have a feature where Apache Beam 
> supports reading/writing CSV files?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10030) Add CSVIO for Java SDK

2020-05-18 Thread Saurabh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Joshi updated BEAM-10030:
-
Description: 
Apache Beam has TextIO which can read text based files line by line, delimited 
by either a carriage return, newline, or a carriage return and a newline. This 
approach does not support CSV files which have records that span multiple 
lines. This is because there could be fields where there is a newline inside 
the double quotes.

This Stackoverflow question is relevant for a feature that should be added to 
Apache Beam: 
[https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]

I can think of two libraries we could use for handling CSV files. The first one 
is using Apache Commons CSV library. Here is some example code which can use 
CSVRecord class for reading and writing CSV records:

{{{color:#172b4d}{{PipelineOptions options = PipelineOptionsFactory.create();}}
 {{Pipeline pipeline = Pipeline.create(options);}}
 {{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
CSVIO.read().from("input.csv"));}}
records.apply("WriteCSV", CSVIO.write().to("output.csv"));{color}}}

Another library we could use is Jackson CSV, which allows users to specify 
schemas for the columns: 
[https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]

The crux of the problem is this: can we read and write large CSV files in 
parallel? If so, would it be good to have a feature where Apache Beam supports 
reading/writing CSV files?

  was:
Apache Beam has TextIO which can read text based files line by line, delimited 
by either a carriage return, newline, or a carriage return and a newline. This 
approach does not support CSV files which have records that span multiple 
lines. This is because there could be fields where there is a newline inside 
the double quotes.

This Stackoverflow question is relevant for a feature that should be added to 
Apache Beam: 
[https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]

I can think of two libraries we could use for handling CSV files. The first one 
is using Apache Commons CSV library. Here is some example code which can use 
CSVRecord class for reading and writing CSV records:

{color:#172b4d}{{PipelineOptions options = PipelineOptionsFactory.create();}}
{{Pipeline pipeline = Pipeline.create(options);}}
{{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
CSVIO.read().from("input.csv"));}}
{{ records.apply("WriteCSV", CSVIO.write().to("output.csv"));}}{color}

Another library we could use is Jackson CSV, which allows users to specify 
schemas for the columns: 
[https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]

The crux of the problem is this: can we read and write large CSV files in 
parallel? If so, would it be good to have a feature where Apache Beam supports 
reading/writing CSV files?


> Add CSVIO for Java SDK
> --
>
> Key: BEAM-10030
> URL: https://issues.apache.org/jira/browse/BEAM-10030
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Saurabh Joshi
>Priority: P2
>
> Apache Beam has TextIO which can read text based files line by line, 
> delimited by either a carriage return, newline, or a carriage return and a 
> newline. This approach does not support CSV files which have records that 
> span multiple lines. This is because there could be fields where there is a 
> newline inside the double quotes.
> This Stackoverflow question is relevant for a feature that should be added to 
> Apache Beam: 
> [https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]
> I can think of two libraries we could use for handling CSV files. The first 
> one is using Apache Commons CSV library. Here is some example code which can 
> use CSVRecord class for reading and writing CSV records:
> {{{color:#172b4d}{{PipelineOptions options = 
> PipelineOptionsFactory.create();}}
>  {{Pipeline pipeline = Pipeline.create(options);}}
>  {{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
> CSVIO.read().from("input.csv"));}}
> records.apply("WriteCSV", CSVIO.write().to("output.csv"));{color}}}
> Another library we could use is Jackson CSV, which allows users to specify 
> schemas for the columns: 
> [https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]
> The crux of the problem is this: can we read and write large CSV files in 
> parallel? If so, would it be good to have a feature where Apache Beam 
> supports reading/writing CSV files?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9821) SpannerIO does not include all batching parameters in DisplayData.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9821?focusedWorklogId=434741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434741
 ]

ASF GitHub Bot logged work on BEAM-9821:


Author: ASF GitHub Bot
Created on: 19/May/20 01:40
Start Date: 19/May/20 01:40
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit merged pull request #11528:
URL: https://github.com/apache/beam/pull/11528


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434741)
Time Spent: 2h  (was: 1h 50m)

> SpannerIO does not include all batching parameters in DisplayData.
> --
>
> Key: BEAM-9821
> URL: https://issues.apache.org/jira/browse/BEAM-9821
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P3
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SpannerIO Write and WriteGrouped do not populate all of the batching/grouping 
> parameters in their DisplayData – they only show "batchSizeBytes"
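
For context, Beam transforms surface such parameters by overriding
populateDisplayData. A minimal sketch of that pattern is shown below; the
field names, values, and labels are illustrative and do not necessarily match
SpannerIO's actual options.

{code:java}
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.transforms.display.HasDisplayData;

/** Illustrative only; not SpannerIO's real option names or defaults. */
class SpannerWriteDisplayDataSketch implements HasDisplayData {
  private final long batchSizeBytes = 1024 * 1024;
  private final long maxNumMutations = 5000;
  private final long maxNumRows = 500;
  private final long groupingFactor = 1000;

  @Override
  public void populateDisplayData(DisplayData.Builder builder) {
    builder
        .add(DisplayData.item("batchSizeBytes", batchSizeBytes)
            .withLabel("Max batch size in bytes"))
        .add(DisplayData.item("maxNumMutations", maxNumMutations)
            .withLabel("Max cell mutations per batch"))
        .add(DisplayData.item("maxNumRows", maxNumRows)
            .withLabel("Max rows per batch"))
        .add(DisplayData.item("groupingFactor", groupingFactor)
            .withLabel("Number of batches to sort over"));
  }
}
{code}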



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10030) Add CSVIO for Java SDK

2020-05-18 Thread Saurabh Joshi (Jira)
Saurabh Joshi created BEAM-10030:


 Summary: Add CSVIO for Java SDK
 Key: BEAM-10030
 URL: https://issues.apache.org/jira/browse/BEAM-10030
 Project: Beam
  Issue Type: New Feature
  Components: io-ideas
Reporter: Saurabh Joshi


Apache Beam has TextIO which can read text based files line by line, delimited 
by either a carriage return, newline, or a carriage return and a newline. This 
approach does not support CSV files which have records that span multiple 
lines. This is because there could be fields where there is a newline inside 
the double quotes.

This Stackoverflow question is relevant for a feature that should be added to 
Apache Beam: 
[https://stackoverflow.com/questions/51439189/how-to-read-large-csv-with-beam]

I can think of two libraries we could use for handling CSV files. The first one 
is using Apache Commons CSV library. Here is some example code which can use 
CSVRecord class for reading and writing CSV records:

{color:#172b4d}{{PipelineOptions options = PipelineOptionsFactory.create();}}
{{Pipeline pipeline = Pipeline.create(options);}}
{{PCollection<CSVRecord> records = pipeline.apply("ReadCSV", 
CSVIO.read().from("input.csv"));}}
{{ records.apply("WriteCSV", CSVIO.write().to("output.csv"));}}{color}

Another library we could use is Jackson CSV, which allows users to specify 
schemas for the columns: 
[https://github.com/FasterXML/jackson-dataformats-text/tree/master/csv]

The crux of the problem is this: can we read and write large CSV files in 
parallel? If so, would it be good to have a feature where Apache Beam supports 
reading/writing CSV files?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434725
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 19/May/20 01:12
Start Date: 19/May/20 01:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r426974548



##
File path: 
runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslatorTest.java
##
@@ -53,9 +53,12 @@
 import org.apache.beam.model.pipeline.v1.RunnerApi.DockerPayload;

Review comment:
   Shouldn't we have a test to show the artifacts were properly set?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434725)
Time Spent: 10h 40m  (was: 10.5h)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=434724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434724
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 19/May/20 01:12
Start Date: 19/May/20 01:12
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11748:
URL: https://github.com/apache/beam/pull/11748#discussion_r426974286



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java
##
@@ -183,7 +183,10 @@ public JobSpecification translate(
 String workerHarnessContainerImageURL =
 
DataflowRunner.getContainerImageForJob(options.as(DataflowPipelineOptions.class));
 RunnerApi.Environment defaultEnvironmentForDataflow =
-Environments.createDockerEnvironment(workerHarnessContainerImageURL);
+Environments.createDockerEnvironment(workerHarnessContainerImageURL)
+.toBuilder()
+.addAllCapabilities(Environments.getJavaCapabilities())

Review comment:
   Is there some common utility we should be using here (rather than 
duplicating the code that's used in all the other portable runners)? What about 
requirements? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434724)
Time Spent: 8h  (was: 7h 50m)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=434723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434723
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 19/May/20 01:12
Start Date: 19/May/20 01:12
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #10946:
URL: https://github.com/apache/beam/pull/10946#discussion_r426974365



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/TVFStreamingUtils.java
##
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.utils;
+
+/** Provides static constants or utils for TVF streaming. */
+public class TVFStreamingUtils {

Review comment:
   In that case, drop the constants altogether. You can't reference a class in
Beam from Calcite, and these constants are used in that class.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434723)
Time Spent: 6h  (was: 5h 50m)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
> URL: https://issues.apache.org/jira/browse/BEAM-9363
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
>  This Jira tracks the implementation for 
> https://s.apache.org/streaming-beam-sql
> TVF stands for table-valued function, a SQL feature that produces a table as
> the function's output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434722&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434722
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 19/May/20 01:11
Start Date: 19/May/20 01:11
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r426973945



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
##
@@ -784,7 +877,25 @@ public DataflowPipelineJob run(Pipeline pipeline) {
 "Executing pipeline on the Dataflow Service, which will have billing 
implications "
 + "related to Google Compute Engine usage and other Google Cloud 
Services.");
 
-List packages = options.getStager().stageDefaultFiles();
+// Capture the sdkComponents for look up during step translations
+SdkComponents sdkComponents = SdkComponents.create();
+
+DataflowPipelineOptions dataflowOptions = 
options.as(DataflowPipelineOptions.class);
+String workerHarnessContainerImageURL = 
DataflowRunner.getContainerImageForJob(dataflowOptions);
+RunnerApi.Environment defaultEnvironmentForDataflow =
+Environments.createDockerEnvironment(workerHarnessContainerImageURL);
+
+sdkComponents.registerEnvironment(
+defaultEnvironmentForDataflow
+.toBuilder()
+.addAllDependencies(getDefaultArtifacts())

Review comment:
   We also need to make sure we have the capabilities set; I have this PR: 
https://github.com/apache/beam/pull/11748, since they were missing before.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434722)
Time Spent: 10.5h  (was: 10h 20m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-05-18 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110751#comment-17110751
 ] 

Robert Bradshaw commented on BEAM-9577:
---

If this is not taken care of by https://github.com/apache/beam/pull/11039 it 
will need to be a follow-up. 

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 23h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434721
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 19/May/20 01:10
Start Date: 19/May/20 01:10
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r426973945



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
##
@@ -784,7 +877,25 @@ public DataflowPipelineJob run(Pipeline pipeline) {
 "Executing pipeline on the Dataflow Service, which will have billing 
implications "
 + "related to Google Compute Engine usage and other Google Cloud 
Services.");
 
-List<DataflowPackage> packages = options.getStager().stageDefaultFiles();
+// Capture the sdkComponents for look up during step translations
+SdkComponents sdkComponents = SdkComponents.create();
+
+DataflowPipelineOptions dataflowOptions = 
options.as(DataflowPipelineOptions.class);
+String workerHarnessContainerImageURL = 
DataflowRunner.getContainerImageForJob(dataflowOptions);
+RunnerApi.Environment defaultEnvironmentForDataflow =
+Environments.createDockerEnvironment(workerHarnessContainerImageURL);
+
+sdkComponents.registerEnvironment(
+defaultEnvironmentForDataflow
+.toBuilder()
+.addAllDependencies(getDefaultArtifacts())

Review comment:
   We also need to make sure we have the capabilities set. I have this PR: 
https://github.com/apache/beam/pull/11748





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434721)
Time Spent: 10h 20m  (was: 10h 10m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110748#comment-17110748
 ] 

Luke Cwik edited comment on BEAM-9577 at 5/19/20, 1:06 AM:
---

I found out that the DataflowPipelineTranslator does not use the 
getOrCreateDefaultEnvironment logic, and its environment currently lacks all of the 
artifact metadata.
See: 
https://github.com/apache/beam/blob/76fbe45189ef0fa4b770d607c2f86e8870974523/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L139

Is that covered by this task or is there another one this should be added to?


was (Author: lcwik):
I found out that the DataflowPipelineTranslator does not use the 
getOrCreateDefaultEnvironment logic, and its environment currently lacks all of the 
artifact metadata.
See: 
https://github.com/apache/beam/blob/76fbe45189ef0fa4b770d607c2f86e8870974523/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L139

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 23h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110748#comment-17110748
 ] 

Luke Cwik commented on BEAM-9577:
-

I found out that the DataflowPipelineTranslator does not use the 
getOrCreateDefaultEnvironment logic, and its environment currently lacks all of the 
artifact metadata.
See: 
https://github.com/apache/beam/blob/76fbe45189ef0fa4b770d607c2f86e8870974523/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L139

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 23h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110747#comment-17110747
 ] 

Luke Cwik commented on BEAM-9339:
-

Fix in https://github.com/apache/beam/pull/11748

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=434718=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434718
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 19/May/20 01:03
Start Date: 19/May/20 01:03
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11748:
URL: https://github.com/apache/beam/pull/11748#issuecomment-630514195


   CC: @ananvay 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434718)
Time Spent: 7h 50m  (was: 7h 40m)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434717
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 19/May/20 01:03
Start Date: 19/May/20 01:03
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r426971999



##
File path: 
runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
##
@@ -195,7 +187,7 @@ public void testFileWithExtensionPackageNamingAndSize() 
throws Exception {
 PackageAttributes attr = makePackageAttributes(tmpFile, null);
 DataflowPackage target = attr.getDestination();
 
-assertThat(target.getName(), RegexMatcher.matches("file-" + HASH_PATTERN + 
".txt"));
+assertThat(target.getName(), RegexMatcher.matches(UUID_PATTERN + ".txt"));

Review comment:
   Hmm. You're right. The UUID is an implementation detail behind the uniqueness 
guarantee. I changed the tests to only check whether the name keeps the same 
extension.
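   As a rough illustration of that change (the exact matcher below is an assumption, not the 
committed test code), the assertion could drop the UUID pattern and keep only the extension check:

       // Sketch: verify only that the staged name keeps the original extension.
       assertThat(target.getName(), RegexMatcher.matches(".*\\.txt"));
       // or, without a regex at all:
       assertThat(target.getName(), org.hamcrest.Matchers.endsWith(".txt"));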





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434717)
Time Spent: 10h 10m  (was: 10h)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=434714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434714
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 19/May/20 01:03
Start Date: 19/May/20 01:03
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11748:
URL: https://github.com/apache/beam/pull/11748#issuecomment-630514089


   R: @y1chi @robertwb 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434714)
Time Spent: 7.5h  (was: 7h 20m)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=434715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434715
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 19/May/20 01:03
Start Date: 19/May/20 01:03
Worklog Time Spent: 10m 
  Work Description: lukecwik edited a comment on pull request #11748:
URL: https://github.com/apache/beam/pull/11748#issuecomment-630514089


   R: @ihji  @robertwb 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434715)
Time Spent: 7h 40m  (was: 7.5h)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?focusedWorklogId=434713=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434713
 ]

ASF GitHub Bot logged work on BEAM-9339:


Author: ASF GitHub Bot
Created on: 19/May/20 01:02
Start Date: 19/May/20 01:02
Worklog Time Spent: 10m 
  Work Description: lukecwik opened a new pull request #11748:
URL: https://github.com/apache/beam/pull/11748


   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
[jira] [Comment Edited] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110744#comment-17110744
 ] 

Luke Cwik edited comment on BEAM-9339 at 5/19/20, 1:01 AM:
---

Dataflow doesn't specify the capabilities in its pipeline proto representation 
since it isn't using the default getOrCreate environment call.

See:
https://github.com/apache/beam/blob/76fbe45189ef0fa4b770d607c2f86e8870974523/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L139


was (Author: lcwik):
Dataflow doesn't specify the capabilities in its pipeline proto representation 
since it isn't using the default getOrCreate environment call.

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110744#comment-17110744
 ] 

Luke Cwik commented on BEAM-9339:
-

Dataflow doesn't specify the capabilities in its pipeline proto representation 
since it isn't using the default getOrCreate environment call.

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.21.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9339:

Status: Open  (was: Triage Needed)

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-9339:

Fix Version/s: (was: 2.21.0)
   2.22.0

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-9339) Declare capabilities in SDK environments

2020-05-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reopened BEAM-9339:
-

> Declare capabilities in SDK environments
> 
>
> Key: BEAM-9339
> URL: https://issues.apache.org/jira/browse/BEAM-9339
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
> Fix For: 2.21.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=434709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434709
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 19/May/20 00:53
Start Date: 19/May/20 00:53
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on a change in pull request #10946:
URL: https://github.com/apache/beam/pull/10946#discussion_r426969362



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rel;
+
+import static 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument;
+
+import java.lang.reflect.Type;
+import java.util.List;
+import java.util.Set;
+import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel;
+import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.Row;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.TableFunctionScan;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelColumnMapping;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.joda.time.Duration;
+
+/**
+ * BeamRelNode to replace {@code TableFunctionScan}. Currently this class is limited to
+ * supporting table-valued functions for streaming windowing.
+ */
+public class BeamTableFunctionScanRel extends TableFunctionScan implements 
BeamRelNode {
+  public BeamTableFunctionScanRel(
+  RelOptCluster cluster,
+  RelTraitSet traitSet,
+  List<RelNode> inputs,
+  RexNode rexCall,
+  Type elementType,
+  RelDataType rowType,
+  Set<RelColumnMapping> columnMappings) {
+super(cluster, traitSet, inputs, rexCall, elementType, rowType, 
columnMappings);
+  }
+
+  @Override
+  public TableFunctionScan copy(
+  RelTraitSet traitSet,
+  List<RelNode> list,
+  RexNode rexNode,
+  Type type,
+  RelDataType relDataType,
+  Set<RelColumnMapping> set) {
+return new BeamTableFunctionScanRel(
+getCluster(), traitSet, list, rexNode, type, relDataType, 
columnMappings);
+  }
+
+  @Override
+  public PTransform<PCollectionList<Row>, PCollection<Row>> buildPTransform() {
+return new Transform();
+  }
+
+  private class Transform extends PTransform<PCollectionList<Row>, PCollection<Row>> {
+
+@Override
+public PCollection<Row> expand(PCollectionList<Row> input) {
+  checkArgument(
+  input.size() == 1,
+  "Wrong number of inputs for %s, expected 1 input but received: %s",
+  BeamTableFunctionScanRel.class.getSimpleName(),
+  input);
+  String operatorName = ((RexCall) getCall()).getOperator().getName();
+  checkArgument(
+  operatorName.equals("TUMBLE"),
+  

[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=434708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434708
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 19/May/20 00:50
Start Date: 19/May/20 00:50
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on a change in pull request #10946:
URL: https://github.com/apache/beam/pull/10946#discussion_r426968398



##
File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java
##
@@ -175,6 +185,37 @@ private void addBuiltinFunctionsToCatalog(SimpleCatalog 
catalog, AnalyzerOptions
 Mode.SCALAR,
 ImmutableList.of(resolvedFunc.getSignature(
 .forEach(catalog::addFunction);
+
+FunctionArgumentType retType =
+new FunctionArgumentType(SignatureArgumentKind.ARG_TYPE_RELATION);
+
+FunctionArgumentType inputTableType =
+new FunctionArgumentType(SignatureArgumentKind.ARG_TYPE_RELATION);
+
+FunctionArgumentType descriptorType =
+new FunctionArgumentType(
+SignatureArgumentKind.ARG_TYPE_DESCRIPTOR,
+FunctionArgumentTypeOptionsProto.newBuilder()
+.setDescriptorResolutionTableOffset(0)
+.build(),
+1);
+
+FunctionArgumentType stringType =
+new 
FunctionArgumentType(TypeFactory.createSimpleType(TypeKind.TYPE_STRING));
+
+// TUMBLE
+catalog.addTableValuedFunction(
+new 
TableValuedFunction.ForwardInputSchemaToOutputSchemaWithAppendedColumnTVF(

Review comment:
   Yeah. And I am glad this name itself does not exceed 80 chars, so it at least fits 
on a single line.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434708)
Time Spent: 5h 40m  (was: 5.5h)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
> URL: https://issues.apache.org/jira/browse/BEAM-9363
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
>  This Jira tracks the implementation for 
> https://s.apache.org/streaming-beam-sql
> TVF is table-valued function, which is a SQL feature that produce a table as 
> function's output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9770) Add BigQuery DeadLetter pattern to Patterns Page

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9770?focusedWorklogId=434705=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434705
 ]

ASF GitHub Bot logged work on BEAM-9770:


Author: ASF GitHub Bot
Created on: 19/May/20 00:48
Start Date: 19/May/20 00:48
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11437:
URL: https://github.com/apache/beam/pull/11437#issuecomment-630510056


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434705)
Time Spent: 5h  (was: 4h 50m)

> Add BigQuery DeadLetter pattern to Patterns Page
> 
>
> Key: BEAM-9770
> URL: https://issues.apache.org/jira/browse/BEAM-9770
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Reza ardeshir rokni
>Priority: P4
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9958) Linkage Checker to use exclusion file as baseline

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9958?focusedWorklogId=434704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434704
 ]

ASF GitHub Bot logged work on BEAM-9958:


Author: ASF GitHub Bot
Created on: 19/May/20 00:48
Start Date: 19/May/20 00:48
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11674:
URL: https://github.com/apache/beam/pull/11674#issuecomment-630510006


   Running the tests. Does any test actually test this code?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434704)
Time Spent: 1h  (was: 50m)

> Linkage Checker to use exclusion file as baseline
> -
>
> Key: BEAM-9958
> URL: https://issues.apache.org/jira/browse/BEAM-9958
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Linkage Checker to use exclusion file as baseline.
> Linkage Checker 1.4.0 has a function to take an exclusion file that filters out 
> existing linkage errors. This functionality eliminates the need to run the diff 
> command in beam-linkage-check.sh.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9958) Linkage Checker to use exclusion file as baseline

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9958?focusedWorklogId=434703=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434703
 ]

ASF GitHub Bot logged work on BEAM-9958:


Author: ASF GitHub Bot
Created on: 19/May/20 00:47
Start Date: 19/May/20 00:47
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11674:
URL: https://github.com/apache/beam/pull/11674#issuecomment-630509865


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434703)
Time Spent: 50m  (was: 40m)

> Linkage Checker to use exclusion file as baseline
> -
>
> Key: BEAM-9958
> URL: https://issues.apache.org/jira/browse/BEAM-9958
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Linkage Checker to use exclusion file as baseline.
> Linkage Checker 1.4.0 has a function to take an exclusion file that filters out 
> existing linkage errors. This functionality eliminates the need to run the diff 
> command in beam-linkage-check.sh.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=434701=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434701
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 19/May/20 00:46
Start Date: 19/May/20 00:46
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on a change in pull request #10946:
URL: https://github.com/apache/beam/pull/10946#discussion_r426967410



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/utils/TVFStreamingUtils.java
##
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.utils;
+
+/** Provides static constants or utils for TVF streaming. */
+public class TVFStreamingUtils {

Review comment:
   `SqlWindowTableFunction` is a class in Calcite (since 1.22.0). After we 
successfully upgrade to a newer version of Calcite (I hope), we can remove our own 
copy of `SqlWindowTableFunction`, so there is still a need to keep `TVFStreamingUtils`.
   
   One could argue that such constants could instead be put into 
`SqlWindowTableFunction` in Calcite; we can leave that discussion for the future.
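   For context, a hedged sketch of the kind of constants being discussed; the exact names and 
values below are assumptions for illustration, not the committed code:

       /** Provides static constants or utils for TVF streaming (illustrative sketch only). */
       public class TVFStreamingUtils {
         public static final String WINDOW_START = "window_start";
         public static final String WINDOW_END = "window_end";
         public static final String FIXED_WINDOW_TVF = "TUMBLE";

         private TVFStreamingUtils() {} // constants only, no instances
       }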





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434701)
Time Spent: 5.5h  (was: 5h 20m)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
> URL: https://issues.apache.org/jira/browse/BEAM-9363
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
>  This Jira tracks the implementation for 
> https://s.apache.org/streaming-beam-sql
> TVF is table-valued function, which is a SQL feature that produce a table as 
> function's output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=434699=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434699
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 19/May/20 00:43
Start Date: 19/May/20 00:43
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on a change in pull request #10946:
URL: https://github.com/apache/beam/pull/10946#discussion_r426966749



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rel;
+
+import static 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument;
+
+import java.lang.reflect.Type;
+import java.util.List;
+import java.util.Set;
+import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel;
+import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.windowing.FixedWindows;
+import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.Row;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.TableFunctionScan;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelColumnMapping;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLiteral;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.joda.time.Duration;
+
+/**
+ * BeamRelNode to replace {@code TableFunctionScan}. Currently this class is limited to
+ * supporting table-valued functions for streaming windowing.
+ */
+public class BeamTableFunctionScanRel extends TableFunctionScan implements 
BeamRelNode {
+  public BeamTableFunctionScanRel(
+  RelOptCluster cluster,
+  RelTraitSet traitSet,
+  List<RelNode> inputs,
+  RexNode rexCall,
+  Type elementType,
+  RelDataType rowType,
+  Set<RelColumnMapping> columnMappings) {
+super(cluster, traitSet, inputs, rexCall, elementType, rowType, 
columnMappings);
+  }
+
+  @Override
+  public TableFunctionScan copy(
+  RelTraitSet traitSet,
+  List<RelNode> list,
+  RexNode rexNode,
+  Type type,
+  RelDataType relDataType,
+  Set<RelColumnMapping> set) {
+return new BeamTableFunctionScanRel(
+getCluster(), traitSet, list, rexNode, type, relDataType, 
columnMappings);

Review comment:
   Good catch! Replaced with the right parameter. 
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434699)
Time Spent: 5h 20m  (was: 5h 10m)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
>

[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=434693=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434693
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 19/May/20 00:01
Start Date: 19/May/20 00:01
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #10946:
URL: https://github.com/apache/beam/pull/10946#discussion_r426949818



##
File path: 
sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/TVFScanConverter.java
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.zetasql.translation;
+
+import com.google.zetasql.resolvedast.ResolvedNode;
+import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedTVFArgument;
+import com.google.zetasql.resolvedast.ResolvedNodes.ResolvedTVFScan;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.logical.LogicalTableFunctionScan;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeFieldImpl;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeName;
+
+class TVFScanConverter extends RelConverter<ResolvedTVFScan> {
+
+  TVFScanConverter(ConversionContext context) {
+super(context);
+  }
+
+  @Override
+  public RelNode convert(ResolvedTVFScan zetaNode, List<RelNode> inputs) {
+RelNode input = inputs.get(0);
+RelNode tableFunctionScan =
+LogicalTableFunctionScan.create(
+getCluster(),
+inputs,
+getExpressionConverter()
+.convertTableValuedFunction(
+input,
+zetaNode.getTvf(),
+zetaNode.getArgumentList(),
+
zetaNode.getArgumentList().get(0).getScan().getColumnList()),
+null,
+createRowTypeWithWindowStartAndEnd(input.getRowType()),
+Collections.EMPTY_SET);
+
+return tableFunctionScan;
+  }
+
+  @Override
+  public List<ResolvedNode> getInputs(ResolvedTVFScan zetaNode) {
+List<ResolvedNode> inputs = new ArrayList<>();
+for (ResolvedTVFArgument argument : zetaNode.getArgumentList()) {
+  if (argument.getScan() != null) {
+inputs.add(argument.getScan());
+  }
+}
+return inputs;
+  }
+
+  private RelDataType createRowTypeWithWindowStartAndEnd(RelDataType 
inputRowType) {
+List<RelDataTypeField> newFields = new ArrayList<>(inputRowType.getFieldList());
+RelDataType timestampType = 
getCluster().getTypeFactory().createSqlType(SqlTypeName.TIMESTAMP);
+
+RelDataTypeField windowStartField =
+new RelDataTypeFieldImpl("window_start", newFields.size(), 
timestampType);

Review comment:
   Use the constant?
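   What the suggested change might look like, assuming the constants are exposed as 
`TVFStreamingUtils.WINDOW_START` / `WINDOW_END` (an assumption for illustration, not the 
committed change):

       // Sketch: replace the string literals with the shared constants.
       RelDataTypeField windowStartField =
           new RelDataTypeFieldImpl(TVFStreamingUtils.WINDOW_START, newFields.size(), timestampType);
       RelDataTypeField windowEndField =
           new RelDataTypeFieldImpl(TVFStreamingUtils.WINDOW_END, newFields.size() + 1, timestampType);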

##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific 

[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434691=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434691
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:59
Start Date: 18/May/20 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11570:
URL: https://github.com/apache/beam/pull/11570#issuecomment-630495763


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434691)
Time Spent: 3h 40m  (was: 3.5h)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because Dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434692
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:59
Start Date: 18/May/20 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11532:
URL: https://github.com/apache/beam/pull/11532#issuecomment-630495848


   Looks like you need to run spotless to auto-format. You can use `./gradlew 
spotlessApply` to do that locally (may need to do it on the other PRs as well)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434692)
Time Spent: 3h 50m  (was: 3h 40m)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because Dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434690=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434690
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:57
Start Date: 18/May/20 23:57
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11529:
URL: https://github.com/apache/beam/pull/11529#issuecomment-630495233


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434690)
Time Spent: 3.5h  (was: 3h 20m)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because Dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9821) SpannerIO does not include all batching parameters in DisplayData.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9821?focusedWorklogId=434689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434689
 ]

ASF GitHub Bot logged work on BEAM-9821:


Author: ASF GitHub Bot
Created on: 18/May/20 23:56
Start Date: 18/May/20 23:56
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11528:
URL: https://github.com/apache/beam/pull/11528#issuecomment-630494999


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434689)
Time Spent: 1h 50m  (was: 1h 40m)

> SpannerIO does not include all batching parameters in DisplayData.
> --
>
> Key: BEAM-9821
> URL: https://issues.apache.org/jira/browse/BEAM-9821
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P3
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> SpannerIO Write and WriteGrouped do not populate all of the batching/grouping 
> parameters in their DisplayData – they only show "batchSizeBytes"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434688=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434688
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:56
Start Date: 18/May/20 23:56
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11532:
URL: https://github.com/apache/beam/pull/11532#issuecomment-630494893


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434688)
Time Spent: 3h 20m  (was: 3h 10m)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because Dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9821) SpannerIO does not include all batching parameters in DisplayData.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9821?focusedWorklogId=434687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434687
 ]

ASF GitHub Bot logged work on BEAM-9821:


Author: ASF GitHub Bot
Created on: 18/May/20 23:55
Start Date: 18/May/20 23:55
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11528:
URL: https://github.com/apache/beam/pull/11528#issuecomment-630494783


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434687)
Time Spent: 1h 40m  (was: 1.5h)

> SpannerIO does not include all batching parameters in DisplayData.
> --
>
> Key: BEAM-9821
> URL: https://issues.apache.org/jira/browse/BEAM-9821
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P3
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> SpannerIO Write and WriteGrouped do not populate all of the batching/grouping 
> parameters in their DisplayData – they only show "batchSizeBytes"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9821) SpannerIO does not include all batching parameters in DisplayData.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9821?focusedWorklogId=434684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434684
 ]

ASF GitHub Bot logged work on BEAM-9821:


Author: ASF GitHub Bot
Created on: 18/May/20 23:46
Start Date: 18/May/20 23:46
Worklog Time Spent: 10m 
  Work Description: nielm commented on pull request #11528:
URL: https://github.com/apache/beam/pull/11528#issuecomment-630492150


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434684)
Time Spent: 1h 20m  (was: 1h 10m)

> SpannerIO does not include all batching parameters in DisplayData.
> --
>
> Key: BEAM-9821
> URL: https://issues.apache.org/jira/browse/BEAM-9821
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P3
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> SpannerIO Write and WriteGrouped do not populate all of the batching/grouping 
> parameters in their DisplayData – they only show "batchSizeBytes"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9821) SpannerIO does not include all batching parameters in DisplayData.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9821?focusedWorklogId=434685=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434685
 ]

ASF GitHub Bot logged work on BEAM-9821:


Author: ASF GitHub Bot
Created on: 18/May/20 23:46
Start Date: 18/May/20 23:46
Worklog Time Spent: 10m 
  Work Description: nielm removed a comment on pull request #11528:
URL: https://github.com/apache/beam/pull/11528#issuecomment-630469413


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434685)
Time Spent: 1.5h  (was: 1h 20m)

> SpannerIO does not include all batching parameters in DisplayData.
> --
>
> Key: BEAM-9821
> URL: https://issues.apache.org/jira/browse/BEAM-9821
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P3
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> SpannerIO Write and WriteGrouped do not populate all of the batching/grouping 
> parameters in their DisplayData – they only show "batchSizeBytes"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434680
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:45
Start Date: 18/May/20 23:45
Worklog Time Spent: 10m 
  Work Description: nielm removed a comment on pull request #11529:
URL: https://github.com/apache/beam/pull/11529#issuecomment-630476722


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434680)
Time Spent: 2h 40m  (was: 2.5h)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434681
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:45
Start Date: 18/May/20 23:45
Worklog Time Spent: 10m 
  Work Description: nielm commented on pull request #11529:
URL: https://github.com/apache/beam/pull/11529#issuecomment-630491891


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434681)
Time Spent: 2h 50m  (was: 2h 40m)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434682
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:45
Start Date: 18/May/20 23:45
Worklog Time Spent: 10m 
  Work Description: nielm removed a comment on pull request #11532:
URL: https://github.com/apache/beam/pull/11532#issuecomment-630472990


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434682)
Time Spent: 3h  (was: 2h 50m)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434683=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434683
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:45
Start Date: 18/May/20 23:45
Worklog Time Spent: 10m 
  Work Description: nielm commented on pull request #11532:
URL: https://github.com/apache/beam/pull/11532#issuecomment-630492055


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434683)
Time Spent: 3h 10m  (was: 3h)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434679=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434679
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:44
Start Date: 18/May/20 23:44
Worklog Time Spent: 10m 
  Work Description: nielm commented on pull request #11570:
URL: https://github.com/apache/beam/pull/11570#issuecomment-630491802


   Retest this please
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434679)
Time Spent: 2.5h  (was: 2h 20m)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434674
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:33
Start Date: 18/May/20 23:33
Worklog Time Spent: 10m 
  Work Description: nielm commented on pull request #11570:
URL: https://github.com/apache/beam/pull/11570#issuecomment-630488670


   @allenpradeep 
   > 1. What mode should our import pipeline use? Should it use option b as 
data in AVRO seems already sorted?
   
   We can discuss this outside the scope of this PR. 
   
   > 2. Where should we document these modes of operation so that some customer 
can use these?
   
   I have added a section to the javadoc explaining these 3 modes of operation, 
and their pros and cons.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434674)
Time Spent: 2h 20m  (was: 2h 10m)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434673
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 23:33
Start Date: 18/May/20 23:33
Worklog Time Spent: 10m 
  Work Description: nielm commented on a change in pull request #11570:
URL: https://github.com/apache/beam/pull/11570#discussion_r426947630



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
##
@@ -1171,67 +1145,127 @@ public void processElement(ProcessContext c) {
* occur, Therefore this DoFn has to be tested in isolation.
*/
   @VisibleForTesting
-  static class GatherBundleAndSortFn extends DoFn>> {
-private final long maxBatchSizeBytes;
-private final long maxNumMutations;
-private final long maxNumRows;
-
-// total size of the current batch.
-private long batchSizeBytes;
-// total number of mutated cells.
-private long batchCells;
-// total number of rows mutated.
-private long batchRows;
+  static class GatherSortCreateBatchesFn extends DoFn> {
 
+private final long maxBatchSizeBytes;
+private final long maxBatchNumMutations;
+private final long maxBatchNumRows;
+private final long maxSortableSizeBytes;
+private final long maxSortableNumMutations;
+private final long maxSortableNumRows;
 private final PCollectionView schemaView;
+private final ArrayList mutationsToSort = new 
ArrayList<>();
 
-private transient ArrayList> mutationsToSort = null;
+// total size of MutationGroups in mutationsToSort.
+private long sortableSizeBytes;
+// total number of mutated cells in mutationsToSort
+private long sortableNumCells;
+// total number of rows mutated in mutationsToSort
+private long sortableNumRows;
 
-GatherBundleAndSortFn(
+GatherSortCreateBatchesFn(
 long maxBatchSizeBytes,
 long maxNumMutations,
 long maxNumRows,
 long groupingFactor,
 PCollectionView schemaView) {
-  this.maxBatchSizeBytes = maxBatchSizeBytes * groupingFactor;
-  this.maxNumMutations = maxNumMutations * groupingFactor;
-  this.maxNumRows = maxNumRows * groupingFactor;
+  this.maxBatchSizeBytes = maxBatchSizeBytes;
+  this.maxBatchNumMutations = maxNumMutations;
+  this.maxBatchNumRows = maxNumRows;
+
+  if (groupingFactor <= 0) {
+groupingFactor = 1;
+  }
+
+  this.maxSortableSizeBytes = maxBatchSizeBytes * groupingFactor;
+  this.maxSortableNumMutations = maxNumMutations * groupingFactor;
+  this.maxSortableNumRows = maxNumRows * groupingFactor;
   this.schemaView = schemaView;
 }
 
 @StartBundle
 public synchronized void startBundle() throws Exception {
-  if (mutationsToSort == null) {
-initSorter();
-  } else {
-throw new IllegalStateException("Sorter should be null here");
-  }
+  initSorter();
 }
 
-private void initSorter() {
-  mutationsToSort = new ArrayList>((int) 
maxNumMutations);
-  batchSizeBytes = 0;
-  batchCells = 0;
-  batchRows = 0;
+private synchronized void initSorter() {

Review comment:
   > Do we need to mark this as synchronized. Looks like all the callers 
are synchronized themselves.
   
   Probably not, but it does no harm.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434673)
Time Spent: 2h 10m  (was: 2h)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434669=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434669
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 18/May/20 23:28
Start Date: 18/May/20 23:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11039:
URL: https://github.com/apache/beam/pull/11039#issuecomment-630487229


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434669)
Time Spent: 10h  (was: 9h 50m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9698) BeamUncollectRel UncollectDoFn NullPointerException

2020-05-18 Thread Andrew Pilloud (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110711#comment-17110711
 ] 

Andrew Pilloud commented on BEAM-9698:
--

Fixing the first issue exposed a second. Seems we are dropping a struct.
```
Expected: ARRAY>>[]
Actual: ARRAY>[]

SELECT e FROM UNNEST(CAST(NULL AS ARRAY>)) e
```

> BeamUncollectRel UncollectDoFn NullPointerException
> ---
>
> Key: BEAM-9698
> URL: https://issues.apache.org/jira/browse/BEAM-9698
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: P2
>  Labels: zetasql-compliance
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> two failures in shard 19
> {code}
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NullPointerException
>   at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
>   at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
>   at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
>   at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamEnumerableConverter.runCollector(BeamEnumerableConverter.java:201)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamEnumerableConverter.collectRows(BeamEnumerableConverter.java:218)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamEnumerableConverter.toRowList(BeamEnumerableConverter.java:150)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamEnumerableConverter.toRowList(BeamEnumerableConverter.java:127)
>   at 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl.executeQuery(ExecuteQueryServiceServer.java:329)
>   at 
> com.google.zetasql.testing.SqlComplianceServiceGrpc$MethodHandlers.invoke(SqlComplianceServiceGrpc.java:423)
>   at 
> com.google.zetasql.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
>   at 
> com.google.zetasql.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
>   at 
> com.google.zetasql.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:711)
>   at 
> com.google.zetasql.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> com.google.zetasql.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamUncollectRel$UncollectDoFn.process(BeamUncollectRel.java:103)
> {code}
> {code}
> Apr 01, 2020 5:58:27 PM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT e FROM UNNEST(CAST(NULL AS 
> ARRAY)) e
> Apr 01, 2020 5:58:27 PM 
> cloud.dataflow.sql.ExecuteQueryServiceServer$SqlComplianceServiceImpl 
> executeQuery
> INFO: Processing Sql statement: SELECT e FROM UNNEST(CAST(NULL AS 
> ARRAY>)) e
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8828) BigQueryTableProvider should allow configuration of write disposition

2020-05-18 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8828:

Description: It should be possible to set BigQueryIO's 
[writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
 in a Beam SQL big query table.  (was: It should be possible to set 
BigQueryIO's 
[writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
 in a big query table.)

> BigQueryTableProvider should allow configuration of write disposition
> -
>
> Key: BEAM-8828
> URL: https://issues.apache.org/jira/browse/BEAM-8828
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>
> It should be possible to set BigQueryIO's 
> [writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
>  in a Beam SQL big query table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8828) BigQueryTableProvider should allow configuration of write disposition

2020-05-18 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8828:

Status: Open  (was: Triage Needed)

> BigQueryTableProvider should allow configuration of write disposition
> -
>
> Key: BEAM-8828
> URL: https://issues.apache.org/jira/browse/BEAM-8828
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Priority: P2
>
> It should be possible to set BigQueryIO's 
> [writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
>  in a big query table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9951) Create Go SDK synthetic sources.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9951?focusedWorklogId=434662=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434662
 ]

ASF GitHub Bot logged work on BEAM-9951:


Author: ASF GitHub Bot
Created on: 18/May/20 23:04
Start Date: 18/May/20 23:04
Worklog Time Spent: 10m 
  Work Description: lostluck commented on a change in pull request #11747:
URL: https://github.com/apache/beam/pull/11747#discussion_r426935903



##
File path: sdks/go/pkg/beam/io/synthetic/source.go
##
@@ -135,27 +155,79 @@ func (fn *sourceFn) ProcessElement(rt 
*offsetrange.Tracker, config SourceConfig,
return nil
 }
 
-// DefaultSourceConfig creates a SourceConfig with intended defaults for its
-// fields. SourceConfigs should be initialized with this method.
-func DefaultSourceConfig() SourceConfig {
-   return SourceConfig{
-   NumElements:   1, // Defaults shouldn't drop elements, so at 
least 1.
-   InitialSplits: 1, // Defaults to 1, i.e. no initial splitting.
+// SourceConfigBuilder is used to initialize SourceConfigs. See
+// SourceConfigBuilder's methods for descriptions of the fields in a
+// SourceConfig and how they can be set. The intended approach for using this
+// builder is to begin by calling the DefaultSourceConfig function, followed by
+// calling setters, followed by calling Build.
+//
+// Usage example:
+//
+//cfg := 
synthetic.DefaultSourceConfig().NumElements(5000).InitialSplits(2).Build()
+type SourceConfigBuilder struct {
+   cfg SourceConfig
+}
+
+// DefaultSourceConfig creates a SourceConfigBuilder set with intended defaults
+// for the SourceConfig fields. This function is the intended starting point 
for
+// initializing a SourceConfig and should always be used to create
+// SourceConfigBuilders.
+//
+// To see descriptions of the various SourceConfig fields and their defaults,
+// see the methods to SourceConfigBuilder.
+func DefaultSourceConfig() *SourceConfigBuilder {
+   return &SourceConfigBuilder{
+   cfg: SourceConfig{
+   numElements:   1, // 0 is invalid (drops elements).
+   initialSplits: 1, // 0 is invalid (drops elements).
+   },
+   }
+}
+
+// NumElements is the number of elements for the source to generate and emit.
+//
+// Valid values are in the range of [1, ...] and the default value is 1. Values
+// of 0 (and below) are invalid as they result in sources that emit no 
elements.
+func (b *SourceConfigBuilder) NumElements(val int) *SourceConfigBuilder {
+   b.cfg.numElements = val
+   return b
+}
+
+// InitialSplits determines the number of initial splits to perform in the
+// source's SplitRestriction method. Restrictions in synthetic sources 
represent
+// the number of elements being emitted, and this split is performed evenly
+// across that number of elements.
+//
+// Each resulting restriction will have at least 1 element in it, and each
+// element being emitted will be contained in exactly one restriction. That
+// means that if the desired number of splits is greater than the number of
+// elements N, then N initial restrictions will be created, each containing 1
+// element.
+//
+// Valid values are in the range of [1, ...] and the default value is 1. Values
+// of 0 (and below) are invalid as they would result in dropping elements that
+// are expected to be emitted.
+func (b *SourceConfigBuilder) InitialSplits(val int) *SourceConfigBuilder {
+   b.cfg.initialSplits = val
+   return b
+}
+
+// Build constructs the SourceConfig initialized by this builder. It also
+// performs error checking on the fields, and panics if any have been set to
+// invalid values.
+func (b *SourceConfigBuilder) Build() SourceConfig {
+   if b.cfg.initialSplits <= 0 {
+   panic(fmt.Sprintf("SourceConfig.InitialSplits must be >= 1. 
Got: %v", b.cfg.initialSplits))
+   }
+   if b.cfg.numElements <= 0 {
+   panic(fmt.Sprintf("SourceConfig.NumElements must be >= 1. Got: 
%v", b.cfg.numElements))
}
+   return b.cfg
 }
 
 // SourceConfig is a struct containing all the configuration options for a
-// synthetic source.
+// synthetic source. It should be created via a SourceConfigBuilder.
 type SourceConfig struct {
-   // NumElements is the number of elements for the source to generate and
-   // emit.
-   NumElements int
-
-   // InitialSplits determines the number of initial splits to perform in 
the
-   // source's SplitRestriction method. Note that in some edge cases, the
-   // number of splits performed might differ from this config value. Each
-   // restriction will always have one element in it, and at least one
-   // restriction will always be output, so the number of splits will be in
-   // the range of [1, N] where N is the size of the original restriction.
-

[jira] [Work logged] (BEAM-9723) [Java] PTransform that connects to Cloud DLP deidentification service

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9723?focusedWorklogId=434661=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434661
 ]

ASF GitHub Bot logged work on BEAM-9723:


Author: ASF GitHub Bot
Created on: 18/May/20 23:03
Start Date: 18/May/20 23:03
Worklog Time Spent: 10m 
  Work Description: tysonjh commented on a change in pull request #11566:
URL: https://github.com/apache/beam/pull/11566#discussion_r426831619



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/BatchRequestForDLP.java
##
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.privacy.dlp.v2.Table;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.state.BagState;
+import org.apache.beam.sdk.state.StateSpec;
+import org.apache.beam.sdk.state.StateSpecs;
+import org.apache.beam.sdk.state.TimeDomain;
+import org.apache.beam.sdk.state.Timer;
+import org.apache.beam.sdk.state.TimerSpec;
+import org.apache.beam.sdk.state.TimerSpecs;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.values.KV;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * DoFn batching the input PCollection into bigger requests in order to better 
utilize the Cloud DLP
+ * service.
+ */
+@Experimental
+class BatchRequestForDLP extends DoFn, KV>> {
+  public static final Logger LOG = 
LoggerFactory.getLogger(BatchRequestForDLP.class);
+
+  private final Counter numberOfRowsBagged =
+  Metrics.counter(BatchRequestForDLP.class, "numberOfRowsBagged");
+  private final Integer batchSize;
+
+  @StateId("elementsBag")
+  private final StateSpec>> elementsBag = 
StateSpecs.bag();
+
+  @TimerId("eventTimer")
+  private final TimerSpec eventTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME);
+
+  public BatchRequestForDLP(Integer batchSize) {
+this.batchSize = batchSize;
+  }
+
+  @ProcessElement
+  public void process(
+  @Element KV element,
+  @StateId("elementsBag") BagState> elementsBag,
+  @TimerId("eventTimer") Timer eventTimer,
+  BoundedWindow w) {
+elementsBag.add(element);
+eventTimer.set(w.maxTimestamp());
+  }
+
+  @OnTimer("eventTimer")
+  public void onTimer(
+  @StateId("elementsBag") BagState> elementsBag,
+  OutputReceiver>> output) {
+String key = elementsBag.read().iterator().next().getKey();

Review comment:
   Is there a guarantee that at least one element will be in the 
elementsBag iterator, or is there a chance of a NoSuchElementException on the 
`next()` call?

##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.dlp.v2.DlpServiceClient;
+import com.google.privacy.dlp.v2.ContentItem;
+import com.google.privacy.dlp.v2.DeidentifyConfig;
+import 

[jira] [Work logged] (BEAM-9984) Support BIT_OR aggregation function in Beam SQL

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9984?focusedWorklogId=434660=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434660
 ]

ASF GitHub Bot logged work on BEAM-9984:


Author: ASF GitHub Bot
Created on: 18/May/20 22:57
Start Date: 18/May/20 22:57
Worklog Time Spent: 10m 
  Work Description: omarismail94 commented on a change in pull request 
#11737:
URL: https://github.com/apache/beam/pull/11737#discussion_r426935461



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAggregations.java
##
@@ -347,4 +357,35 @@ public BigDecimal toBigDecimal(BigDecimal record) {
   return record;
 }
   }
+
+  static class BitOr extends CombineFn 
{
+static class Accum {

Review comment:
   Actually, this might work, let me test this
   ```
  static class BitOr<T extends Number> extends CombineFn<T, Long, Long> {
    @Override
    public Long createAccumulator() {
      return 0L;
    }

    @Override
    public Long addInput(Long accum, T input) {
      return accum | input.longValue();
    }

    @Override
    public Long mergeAccumulators(Iterable<Long> accums) {
      Long merged = createAccumulator();
      for (Long accum : accums) {
        merged = merged | accum;
      }
      return merged;
    }

    @Override
    public Long extractOutput(Long accum) {
      return accum;
    }
  }
   ```
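   For a quick sanity check of the bitwise-OR fold above, here is a minimal, self-contained sketch (not part of the original comment; it assumes plain `Long` inputs and simply mirrors the `createAccumulator`/`addInput` logic) applied to the values from the issue's example:
   ```java
   import java.util.Arrays;

   public class BitOrSketch {
     public static void main(String[] args) {
       long accum = 0L; // createAccumulator()
       for (long input : Arrays.asList(0xF001L, 0x00A1L)) {
         accum = accum | input; // addInput(): fold each value into the accumulator
       }
       System.out.println(accum); // prints 61601, i.e. 0xF0A1 = 0xF001 | 0x00A1
     }
   }
   ```
   Running it prints 61601 (0xF0A1), which is the value BIT_OR should return for the example input in the issue.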

##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAggregations.java
##
@@ -347,4 +357,35 @@ public BigDecimal toBigDecimal(BigDecimal record) {
   return record;
 }
   }
+
+  static class BitOr extends CombineFn 
{
+static class Accum {

Review comment:
   It worked! Will commit this now!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434660)
Time Spent: 2h 10m  (was: 2h)

> Support BIT_OR aggregation function in Beam SQL
> ---
>
> Key: BEAM-9984
> URL: https://issues.apache.org/jira/browse/BEAM-9984
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Omar Ismail
>Priority: P2
>  Labels: starter
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Performs a bitwise OR operation on expression and returns the result.
> Supported Argument Types: INT64
> Returned Data Types: INT64
> Examples
> {code:sql}
> SELECT BIT_OR(c) as bit_and FROM UNNEST([0xF001, 0x00A1]) as c;
> +-+
> | bit_and |
> +-+
> | 1   |
> +-+
> {code}
> What is expected: should include both Calcite and ZetaSQL dialects.
> How to test: unit tests
> Reference: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#bit_or



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434659=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434659
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 22:54
Start Date: 18/May/20 22:54
Worklog Time Spent: 10m 
  Work Description: nielm commented on pull request #11529:
URL: https://github.com/apache/beam/pull/11529#issuecomment-630476722


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434659)
Time Spent: 2h  (was: 1h 50m)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7246) Add Google Spanner IO on Python SDK

2020-05-18 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-7246:
---
Issue Type: New Feature  (was: Bug)

> Add Google Spanner IO on Python SDK 
> 
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: New Feature
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shoaib Zafar
>Priority: P2
>  Time Spent: 22.5h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9984) Support BIT_OR aggregation function in Beam SQL

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9984?focusedWorklogId=434657=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434657
 ]

ASF GitHub Bot logged work on BEAM-9984:


Author: ASF GitHub Bot
Created on: 18/May/20 22:43
Start Date: 18/May/20 22:43
Worklog Time Spent: 10m 
  Work Description: omarismail94 commented on a change in pull request 
#11737:
URL: https://github.com/apache/beam/pull/11737#discussion_r426931476



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAggregations.java
##
@@ -347,4 +357,35 @@ public BigDecimal toBigDecimal(BigDecimal record) {
   return record;
 }
   }
+
+  static class BitOr extends CombineFn 
{
+static class Accum {

Review comment:
   Hmm, I'm not sure if this works. How would I create the accumulator? I 
can't do `new Long()`. That's why I wrapped `long` in Accum.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434657)
Time Spent: 2h  (was: 1h 50m)

> Support BIT_OR aggregation function in Beam SQL
> ---
>
> Key: BEAM-9984
> URL: https://issues.apache.org/jira/browse/BEAM-9984
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Omar Ismail
>Priority: P2
>  Labels: starter
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Performs a bitwise OR operation on expression and returns the result.
> Supported Argument Types: INT64
> Returned Data Types: INT64
> Examples
> {code:sql}
> SELECT BIT_OR(c) as bit_and FROM UNNEST([0xF001, 0x00A1]) as c;
> +-+
> | bit_and |
> +-+
> | 1   |
> +-+
> {code}
> What is expected: should include both Calcite and ZetaSQL dialects.
> How to test: unit tests
> Reference: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#bit_or



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434656
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 22:42
Start Date: 18/May/20 22:42
Worklog Time Spent: 10m 
  Work Description: nielm commented on pull request #11532:
URL: https://github.com/apache/beam/pull/11532#issuecomment-630472990


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434656)
Time Spent: 1h 50m  (was: 1h 40m)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9577) Update artifact staging and retrieval protocols to be dependency aware.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9577?focusedWorklogId=434654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434654
 ]

ASF GitHub Bot logged work on BEAM-9577:


Author: ASF GitHub Bot
Created on: 18/May/20 22:39
Start Date: 18/May/20 22:39
Worklog Time Spent: 10m 
  Work Description: robertwb merged pull request #11708:
URL: https://github.com/apache/beam/pull/11708


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434654)
Time Spent: 23h  (was: 22h 50m)

> Update artifact staging and retrieval protocols to be dependency aware.
> ---
>
> Key: BEAM-9577
> URL: https://issues.apache.org/jira/browse/BEAM-9577
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 23h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9822) SpannerIO: Reduce memory usage - especially when streaming

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9822?focusedWorklogId=434653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434653
 ]

ASF GitHub Bot logged work on BEAM-9822:


Author: ASF GitHub Bot
Created on: 18/May/20 22:38
Start Date: 18/May/20 22:38
Worklog Time Spent: 10m 
  Work Description: nielm commented on a change in pull request #11532:
URL: https://github.com/apache/beam/pull/11532#discussion_r426929817



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
##
@@ -1066,7 +1079,12 @@ public SpannerWriteResult 
expand(PCollection input) {
   spec.getBatchSizeBytes(),
   spec.getMaxNumMutations(),
   spec.getMaxNumRows(),
-  spec.getGroupingFactor(),
+  // Do not group on streaming unless explicitly 
set.
+  spec.getGroupingFactor()
+  .orElse(
+  input.isBounded() == IsBounded.BOUNDED

Review comment:
   It's kind of both: if the source is unbounded (streaming) and the groupingFactor has not 
been specified by the user, then it defaults to no grouping.
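   A minimal sketch of the defaulting pattern being described (the field names and the concrete default values are illustrative, not the exact SpannerIO settings):
   ```java
   import java.util.Optional;

   class GroupingDefaults {
     // If the user set a grouping factor, use it; otherwise default based on
     // whether the input is bounded (batch) or unbounded (streaming).
     static long effectiveGroupingFactor(Optional<Long> userValue, boolean isBounded) {
       long batchDefault = 1000L;  // hypothetical batch default
       long streamingDefault = 1L; // 1 == effectively no grouping when streaming
       return userValue.orElse(isBounded ? batchDefault : streamingDefault);
     }
   }
   ```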





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434653)
Time Spent: 1h 40m  (was: 1.5h)

> SpannerIO: Reduce memory usage - especially when streaming
> --
>
> Key: BEAM-9822
> URL: https://issues.apache.org/jira/browse/BEAM-9822
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P2
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> SpannerIO uses a lot of memory. 
> In Streaming Dataflow, it uses many times as much (because dataflow creates 
> many worker threads)
> Lower the memory use, and change default parameters during streaming to use 
> smaller batches and disable grouping.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9821) SpannerIO does not include all batching parameters in DisplayData.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9821?focusedWorklogId=434650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434650
 ]

ASF GitHub Bot logged work on BEAM-9821:


Author: ASF GitHub Bot
Created on: 18/May/20 22:33
Start Date: 18/May/20 22:33
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #11528:
URL: https://github.com/apache/beam/pull/11528#issuecomment-630470331


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434650)
Time Spent: 1h 10m  (was: 1h)

> SpannerIO does not include all batching parameters in DisplayData.
> --
>
> Key: BEAM-9821
> URL: https://issues.apache.org/jira/browse/BEAM-9821
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P3
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> SpannerIO Write and WriteGrouped do not populate all of the batching/grouping 
> parameters in their DisplayData – they only show "batchSizeBytes"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9958) Linkage Checker to use exclusion file as baseline

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9958?focusedWorklogId=434649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434649
 ]

ASF GitHub Bot logged work on BEAM-9958:


Author: ASF GitHub Bot
Created on: 18/May/20 22:33
Start Date: 18/May/20 22:33
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #11674:
URL: https://github.com/apache/beam/pull/11674#issuecomment-630470049


   @aaltay Thank you for taking the review. PTAL.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434649)
Time Spent: 40m  (was: 0.5h)

> Linkage Checker to use exclusion file as baseline
> -
>
> Key: BEAM-9958
> URL: https://issues.apache.org/jira/browse/BEAM-9958
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Linkage Checker to use exclusion file as baseline.
> Linkage Checker 1.4.0 has a function that takes an exclusion file to filter out 
> existing linkage errors. This functionality eliminates the need to run the 
> diff command in beam-linkage-check.sh.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9958) Linkage Checker to use exclusion file as baseline

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9958?focusedWorklogId=434648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434648
 ]

ASF GitHub Bot logged work on BEAM-9958:


Author: ASF GitHub Bot
Created on: 18/May/20 22:32
Start Date: 18/May/20 22:32
Worklog Time Spent: 10m 
  Work Description: suztomo commented on a change in pull request #11674:
URL: https://github.com/apache/beam/pull/11674#discussion_r426927838



##
File path: sdks/java/build-tools/beam-linkage-check.sh
##
@@ -66,51 +66,61 @@ if [ ! -z "$(git diff)" ]; then
   exit 1
 fi
 
+ACCUMULATED_RESULT=0
+
 function runLinkageCheck () {
   COMMIT=$1
   BRANCH=$2
+  MODE=$3 # baseline or validate
   # An empty invocation so that the subsequent checkJavaLinkage does not
   # contain garbage
   echo "`date`:" "Installing artifacts of ${BRANCH}(${COMMIT}) to Maven local 
repository."
-  ./gradlew -Ppublishing -PjavaLinkageArtifactIds=beam-sdks-java-core 
:checkJavaLinkage > /dev/null 2>&1
+  ./gradlew -Ppublishing -PjavaLinkageArtifactIds=beam-sdks-java-core 
-PjavaLinkageWriteBaseline=/dev/null :checkJavaLinkage > /dev/null 2>&1
   for ARTIFACT in $ARTIFACTS; do
-echo "`date`:" "Running linkage check for ${ARTIFACT} in ${BRANCH}"
-# Removing time taken to have clean diff
-./gradlew -Ppublishing -PjavaLinkageArtifactIds=$ARTIFACT 
:checkJavaLinkage |grep -v 'BUILD SUCCESSFUL in' | grep -v 'dependency paths' > 
${OUTPUT_DIR}/${COMMIT}-${ARTIFACT}
-echo "`date`:" "Done: ${OUTPUT_DIR}/${COMMIT}-${ARTIFACT}"
+echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT} in 
${BRANCH}"
+
+BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT}.xml
+if [ "$MODE" = "baseline" ]; then
+  BASELINE_OPTION='-PjavaLinkageWriteBaseline'
+  echo "`date`:" "to create a baseline (existing errors before change) 
$BASELINE_FILE"
+elif [ "$MODE" = "validate" ]; then
+  BASELINE_OPTION='-PjavaLinkageReadBaseline'
+  echo "`date`:" "using baseline $BASELINE_FILE"
+else
+  echo "invalid parameter for runLinkageCheck: ${MODE}"

Review comment:
   Fixed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434648)
Time Spent: 0.5h  (was: 20m)

> Linkage Checker to use exclusion file as baseline
> -
>
> Key: BEAM-9958
> URL: https://issues.apache.org/jira/browse/BEAM-9958
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Linkage Checker to use exclusion file as baseline.
> Linkage Checker 1.4.0 has a function that takes an exclusion file to filter out 
> existing linkage errors. This functionality eliminates the need to run the 
> diff command in beam-linkage-check.sh.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9821) SpannerIO does not include all batching parameters in DisplayData.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9821?focusedWorklogId=434647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434647
 ]

ASF GitHub Bot logged work on BEAM-9821:


Author: ASF GitHub Bot
Created on: 18/May/20 22:30
Start Date: 18/May/20 22:30
Worklog Time Spent: 10m 
  Work Description: nielm commented on pull request #11528:
URL: https://github.com/apache/beam/pull/11528#issuecomment-630469413


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434647)
Time Spent: 1h  (was: 50m)

> SpannerIO does not include all batching parameters in DisplayData.
> --
>
> Key: BEAM-9821
> URL: https://issues.apache.org/jira/browse/BEAM-9821
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P3
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> SpannerIO Write and WriteGrouped do not populate all of the batching/grouping 
> parameters in their DisplayData – they only show "batchSizeBytes"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9821) SpannerIO does not include all batching parameters in DisplayData.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9821?focusedWorklogId=434645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434645
 ]

ASF GitHub Bot logged work on BEAM-9821:


Author: ASF GitHub Bot
Created on: 18/May/20 22:29
Start Date: 18/May/20 22:29
Worklog Time Spent: 10m 
  Work Description: nielm commented on a change in pull request #11528:
URL: https://github.com/apache/beam/pull/11528#discussion_r426926639



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
##
@@ -991,6 +1001,24 @@ public WriteGrouped(Write spec) {
   this.spec = spec;
 }
 
+@Override
+public void populateDisplayData(DisplayData.Builder builder) {
+  super.populateDisplayData(builder);
+  spec.getSpannerConfig().populateDisplayData(builder);
+  builder.add(
+  DisplayData.item("batchSizeBytes", spec.getBatchSizeBytes())
+  .withLabel("Max batch size in sytes"));
+  builder.add(
+  DisplayData.item("maxNumMutations", spec.getMaxNumMutations())
+  .withLabel("Max number of mutated cells in each batch"));
+  builder.add(
+  DisplayData.item("maxNumRows", spec.getMaxNumRows())
+  .withLabel("Max number of rows in each batch"));
+  builder.add(
+  DisplayData.item("groupingFactor", spec.getGroupingFactor())
+  .withLabel("Number of batches to sort over"));
+}
+

Review comment:
   LGTM. 
   I extracted a single method in Write to populate the displayData with 
parameters to avoid repeating code.
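
   For illustration, a minimal sketch of what such an extracted helper could 
look like; the class, method, and parameter names below are assumptions for 
illustration, not necessarily those used in the PR:

{code:java}
import org.apache.beam.sdk.transforms.display.DisplayData;

// Hypothetical helper shared by Write and WriteGrouped so the batching
// parameters are described in one place; both populateDisplayData()
// overrides would delegate to it instead of repeating the items.
class SpannerWriteDisplayDataSketch {
  static void populateBatchingDisplayData(
      DisplayData.Builder builder,
      long batchSizeBytes,
      long maxNumMutations,
      long maxNumRows,
      long groupingFactor) {
    builder.add(
        DisplayData.item("batchSizeBytes", batchSizeBytes)
            .withLabel("Max batch size in bytes"));
    builder.add(
        DisplayData.item("maxNumMutations", maxNumMutations)
            .withLabel("Max number of mutated cells in each batch"));
    builder.add(
        DisplayData.item("maxNumRows", maxNumRows)
            .withLabel("Max number of rows in each batch"));
    builder.add(
        DisplayData.item("groupingFactor", groupingFactor)
            .withLabel("Number of batches to sort over"));
  }
}
{code}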





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434645)
Time Spent: 50m  (was: 40m)

> SpannerIO does not include all batching parameters in DisplayData.
> --
>
> Key: BEAM-9821
> URL: https://issues.apache.org/jira/browse/BEAM-9821
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.20.0, 2.21.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: P3
>  Labels: google-cloud-spanner
> Fix For: 2.22.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> SpannerIO Write and WriteGrouped do not populate all of the batching/grouping 
> parameters in their DisplayData – they only show "batchSizeBytes"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434646
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 18/May/20 22:29
Start Date: 18/May/20 22:29
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r426926720



##
File path: 
runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
##
@@ -195,7 +187,7 @@ public void testFileWithExtensionPackageNamingAndSize() 
throws Exception {
 PackageAttributes attr = makePackageAttributes(tmpFile, null);
 DataflowPackage target = attr.getDestination();
 
-assertThat(target.getName(), RegexMatcher.matches("file-" + HASH_PATTERN + 
".txt"));
+assertThat(target.getName(), RegexMatcher.matches(UUID_PATTERN + ".txt"));

Review comment:
   Is it important, for the purposes of this test (or Dataflow in general) 
that the staged file name is in the form of a UUID? If not (and I don't think 
it is) it's better not to test for it. (A test that might be good to add is to 
see if two same-named files in different directories actually get staged to 
different places, which is the underlying, important intent.)
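
   A minimal sketch of the suggested test, reusing the makePackageAttributes(...) 
helper visible in this file; the tmpFolder rule, the exact helper signature, and 
the test name are assumptions based on the quoted snippet, not code from the PR:

{code:java}
import static org.junit.Assert.assertNotEquals;

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.junit.Test;

// Hypothetical test: two files with the same base name in different
// directories (and with different contents) must stage to different names.
@Test
public void testSameNamedFilesStageToDifferentDestinations() throws Exception {
  File dirA = tmpFolder.newFolder("a"); // assumes a TemporaryFolder rule named tmpFolder
  File dirB = tmpFolder.newFolder("b");
  File fileA = new File(dirA, "file.txt");
  File fileB = new File(dirB, "file.txt");
  Files.write(fileA.toPath(), "contents A".getBytes(StandardCharsets.UTF_8));
  Files.write(fileB.toPath(), "contents B".getBytes(StandardCharsets.UTF_8));

  String stagedA = makePackageAttributes(fileA, null).getDestination().getName();
  String stagedB = makePackageAttributes(fileB, null).getDestination().getName();

  // The underlying intent: same-named files must not collide once staged.
  assertNotEquals(stagedA, stagedB);
}
{code}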





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434646)
Time Spent: 9h 50m  (was: 9h 40m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10029) Add Spanner IO Performance Tests for Python SDK

2020-05-18 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-10029:

Summary: Add Spanner IO Performance Tests for Python SDK  (was: Add Spanner 
IO Performance Test for Python)

> Add Spanner IO Performance Tests for Python SDK
> ---
>
> Key: BEAM-10029
> URL: https://issues.apache.org/jira/browse/BEAM-10029
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shehzaad Nakhoda
>Assignee: Shoaib Zafar
>Priority: P2
>
> Add performance tests so that the SpannerIO functionality can move into 
> production (i.e. out of experimental).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10029) Add Spanner IO Performance Tests for Python SDK

2020-05-18 Thread Shehzaad Nakhoda (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110660#comment-17110660
 ] 

Shehzaad Nakhoda commented on BEAM-10029:
-

cc: [~chamikara] [~altay]

> Add Spanner IO Performance Tests for Python SDK
> ---
>
> Key: BEAM-10029
> URL: https://issues.apache.org/jira/browse/BEAM-10029
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shehzaad Nakhoda
>Assignee: Shoaib Zafar
>Priority: P2
>
> Add performance tests so that the SpannerIO functionality can move into 
> production (i.e. out of experimental).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10029) Add Spanner IO Performance Test for Python

2020-05-18 Thread Shehzaad Nakhoda (Jira)
Shehzaad Nakhoda created BEAM-10029:
---

 Summary: Add Spanner IO Performance Test for Python
 Key: BEAM-10029
 URL: https://issues.apache.org/jira/browse/BEAM-10029
 Project: Beam
  Issue Type: Test
  Components: io-py-gcp
Reporter: Shehzaad Nakhoda
Assignee: Shoaib Zafar


Spanner IO (Python SDK) contains a PTransform which uses the Batch API to read 
from Spanner. Currently, it only has direct runner unit tests. In order to make 
this functionality available to users, integration tests also need to be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10029) Add Spanner IO Performance Test for Python

2020-05-18 Thread Shehzaad Nakhoda (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda updated BEAM-10029:

Description: Add performance tests so that the SpannerIO functionality can 
move into production (i.e. out of experimental).  (was: Spanner IO (Python SDK) 
contains a PTransform which uses the Batch API to read from Spanner. 
Currently, it only has direct runner unit tests. In order to make this 
functionality available to users, integration tests also need to be added.)

> Add Spanner IO Performance Test for Python
> --
>
> Key: BEAM-10029
> URL: https://issues.apache.org/jira/browse/BEAM-10029
> Project: Beam
>  Issue Type: Test
>  Components: io-py-gcp
>Reporter: Shehzaad Nakhoda
>Assignee: Shoaib Zafar
>Priority: P2
>
> Add performance tests so that the SpannerIO functionality can move into 
> production (i.e. out of experimental).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434644
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 18/May/20 22:24
Start Date: 18/May/20 22:24
Worklog Time Spent: 10m 
  Work Description: ihji commented on a change in pull request #11039:
URL: https://github.com/apache/beam/pull/11039#discussion_r426924859



##
File path: 
runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/util/PackageUtilTest.java
##
@@ -195,7 +187,7 @@ public void testFileWithExtensionPackageNamingAndSize() 
throws Exception {
 PackageAttributes attr = makePackageAttributes(tmpFile, null);
 DataflowPackage target = attr.getDestination();
 
-assertThat(target.getName(), RegexMatcher.matches("file-" + HASH_PATTERN + 
".txt"));
+assertThat(target.getName(), RegexMatcher.matches(UUID_PATTERN + ".txt"));

Review comment:
   but it only checks whether the staged file name has the same extension 
(vs. checking whether the staged file name is in the form of a UUID with the 
same extension)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434644)
Time Spent: 9h 40m  (was: 9.5h)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434642
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 18/May/20 22:18
Start Date: 18/May/20 22:18
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11039:
URL: https://github.com/apache/beam/pull/11039#issuecomment-630465362


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434642)
Time Spent: 9h 20m  (was: 9h 10m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=434643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434643
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 18/May/20 22:18
Start Date: 18/May/20 22:18
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11039:
URL: https://github.com/apache/beam/pull/11039#issuecomment-630465449


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434643)
Time Spent: 9.5h  (was: 9h 20m)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P0
> Fix For: 2.22.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=434641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434641
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 18/May/20 22:13
Start Date: 18/May/20 22:13
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11740:
URL: https://github.com/apache/beam/pull/11740#issuecomment-630462316


   Run Python2_PVR_Flink PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434641)
Time Spent: 19h  (was: 18h 50m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P1
>  Time Spent: 19h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9951) Create Go SDK synthetic sources.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9951?focusedWorklogId=434640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434640
 ]

ASF GitHub Bot logged work on BEAM-9951:


Author: ASF GitHub Bot
Created on: 18/May/20 22:08
Start Date: 18/May/20 22:08
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #11747:
URL: https://github.com/apache/beam/pull/11747#issuecomment-630460354


   R: @lostluck 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434640)
Time Spent: 2h 40m  (was: 2.5h)

> Create Go SDK synthetic sources.
> 
>
> Key: BEAM-9951
> URL: https://issues.apache.org/jira/browse/BEAM-9951
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: P2
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Create synthetic sources for the Go SDK like 
> [Java|https://github.com/apache/beam/tree/master/sdks/java/io/synthetic/src/main/java/org/apache/beam/sdk/io/synthetic]
>  and 
> [Python|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/synthetic_pipeline.py]
>  have.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9951) Create Go SDK synthetic sources.

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9951?focusedWorklogId=434639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434639
 ]

ASF GitHub Bot logged work on BEAM-9951:


Author: ASF GitHub Bot
Created on: 18/May/20 22:08
Start Date: 18/May/20 22:08
Worklog Time Spent: 10m 
  Work Description: youngoli opened a new pull request #11747:
URL: https://github.com/apache/beam/pull/11747


   Instead of just creating SourceConfigs and StepConfigs, have a builder
   pattern to allow more user-friendly creation of those configs with
   defaults.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9984) Support BIT_OR aggregation function in Beam SQL

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9984?focusedWorklogId=434638=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434638
 ]

ASF GitHub Bot logged work on BEAM-9984:


Author: ASF GitHub Bot
Created on: 18/May/20 22:06
Start Date: 18/May/20 22:06
Worklog Time Spent: 10m 
  Work Description: omarismail94 commented on a change in pull request 
#11737:
URL: https://github.com/apache/beam/pull/11737#discussion_r426918300



##
File path: 
sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java
##
@@ -4500,6 +4499,23 @@ public void testIsNullTrueFalse() {
 
pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
   }
 
+  @Test
+  public void testZetaSQLBitOr() {

Review comment:
   Zeta was surprisingly easier than Calcite!
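
   For reference, a hedged sketch of the shape such a ZetaSQL dialect test 
typically takes in ZetaSQLDialectSpecTest; the table name, column name, schema, 
and expected value below are illustrative assumptions, not the actual test data 
from the PR (the planner config, pipeline, and timeout constant are assumed to 
be existing members of the test class):

{code:java}
@Test
public void testZetaSQLBitOr() {
  // Assumed input: an INT64 column whose values are 0xF001 and 0x00A1;
  // BIT_OR of those is 0xF0A1 = 61601.
  String sql = "SELECT BIT_OR(f_int64) FROM example_table";

  ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
  BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
  PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);

  Schema schema = Schema.builder().addInt64Field("field1").build();
  PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues(61601L).build());

  pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
}
{code}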





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434638)
Time Spent: 1h 50m  (was: 1h 40m)

> Support BIT_OR aggregation function in Beam SQL
> ---
>
> Key: BEAM-9984
> URL: https://issues.apache.org/jira/browse/BEAM-9984
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Omar Ismail
>Priority: P2
>  Labels: starter
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Performs a bitwise OR operation on expression and returns the result.
> Supported Argument Types: INT64
> Returned Data Types: INT64
> Examples
> {code:sql}
> SELECT BIT_OR(c) as bit_and FROM UNNEST([0xF001, 0x00A1]) as c;
> +-+
> | bit_and |
> +-+
> | 1   |
> +-+
> {code}
> What is expected: should include both Calcite and ZetaSQL dialects.
> How to test: unit tests
> Reference: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#bit_or



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9984) Support BIT_OR aggregation function in Beam SQL

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9984?focusedWorklogId=434637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434637
 ]

ASF GitHub Bot logged work on BEAM-9984:


Author: ASF GitHub Bot
Created on: 18/May/20 22:05
Start Date: 18/May/20 22:05
Worklog Time Spent: 10m 
  Work Description: omarismail94 commented on a change in pull request 
#11737:
URL: https://github.com/apache/beam/pull/11737#discussion_r426918114



##
File path: 
sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java
##
@@ -2836,7 +2836,6 @@ public void testDistinctOnNull() {
   }
 
   @Test
-  @Ignore("BeamSQL does not support ANY_VALUE")

Review comment:
   Thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434637)
Time Spent: 1h 40m  (was: 1.5h)

> Support BIT_OR aggregation function in Beam SQL
> ---
>
> Key: BEAM-9984
> URL: https://issues.apache.org/jira/browse/BEAM-9984
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Omar Ismail
>Priority: P2
>  Labels: starter
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Performs a bitwise OR operation on expression and returns the result.
> Supported Argument Types: INT64
> Returned Data Types: INT64
> Examples
> {code:sql}
> SELECT BIT_OR(c) as bit_and FROM UNNEST([0xF001, 0x00A1]) as c;
> +-+
> | bit_and |
> +-+
> | 1   |
> +-+
> {code}
> What is expected: should include both Calcite and ZetaSQL dialects.
> How to test: unit tests
> Reference: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#bit_or



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9984) Support BIT_OR aggregation function in Beam SQL

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9984?focusedWorklogId=434636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434636
 ]

ASF GitHub Bot logged work on BEAM-9984:


Author: ASF GitHub Bot
Created on: 18/May/20 22:05
Start Date: 18/May/20 22:05
Worklog Time Spent: 10m 
  Work Description: omarismail94 commented on a change in pull request 
#11737:
URL: https://github.com/apache/beam/pull/11737#discussion_r426918047



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamBuiltinAggregations.java
##
@@ -171,10 +173,18 @@ static CombineFn createAvg(Schema.FieldType fieldType) {
 return new BigDecimalAvg();
   default:
 throw new UnsupportedOperationException(
-String.format("[%s] is not support in AVG", fieldType));
+String.format("[%s] is not supported in AVG", fieldType));
 }
   }
 
+  static CombineFn createBitOr(Schema.FieldType fieldType) {
+if (fieldType.getTypeName() == TypeName.INT64) {
+  return new BitOr();
+}
+throw new UnsupportedOperationException(
+String.format("[%s] is not supported in BIT_OR", fieldType));

Review comment:
   I saw the other functions in this class do something similar, so I 
thought I'd do so as well
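
   As an illustration of the pattern, a minimal sketch of a BIT_OR combiner 
(not necessarily the exact implementation in the PR): bitwise OR is associative 
and commutative with identity 0, so Beam's Combine.BinaryCombineLongFn is 
sufficient for the combine fn returned by createBitOr().

{code:java}
import org.apache.beam.sdk.transforms.Combine;

// Hypothetical sketch of the BitOr combine fn returned by createBitOr().
static class BitOr extends Combine.BinaryCombineLongFn {
  @Override
  public long apply(long left, long right) {
    // Combine two partial results with bitwise OR.
    return left | right;
  }

  @Override
  public long identity() {
    // 0 is the identity for bitwise OR: x | 0 == x.
    return 0L;
  }
}
{code}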





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434636)
Time Spent: 1.5h  (was: 1h 20m)

> Support BIT_OR aggregation function in Beam SQL
> ---
>
> Key: BEAM-9984
> URL: https://issues.apache.org/jira/browse/BEAM-9984
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Omar Ismail
>Priority: P2
>  Labels: starter
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Performs a bitwise OR operation on expression and returns the result.
> Supported Argument Types: INT64
> Returned Data Types: INT64
> Examples
> {code:sql}
> SELECT BIT_OR(c) as bit_and FROM UNNEST([0xF001, 0x00A1]) as c;
> +-+
> | bit_and |
> +-+
> | 1   |
> +-+
> {code}
> What is expected: should include both Calcite and ZetaSQL dialects.
> How to test: unit tests
> Reference: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#bit_or



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=434632=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434632
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 18/May/20 21:56
Start Date: 18/May/20 21:56
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11339:
URL: https://github.com/apache/beam/pull/11339#issuecomment-630455763


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434632)
Time Spent: 44h 50m  (was: 44h 40m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P3
>  Time Spent: 44h 50m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=434618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434618
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 18/May/20 21:30
Start Date: 18/May/20 21:30
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11740:
URL: https://github.com/apache/beam/pull/11740#issuecomment-630445694


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434618)
Time Spent: 18h 40m  (was: 18.5h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P1
>  Time Spent: 18h 40m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=434619=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434619
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 18/May/20 21:30
Start Date: 18/May/20 21:30
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11740:
URL: https://github.com/apache/beam/pull/11740#issuecomment-630445775


   Run Python2_PVR_Flink PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434619)
Time Spent: 18h 50m  (was: 18h 40m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P1
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10028) [Java SDK] Support state backed iterables within the SDK harness

2020-05-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10028?focusedWorklogId=434616=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434616
 ]

ASF GitHub Bot logged work on BEAM-10028:
-

Author: ASF GitHub Bot
Created on: 18/May/20 21:27
Start Date: 18/May/20 21:27
Worklog Time Spent: 10m 
  Work Description: lukecwik opened a new pull request #11746:
URL: https://github.com/apache/beam/pull/11746


   This required supporting a translation context through CoderTranslator to 
give access to the BeamFnStateClient and current process bundle instruction id.
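
   A purely illustrative sketch of what such a translation context could carry; 
the interface and method names are assumptions for illustration, not Beam's 
actual API (only BeamFnStateClient and the process bundle instruction id are 
taken from the description above):

{code:java}
import org.apache.beam.fn.harness.state.BeamFnStateClient;

// Hypothetical shape of a context threaded through coder translation so a
// state-backed iterable coder can page elements in lazily via the state API.
interface CoderTranslationContext {
  // Client used to issue state read requests back to the runner.
  BeamFnStateClient getStateClient();

  // Id of the bundle currently being processed; state requests are scoped to it.
  String getProcessBundleInstructionId();
}
{code}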
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Created] (BEAM-10028) [Java SDK] Support state backed iterables within the SDK harness

2020-05-18 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-10028:


 Summary: [Java SDK] Support state backed iterables within the SDK 
harness
 Key: BEAM-10028
 URL: https://issues.apache.org/jira/browse/BEAM-10028
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-harness
Reporter: Luke Cwik
Assignee: Luke Cwik






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10028) [Java SDK] Support state backed iterables within the SDK harness

2020-05-18 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10028:
---
Status: Open  (was: Triage Needed)

> [Java SDK] Support state backed iterables within the SDK harness
> 
>
> Key: BEAM-10028
> URL: https://issues.apache.org/jira/browse/BEAM-10028
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

