[jira] [Commented] (BEAM-9961) Python MongoDBIO does not apply projection

2020-05-15 Thread Corvin Deboeser (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108875#comment-17108875
 ] 

Corvin Deboeser commented on BEAM-9961:
---

Hi [~kenn], sure thing! I could also take care of #9960, #10002, and #10004 in one 
go; all are quite small fixes.

> Python MongoDBIO does not apply projection
> --
>
> Key: BEAM-9961
> URL: https://issues.apache.org/jira/browse/BEAM-9961
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Priority: Major
>
> ReadFromMongoDB does not apply the provided projection when reading from the 
> client; only the filter is applied, as you can see here:
> https://github.com/apache/beam/blob/9f0cb649d39ee6236ea27f111acb4b66591a80ec/sdks/python/apache_beam/io/mongodbio.py#L204
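To illustrate what the missing projection would do, here is a sketch of MongoDB's simple inclusion-projection semantics on a single document (the helper name is hypothetical; this is not the mongodbio fix itself, which would amount to passing the projection through to the client's find() call alongside the filter):

```python
def apply_inclusion_projection(doc, projection):
    """Mimic a MongoDB inclusion projection such as {"field": 1} on one
    document. Real projections also support exclusion and dotted paths;
    this sketch covers only the simple inclusion case."""
    if not projection:
        return dict(doc)
    keep = {field for field, included in projection.items() if included}
    keep.add("_id")  # MongoDB returns _id unless it is explicitly excluded
    return {k: v for k, v in doc.items() if k in keep}

doc = {"_id": 1, "name": "a", "payload": "big blob"}
print(apply_inclusion_projection(doc, {"name": 1}))  # → {'_id': 1, 'name': 'a'}
```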



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10002) Mongo cursor timeout leads to CursorNotFound error

2020-05-15 Thread Corvin Deboeser (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108874#comment-17108874
 ] 

Corvin Deboeser commented on BEAM-10002:


Hi [~yichi], sure thing; I will have time to fix it next week.

> Mongo cursor timeout leads to CursorNotFound error
> --
>
> Key: BEAM-10002
> URL: https://issues.apache.org/jira/browse/BEAM-10002
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Priority: Major
>
> If some work items take a lot of processing time and the cursor of a bundle 
> is not queried for too long, then MongoDB will time out the cursor, which 
> results in:
> {code:java}
> pymongo.errors.CursorNotFound: cursor id ... not found
> {code}
>  
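One common mitigation (a sketch of the general pattern, not necessarily the fix chosen for mongodbio) is to avoid one long-lived cursor and instead page through the bundle's id range with fresh, short queries, so no single server-side cursor stays idle long enough to be reaped:

```python
def id_pages(sorted_ids, page_size):
    """Split a bundle's sorted document ids into pages. Each page would be
    fetched with its own find() call (e.g. filtering on _id ranges), so the
    cursor for a page is exhausted quickly even when processing of earlier
    pages is slow."""
    for start in range(0, len(sorted_ids), page_size):
        yield sorted_ids[start:start + page_size]

print(list(id_pages([1, 2, 3, 4, 5], 2)))  # → [[1, 2], [3, 4], [5]]
```

PyMongo also offers `no_cursor_timeout=True` on find(), at the cost of leaking cursors if the client dies without closing them.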





[jira] [Work logged] (BEAM-9819) Extend acceptable httplib2 version range.

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9819?focusedWorklogId=434004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434004
 ]

ASF GitHub Bot logged work on BEAM-9819:


Author: ASF GitHub Bot
Created on: 16/May/20 05:20
Start Date: 16/May/20 05:20
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #11726:
URL: https://github.com/apache/beam/pull/11726#issuecomment-629590663


   R: @chamikaramj 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 434004)
Time Spent: 1h 50m  (was: 1h 40m)

> Extend acceptable httplib2 version range.
> -
>
> Key: BEAM-9819
> URL: https://issues.apache.org/jira/browse/BEAM-9819
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> A Beam Python 3 user reported an inconvenience when migrating their Python 2 
> pipeline to Python 3, caused by a bug in the httplib2 dependency: httplib2 on 
> Python 3 doesn't respect the NO_PROXY environment variable. This bug was fixed 
> in 0.13.1 [1]. Looking at the changelog of httplib2 [2], there were more 
> Python 3-specific fixes in recent versions.
> In the past we restricted httplib2 version due to a conflict with 
> googledatastore[3]. We have since then removed[4] a dependency on 
> googledatastore, and I don't see other reasons to restrict httplib2 to 
> 0.12.0. 
> [1] https://github.com/httplib2/httplib2/pull/140
> [2] https://github.com/httplib2/httplib2/blob/master/CHANGELOG
> [3] 
> https://github.com/apache/beam/commit/3b2c90156ddb67f4daddf275172c0b2d4eb1eaf6
> [4] 
> https://github.com/apache/beam/pull/11175/files#diff-e9d0ab71f74dc10309a29b697ee99330L202
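For context, NO_PROXY matching (which httplib2 on Python 3 ignored before 0.13.1) behaves roughly like this simplified sketch; real implementations also handle ports, CIDR blocks, and `*`, and this helper is illustrative rather than httplib2's actual code:

```python
import os

def should_bypass_proxy(host, no_proxy=None):
    """Simplified NO_PROXY matching: bypass the proxy when the host equals
    an entry or is a subdomain of it."""
    if no_proxy is None:
        no_proxy = os.environ.get("NO_PROXY", os.environ.get("no_proxy", ""))
    for entry in (e.strip().lstrip(".") for e in no_proxy.split(",")):
        if entry and (host == entry or host.endswith("." + entry)):
            return True
    return False

print(should_bypass_proxy("internal.example.com", "example.com"))  # → True
print(should_bypass_proxy("github.com", "example.com"))            # → False
```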





[jira] [Updated] (BEAM-9896) Add streaming for SnowflakeIO.Write to Java SDK

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9896:
--
Status: Open  (was: Triage Needed)

> Add streaming for SnowflakeIO.Write to Java SDK
> ---
>
> Key: BEAM-9896
> URL: https://issues.apache.org/jira/browse/BEAM-9896
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Dariusz Aniszewski
>Priority: Major
>






[jira] [Updated] (BEAM-9897) Add cross-language support to SnowflakeIO.Read

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9897:
--
Status: Open  (was: Triage Needed)

> Add cross-language support to SnowflakeIO.Read
> --
>
> Key: BEAM-9897
> URL: https://issues.apache.org/jira/browse/BEAM-9897
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Dariusz Aniszewski
>Priority: Major
>






[jira] [Updated] (BEAM-9894) Add batch SnowflakeIO.Write to Java SDK

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9894:
--
Status: Open  (was: Triage Needed)

> Add batch SnowflakeIO.Write to Java SDK
> ---
>
> Key: BEAM-9894
> URL: https://issues.apache.org/jira/browse/BEAM-9894
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Dariusz Aniszewski
>Priority: Major
>






[jira] [Updated] (BEAM-9898) Add cross-language support to SnowflakeIO.Write

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9898:
--
Status: Open  (was: Triage Needed)

> Add cross-language support to SnowflakeIO.Write
> ---
>
> Key: BEAM-9898
> URL: https://issues.apache.org/jira/browse/BEAM-9898
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Dariusz Aniszewski
>Priority: Major
>






[jira] [Updated] (BEAM-9992) Migrate BeamSQL's SET operators to Sets transforms

2020-05-15 Thread Darshan Jani (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darshan Jani updated BEAM-9992:
---
Summary: Migrate BeamSQL's SET operators to Sets transforms  (was: Migrate 
BeamSQL's SET operators to SetFns transforms)

> Migrate BeamSQL's SET operators to Sets transforms
> --
>
> Key: BEAM-9992
> URL: https://issues.apache.org/jira/browse/BEAM-9992
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>
> As part of [BEAM-9946|https://issues.apache.org/jira/browse/BEAM-9946] we have 
> new Sets transforms for intersect, union, and except.
> This Jira is to use them to replace the existing Set operators in the BeamSQL code.
> Tasks:
> # Remove: 
> [BeamSetOperatorRelBase.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java]
> # use SetFns transforms from
> ## 
> [BeamIntersectRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRel.java]
> ## 
> [BeamMinusRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRel.java]
> ## 
> [BeamUnionRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java]
> # 
> Remove:[BeamSetOperatorsTransforms.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java]
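For reference, the distinct-element semantics that SQL's set operators (and hence the new Sets transforms) implement can be stated in plain Python; this is a sketch of the semantics only, not of the Beam implementation:

```python
def union_distinct(left, right):
    """SQL UNION: all distinct elements from either input."""
    return set(left) | set(right)

def intersect_distinct(left, right):
    """SQL INTERSECT: distinct elements present in both inputs."""
    return set(left) & set(right)

def except_distinct(left, right):
    """SQL EXCEPT ("minus"): distinct left elements not in right."""
    return set(left) - set(right)

print(sorted(except_distinct([1, 2, 2, 3], [2])))  # → [1, 3]
```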





[jira] [Updated] (BEAM-9992) Migrate BeamSQL's SET operators to SetFns transforms

2020-05-15 Thread Darshan Jani (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darshan Jani updated BEAM-9992:
---
Description: 
As part of [BEAM-9946|https://issues.apache.org/jira/browse/BEAM-9946] we have 
new Sets transforms for intersect, union, and except.
This Jira is to use them to replace the existing Set operators in the BeamSQL code.

Tasks:
# Remove: 
[BeamSetOperatorRelBase.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java]
# use SetFns transforms from
## 
[BeamIntersectRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRel.java]
## 
[BeamMinusRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRel.java]
## 
[BeamUnionRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java]
# 
Remove:[BeamSetOperatorsTransforms.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java]


  was:
As part of [BEAM-9946|https://issues.apache.org/jira/browse/BEAM-9946] we have 
new SetFns transforms for intersect, union, and except.
This Jira is to use them to replace the existing Set operators in the BeamSQL code.

Tasks:
# Remove: 
[BeamSetOperatorRelBase.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java]
# use SetFns transforms from
## 
[BeamIntersectRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRel.java]
## 
[BeamMinusRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRel.java]
## 
[BeamUnionRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java]
# 
Remove:[BeamSetOperatorsTransforms.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java]



> Migrate BeamSQL's SET operators to SetFns transforms
> 
>
> Key: BEAM-9992
> URL: https://issues.apache.org/jira/browse/BEAM-9992
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>
> As part of [BEAM-9946|https://issues.apache.org/jira/browse/BEAM-9946] we have 
> new Sets transforms for intersect, union, and except.
> This Jira is to use them to replace the existing Set operators in the BeamSQL code.
> Tasks:
> # Remove: 
> [BeamSetOperatorRelBase.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java]
> # use SetFns transforms from
> ## 
> [BeamIntersectRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIntersectRel.java]
> ## 
> [BeamMinusRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamMinusRel.java]
> ## 
> [BeamUnionRel.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java]
> # 
> Remove:[BeamSetOperatorsTransforms.java|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java]





[jira] [Work logged] (BEAM-9951) Create Go SDK synthetic sources.

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9951?focusedWorklogId=433988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433988
 ]

ASF GitHub Bot logged work on BEAM-9951:


Author: ASF GitHub Bot
Created on: 16/May/20 01:45
Start Date: 16/May/20 01:45
Worklog Time Spent: 10m 
  Work Description: youngoli merged pull request #11728:
URL: https://github.com/apache/beam/pull/11728


   





Issue Time Tracking
---

Worklog Id: (was: 433988)
Time Spent: 2h 20m  (was: 2h 10m)

> Create Go SDK synthetic sources.
> 
>
> Key: BEAM-9951
> URL: https://issues.apache.org/jira/browse/BEAM-9951
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Create synthetic sources for the Go SDK like 
> [Java|https://github.com/apache/beam/tree/master/sdks/java/io/synthetic/src/main/java/org/apache/beam/sdk/io/synthetic]
>  and 
> [Python|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/synthetic_pipeline.py]
>  have.





[jira] [Work logged] (BEAM-9951) Create Go SDK synthetic sources.

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9951?focusedWorklogId=433987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433987
 ]

ASF GitHub Bot logged work on BEAM-9951:


Author: ASF GitHub Bot
Created on: 16/May/20 01:37
Start Date: 16/May/20 01:37
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #11728:
URL: https://github.com/apache/beam/pull/11728#discussion_r426104694



##
File path: sdks/go/pkg/beam/io/synthetic/step.go
##
@@ -0,0 +1,191 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package synthetic
+
+import (
+   "github.com/apache/beam/sdks/go/pkg/beam"
+   "github.com/apache/beam/sdks/go/pkg/beam/io/rtrackers/offsetrange"
+   "math/rand"
+   "time"
+)
+
+// Step creates a synthetic step transform that receives KV<[]byte, []byte>
+// elements from other synthetic transforms, and outputs KV<[]byte, []byte>
+// elements based on its inputs.
+//
+// This function accepts a StepConfig to configure the behavior of the 
synthetic
+// step, including whether that step is implemented as a splittable or
+// non-splittable DoFn.
+//
+// StepConfigs are recommended to be created via the DefaultStepConfig and
+// modified before being passed to this method. Example:
+//
+//cfg := synthetic.DefaultStepConfig()
+//cfg.OutputPerInput = 1000
+//cfg.Splittable = true
+//cfg.InitialSplits = 2
+//step := synthetic.Step(s, cfg, input)
+func Step(s beam.Scope, cfg StepConfig, col beam.PCollection) beam.PCollection 
{
+   s = s.Scope("synthetic.Step")
+   if cfg.Splittable {
+   return beam.ParDo(s, &sdfStepFn{cfg: cfg}, col)
+   } else {
+   return beam.ParDo(s, &stepFn{cfg: cfg}, col)
+   }
+}
+
+// stepFn is a DoFn implementing behavior for synthetic steps. For usage
+// information, see synthetic.Step.
+//
+// The stepFn is expected to be initialized with a cfg and will follow that
+// config to determine its behavior when emitting elements.
+type stepFn struct {
+   cfg StepConfig
+   rng randWrapper
+}
+
+// Setup sets up the random number generator.
+func (fn *stepFn) Setup() {
+   fn.rng = rand.New(rand.NewSource(time.Now().UnixNano()))
+}
+
+// ProcessElement takes an input and either filters it or produces a number of
+// outputs identical to that input based on the outputs per input configuration
+// in StepConfig.
+func (fn *stepFn) ProcessElement(key, val []byte, emit func([]byte, []byte)) {
+   if fn.cfg.FilterRatio > 0 && fn.rng.Float64() < fn.cfg.FilterRatio {
+   return
+   }
+   for i := 0; i < fn.cfg.OutputPerInput; i++ {
+   emit(key, val)
+   }
+}
+
+// sdfStepFn is a splittable DoFn implementing behavior for synthetic steps.
+// For usage information, see synthetic.Step.
+//
+// The sdfStepFn is expected to be initialized with a cfg and will follow
+// that config to determine its behavior when splitting and emitting elements.
+type sdfStepFn struct {
+   cfg StepConfig
+   rng randWrapper
+}
+
+// CreateInitialRestriction creates an offset range restriction representing
+// the number of elements to emit for this received element, as specified by
+// the output per input configuration in StepConfig.
+func (fn *sdfStepFn) CreateInitialRestriction(key, val []byte) 
offsetrange.Restriction {
+   return offsetrange.Restriction{
+   Start: 0,
+   End:   int64(fn.cfg.OutputPerInput),
+   }
+}
+
+// SplitRestriction splits restrictions equally according to the number of
+// initial splits specified in StepConfig. Each restriction output by this
+// method will contain at least one element, so the number of splits will not
+// exceed the number of elements.
+func (fn *sdfStepFn) SplitRestriction(key, val []byte, rest 
offsetrange.Restriction) (splits []offsetrange.Restriction) {
+   if fn.cfg.InitialSplits <= 1 {
+   // Don't split, just return original restriction.
+   splits = append(splits, rest)
+   return splits
+   }
+
+   // TODO(BEAM-9978) Move this implementation 

[jira] [Work logged] (BEAM-9951) Create Go SDK synthetic sources.

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9951?focusedWorklogId=433985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433985
 ]

ASF GitHub Bot logged work on BEAM-9951:


Author: ASF GitHub Bot
Created on: 16/May/20 01:28
Start Date: 16/May/20 01:28
Worklog Time Spent: 10m 
  Work Description: youngoli commented on a change in pull request #11728:
URL: https://github.com/apache/beam/pull/11728#discussion_r426103916



##
File path: sdks/go/pkg/beam/io/synthetic/step.go
##
@@ -0,0 +1,191 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package synthetic
+
+import (
+   "github.com/apache/beam/sdks/go/pkg/beam"
+   "github.com/apache/beam/sdks/go/pkg/beam/io/rtrackers/offsetrange"
+   "math/rand"
+   "time"
+)
+
+// Step creates a synthetic step transform that receives KV<[]byte, []byte>
+// elements from other synthetic transforms, and outputs KV<[]byte, []byte>
+// elements based on its inputs.
+//
+// This function accepts a StepConfig to configure the behavior of the 
synthetic
+// step, including whether that step is implemented as a splittable or
+// non-splittable DoFn.
+//
+// StepConfigs are recommended to be created via the DefaultStepConfig and
+// modified before being passed to this method. Example:
+//
+//cfg := synthetic.DefaultStepConfig()
+//cfg.OutputPerInput = 1000
+//cfg.Splittable = true
+//cfg.InitialSplits = 2
+//step := synthetic.Step(s, cfg, input)
+func Step(s beam.Scope, cfg StepConfig, col beam.PCollection) beam.PCollection 
{
+   s = s.Scope("synthetic.Step")
+   if cfg.Splittable {
+   return beam.ParDo(s, &sdfStepFn{cfg: cfg}, col)
+   } else {
+   return beam.ParDo(s, &stepFn{cfg: cfg}, col)
+   }
+}
+
+// stepFn is a DoFn implementing behavior for synthetic steps. For usage
+// information, see synthetic.Step.
+//
+// The stepFn is expected to be initialized with a cfg and will follow that
+// config to determine its behavior when emitting elements.
+type stepFn struct {
+   cfg StepConfig
+   rng randWrapper
+}
+
+// Setup sets up the random number generator.
+func (fn *stepFn) Setup() {
+   fn.rng = rand.New(rand.NewSource(time.Now().UnixNano()))
+}
+
+// ProcessElement takes an input and either filters it or produces a number of
+// outputs identical to that input based on the outputs per input configuration
+// in StepConfig.
+func (fn *stepFn) ProcessElement(key, val []byte, emit func([]byte, []byte)) {
+   if fn.cfg.FilterRatio > 0 && fn.rng.Float64() < fn.cfg.FilterRatio {
+   return
+   }
+   for i := 0; i < fn.cfg.OutputPerInput; i++ {
+   emit(key, val)
+   }
+}
+
+// sdfStepFn is a splittable DoFn implementing behavior for synthetic steps.
+// For usage information, see synthetic.Step.
+//
+// The sdfStepFn is expected to be initialized with a cfg and will follow
+// that config to determine its behavior when splitting and emitting elements.
+type sdfStepFn struct {
+   cfg StepConfig
+   rng randWrapper
+}
+
+// CreateInitialRestriction creates an offset range restriction representing
+// the number of elements to emit for this received element, as specified by
+// the output per input configuration in StepConfig.
+func (fn *sdfStepFn) CreateInitialRestriction(key, val []byte) 
offsetrange.Restriction {
+   return offsetrange.Restriction{
+   Start: 0,
+   End:   int64(fn.cfg.OutputPerInput),
+   }
+}
+
+// SplitRestriction splits restrictions equally according to the number of
+// initial splits specified in StepConfig. Each restriction output by this
+// method will contain at least one element, so the number of splits will not
+// exceed the number of elements.
+func (fn *sdfStepFn) SplitRestriction(key, val []byte, rest 
offsetrange.Restriction) (splits []offsetrange.Restriction) {
+   if fn.cfg.InitialSplits <= 1 {
+   // Don't split, just return original restriction.
+   splits = append(splits, rest)
+   return splits

Review comment:
   Done.
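The splitting behavior described in the quoted SplitRestriction doc comment (equal splits, each restriction holding at least one element, so splits never outnumber elements) can be sketched in Python; this is illustrative only, as the real logic lives in the Go SDK:

```python
def split_restriction(start, end, num_splits):
    """Split the offset range [start, end) into at most num_splits
    contiguous pieces, each holding at least one element."""
    size = end - start
    # Clamp so every split is non-empty.
    num_splits = max(1, min(num_splits, size))
    base, extra = divmod(size, num_splits)
    splits, cur = [], start
    for i in range(num_splits):
        # Distribute the remainder one element at a time to the first splits.
        step = base + (1 if i < extra else 0)
        splits.append((cur, cur + step))
        cur += step
    return splits

print(split_restriction(0, 5, 2))  # → [(0, 3), (3, 5)]
```

Asking for more splits than elements (e.g. `split_restriction(0, 2, 5)`) still yields only two one-element ranges, matching the documented guarantee.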





[jira] [Commented] (BEAM-9991) Mention start/finishBundle in ParDo documentation

2020-05-15 Thread Rose Nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108791#comment-17108791
 ] 

Rose Nguyen commented on BEAM-9991:
---

I think the ParDo section of the programming guide would be more appropriate: 
[https://beam.apache.org/documentation/programming-guide/#pardo]

Adding it to the Java Transform Catalog would work if we could also include a 
new Colab code sample.

> Mention start/finishBundle in ParDo documentation
> -
>
> Key: BEAM-9991
> URL: https://issues.apache.org/jira/browse/BEAM-9991
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: Brent Worden
>Priority: Major
>






[jira] [Closed] (BEAM-8323) Testing Guideline using deprecated DoFnTester

2020-05-15 Thread Brent Worden (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brent Worden closed BEAM-8323.
--
Fix Version/s: Not applicable
   Resolution: Duplicate

> Testing Guideline using deprecated DoFnTester
> -
>
> Key: BEAM-8323
> URL: https://issues.apache.org/jira/browse/BEAM-8323
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Reza ardeshir rokni
>Priority: Major
> Fix For: Not applicable
>
>
> [https://beam.apache.org/documentation/pipelines/test-your-pipeline/]
> Uses deprecated DoFnTester 





[jira] [Closed] (BEAM-8604) Remove deprecated DoFnTester docs

2020-05-15 Thread Brent Worden (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brent Worden closed BEAM-8604.
--
Fix Version/s: Not applicable
   Resolution: Duplicate

> Remove deprecated DoFnTester docs
> -
>
> Key: BEAM-8604
> URL: https://issues.apache.org/jira/browse/BEAM-8604
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Rose Nguyen
>Priority: Major
> Fix For: Not applicable
>
>
> Remove references to deprecated DoFnTester
> [https://beam.apache.org/documentation/pipelines/test-your-pipeline/]





[jira] [Closed] (BEAM-9817) beam_PreCommit_Website_Stage_GCS_Commit failed

2020-05-15 Thread Brent Worden (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brent Worden closed BEAM-9817.
--
Fix Version/s: Not applicable
   Resolution: Cannot Reproduce

> beam_PreCommit_Website_Stage_GCS_Commit failed
> --
>
> Key: BEAM-9817
> URL: https://issues.apache.org/jira/browse/BEAM-9817
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Kyle Weaver
>Priority: Minor
> Fix For: Not applicable
>
>
> Task :website:buildGcsWebsite FAILED
> Liquid Exception: 500 Internal Server Error in 
> documentation/transforms/python/elementwise/flatmap.md





[jira] [Created] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-15 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-10016:
--

 Summary: Flink postcommits failing 
testFlattenWithDifferentInputAndOutputCoders2
 Key: BEAM-10016
 URL: https://issues.apache.org/jira/browse/BEAM-10016
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Kyle Weaver


Both beam_PostCommit_Java_PVR_Flink_Batch and 
beam_PostCommit_Java_PVR_Flink_Streaming are failing 
org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.

java.lang.RuntimeException: The Runner experienced the following error during 
execution:
java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be cast to [B
at 
org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165)
at 
org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110)
at 
org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
at 
org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2(FlattenTest.java:397)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:266)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.GeneratedMethodAccessor111.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 

[jira] [Updated] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-15 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-10016:
---
Status: Open  (was: Triage Needed)

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> java.lang.RuntimeException: The Runner experienced the following error during 
> execution:
> java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be cast to 
> [B
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110)
>   at 
> org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2(FlattenTest.java:397)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:266)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at 

[jira] [Assigned] (BEAM-10016) Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2

2020-05-15 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver reassigned BEAM-10016:
--

Assignee: Kyle Weaver

> Flink postcommits failing testFlattenWithDifferentInputAndOutputCoders2
> ---
>
> Key: BEAM-10016
> URL: https://issues.apache.org/jira/browse/BEAM-10016
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> Both beam_PostCommit_Java_PVR_Flink_Batch and 
> beam_PostCommit_Java_PVR_Flink_Streaming are failing 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2.
> java.lang.RuntimeException: The Runner experienced the following error during 
> execution:
> java.lang.ClassCastException: org.apache.beam.sdk.values.KV cannot be cast to 
> [B
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:165)
>   at 
> org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:110)
>   at 
> org.apache.beam.runners.portability.testing.TestPortableRunner.run(TestPortableRunner.java:83)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>   at 
> org.apache.beam.sdk.transforms.FlattenTest.testFlattenWithDifferentInputAndOutputCoders2(FlattenTest.java:397)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:266)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at 

[jira] [Updated] (BEAM-8735) Beam site says to install tox, but GitHub README does not

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8735:
--
Summary: Beam site says to install tox, but GitHub README does not  (was: 
tox: yes or no)

> Beam site says to install tox, but GitHub README does not
> -
>
> Key: BEAM-8735
> URL: https://issues.apache.org/jira/browse/BEAM-8735
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> https://beam.apache.org/contribute/ says "Python, virtualenv, and tox 
> installed for Python SDK development"
> However, https://github.com/apache/beam says
> If you'd like to build and install the whole project from the source 
> distribution, you may need some additional tools installed in your system. In 
> a Debian-based distribution:
> sudo apt-get install \
> openjdk-8-jdk \
> python-setuptools \
> python-pip \
> virtualenv
> Notice the second makes no mention of tox. Please sync up these instructions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8735) Beam site says to install tox, but GitHub README does not

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108775#comment-17108775
 ] 

Kenneth Knowles commented on BEAM-8735:
---

Ideally we'd just have one set of instructions.

> Beam site says to install tox, but GitHub README does not
> -
>
> Key: BEAM-8735
> URL: https://issues.apache.org/jira/browse/BEAM-8735
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> https://beam.apache.org/contribute/ says "Python, virtualenv, and tox 
> installed for Python SDK development"
> However, https://github.com/apache/beam says
> If you'd like to build and install the whole project from the source 
> distribution, you may need some additional tools installed in your system. In 
> a Debian-based distribution:
> sudo apt-get install \
> openjdk-8-jdk \
> python-setuptools \
> python-pip \
> virtualenv
> Notice the second makes no mention of tox. Please sync up these instructions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8735) tox: yes or no

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8735:
--
Status: Open  (was: Triage Needed)

> tox: yes or no
> --
>
> Key: BEAM-8735
> URL: https://issues.apache.org/jira/browse/BEAM-8735
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> https://beam.apache.org/contribute/ says "Python, virtualenv, and tox 
> installed for Python SDK development"
> However, https://github.com/apache/beam says
> If you'd like to build and install the whole project from the source 
> distribution, you may need some additional tools installed in your system. In 
> a Debian-based distribution:
> sudo apt-get install \
> openjdk-8-jdk \
> python-setuptools \
> python-pip \
> virtualenv
> Notice the second makes no mention of tox. Please sync up these instructions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9355) Python typehints: support NewType

2020-05-15 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108770#comment-17108770
 ] 

Udi Meiri commented on BEAM-9355:
-

Yes, it needs an actual implementation. Current support just converts to Any.
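For context, a small stdlib-only illustration (not Beam code) of why `NewType` currently degrades to `Any`: at runtime `typing.NewType` is just an identity wrapper, so the distinct type is invisible to the executing pipeline — but the underlying type is still recoverable from the `NewType` object's `__supertype__`, which is what a real typehints implementation could inspect:

```python
from typing import NewType

# NewType creates a distinct type for static checkers only.
UserId = NewType('UserId', int)

uid = UserId(42)

# At runtime the value is a plain int; the UserId wrapper is erased.
print(type(uid).__name__)            # int
# The supertype is still recoverable from the NewType object itself,
# which a typehints implementation could map to instead of Any.
print(UserId.__supertype__ is int)   # True
```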

> Python typehints: support NewType
> -
>
> Key: BEAM-9355
> URL: https://issues.apache.org/jira/browse/BEAM-9355
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://docs.python.org/3/library/typing.html#newtype



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=433971=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433971
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 15/May/20 23:48
Start Date: 15/May/20 23:48
Worklog Time Spent: 10m 
  Work Description: ibzib merged pull request #11717:
URL: https://github.com/apache/beam/pull/11717


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433971)
Time Spent: 31h 20m  (was: 31h 10m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 31h 20m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9999) Remove support for EOLed runners (Apex, etc.)

2020-05-15 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108766#comment-17108766
 ] 

Ahmet Altay commented on BEAM-9999:
---

Thank you for the quick comments. Do you think it would be worth bringing this 
to the user or dev list? Or is this sufficient information to remove support?

One additional benefit: some of the ignored tests could also be removed.

> Remove support for EOLed runners (Apex, etc.)
> -
>
> Key: BEAM-9999
> URL: https://issues.apache.org/jira/browse/BEAM-9999
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-apex, runner-core
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>Priority: Major
>
> These runners look EOLed, not maintained:
> - Apex (last release 2+ years ago)
> - Gearpump (last release 1+ year ago)
> Removing support for these could reduce the code base size, reduce flaky 
> tests, and make it easier to add new features.
> /cc [~kenn][~tysonjh]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-8415:
-

Assignee: David Yan

> Improve error message when adding a PTransform with a name that already 
> exists in the pipeline
> --
>
> Key: BEAM-8415
> URL: https://issues.apache.org/jira/browse/BEAM-8415
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: David Yan
>Assignee: David Yan
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, when trying to apply a PTransform with a name that already exists 
> in the pipeline, it returns a confusing error:
> Transform "XXX" does not have a stable unique label. This will prevent 
> updating of pipelines. To apply a transform with a specified label write 
> pvalue | "label" >> transform
> We'd like to improve this error message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform

2020-05-15 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108765#comment-17108765
 ] 

Brian Hulette commented on BEAM-9361:
-

Reuven did add an 
[EnumerationType|https://github.com/apache/beam/blob/d7df9ed14bca07d341bb689053e82674bf0e0243/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/EnumerationType.java]
 which would be appropriate here, but I'm not sure what will happen if you try 
to use it in SQL.

> NPE When putting Avro record with enum through SqlTransform
> ---
>
> Key: BEAM-9361
> URL: https://issues.apache.org/jira/browse/BEAM-9361
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.19.0
>Reporter: Niels Basjes
>Priority: Major
>
> I ran into this problem when trying to put my Avro records through the 
> SqlTransform.
> I was able to reduce the reproduction path to the code below.
> This code fails on my machine (using Beam 2.19.0) with the following 
> NullPointerException
> {code:java}
>  org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse 
> query SELECT name, direction FROM InputStreamat 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:175)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:125)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:83)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:539)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:490)
>   at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:261)
>   at com.bol.analytics.m2.TestAvro2SQL.testAvro2SQL(TestAvro2SQL.java:99)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException:
>  java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:217)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:144)
>   ... 31 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:45)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:280)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:287)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.lambda$toCalciteRowType$0(CalciteUtils.java:261)
>  

[jira] [Commented] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108764#comment-17108764
 ] 

Kenneth Knowles commented on BEAM-8415:
---

I see you modified this for the Python SDK. So this may be resolved? But I 
think the error message is very similar in Java. Did you check?
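As a toy sketch of the kind of bookkeeping and clearer message this issue asks for (pure Python, invented names — not Beam's actual implementation in either SDK): detect the duplicate name at apply time and tell the user exactly how to disambiguate:

```python
class DuplicateTransformName(ValueError):
    pass

class PipelineSketch:
    """Toy stand-in for a pipeline that tracks applied transform labels."""

    def __init__(self):
        self._labels = set()

    def apply(self, label):
        if label in self._labels:
            # The improved message names the conflict and the fix directly.
            raise DuplicateTransformName(
                'A transform named %r was already applied to this pipeline. '
                'Give this application a unique label, e.g. '
                'pcoll | "MyUniqueLabel" >> transform' % label)
        self._labels.add(label)

p = PipelineSketch()
p.apply('ReadInput')
try:
    p.apply('ReadInput')
except DuplicateTransformName as e:
    print('raised:', e)
```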

> Improve error message when adding a PTransform with a name that already 
> exists in the pipeline
> --
>
> Key: BEAM-8415
> URL: https://issues.apache.org/jira/browse/BEAM-8415
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: David Yan
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, when trying to apply a PTransform with a name that already exists 
> in the pipeline, it returns a confusing error:
> Transform "XXX" does not have a stable unique label. This will prevent 
> updating of pipelines. To apply a transform with a specified label write 
> pvalue | "label" >> transform
> We'd like to improve this error message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8415:
--
Component/s: sdk-java-core

> Improve error message when adding a PTransform with a name that already 
> exists in the pipeline
> --
>
> Key: BEAM-8415
> URL: https://issues.apache.org/jira/browse/BEAM-8415
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: David Yan
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, when trying to apply a PTransform with a name that already exists 
> in the pipeline, it returns a confusing error:
> Transform "XXX" does not have a stable unique label. This will prevent 
> updating of pipelines. To apply a transform with a specified label write 
> pvalue | "label" >> transform
> We'd like to improve this error message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8415:
--
Status: Open  (was: Triage Needed)

> Improve error message when adding a PTransform with a name that already 
> exists in the pipeline
> --
>
> Key: BEAM-8415
> URL: https://issues.apache.org/jira/browse/BEAM-8415
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: David Yan
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, when trying to apply a PTransform with a name that already exists 
> in the pipeline, it returns a confusing error:
> Transform "XXX" does not have a stable unique label. This will prevent 
> updating of pipelines. To apply a transform with a specified label write 
> pvalue | "label" >> transform
> We'd like to improve this error message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8429) Define API for getting non user metrics.

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8429:
--
Status: Open  (was: Triage Needed)

> Define API for getting non user metrics.
> 
>
> Key: BEAM-8429
> URL: https://issues.apache.org/jira/browse/BEAM-8429
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core, sdk-py-core
>Reporter: Ahmet Altay
>Priority: Major
>
> MetricResults API does not support a way to query system metrics. Different 
> implementations accomplish the same thing in different ways.
> FnApiRunner has a monitoring_metrics() defined on its PipelineResult 
> (RunnerResult) object.
> DataflowRunner, has a all_metrics() method defined on its MetricResults 
> (DataflowMetrics) object.
> The Dataflow approach of using metrics().all_metrics() sounds reasonable. It 
> would be useful to define a single API that all runner implementations 
> could agree on.
> cc: [~pabloem] [~robertwb] [~Ardagan]
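A hypothetical sketch of the single API the report suggests (class and parameter names invented for illustration; not Dataflow's or the FnApiRunner's real classes): one `query()` entry point returning all metric kinds — user and system — in one shape:

```python
class MetricResultsSketch:
    """Toy unified metrics interface: one query() for user and system metrics."""

    def __init__(self, counters, distributions, gauges):
        self._data = {
            'counters': counters,
            'distributions': distributions,
            'gauges': gauges,
        }

    def query(self, name_filter=None):
        # A single entry point; a real API would take a richer MetricsFilter.
        if name_filter is None:
            return self._data
        return {
            kind: {k: v for k, v in metrics.items() if name_filter in k}
            for kind, metrics in self._data.items()
        }

results = MetricResultsSketch(
    counters={'user:elements': 10, 'system:bundles': 3},
    distributions={}, gauges={})
print(results.query('system'))
```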



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8423) Japanese characters encoding issue

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108763#comment-17108763
 ] 

Kenneth Knowles commented on BEAM-8423:
---

You may need to use the new {{JvmInitializers}} feature to set the character 
encoding to UTF-8.
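A language-neutral illustration (Python here, though the reported job is Java) of why the worker's default charset matters: UTF-8 bytes produced correctly but later decoded with a different default come out mangled, matching the "looks fine in logs, corrupted in GCS" symptom:

```python
text = 'こんにちは'            # Japanese sample text

# Correct round trip: encode and decode both as UTF-8.
data = text.encode('utf-8')
assert data.decode('utf-8') == text

# If any stage decodes those bytes with a non-UTF-8 default charset
# (e.g. a JVM whose file.encoding is ISO-8859-1), the text is mangled:
mangled = data.decode('latin-1')
print(mangled)                 # mojibake, not the original characters
```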

> Japanese characters encoding issue 
> ---
>
> Key: BEAM-8423
> URL: https://issues.apache.org/jira/browse/BEAM-8423
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Affects Versions: 2.15.0
> Environment: dataflow
>Reporter: Jyoti Aditya
>Priority: Major
>
> I am running an Apache Beam job to parse Japanese HTML pages. While running 
> the job, the Stackdriver log shows the Japanese characters properly, but the 
> same data written to the GCS bucket has an encoding issue and is getting 
> corrupted.
>  
> {noformat}
> //code
> Pipeline pipeline = Pipeline.create(options);
> CoderRegistry cr = pipeline.getCoderRegistry();
> cr.registerCoderForClass(String.class, StringUtf8Coder.of());
> cr.registerCoderForClass(Integer.class, BigEndianIntegerCoder.of());
> batchTuple = pipeline
>   .apply("Read from input files", 
> TextIO.read().from(options.getloadingBucketURL()).withCompression(Compression.GZIP)).setCoder(StringUtf8Coder.of())
>   .apply("Process input files",ParDo.of(new 
> ExtractDataFromHtmlPage(extractionConfig,beamConfig.getLoadingBucketURL())).withOutputTags(successRecord,
>  TupleTagList.of(errorRecord).and(deadLetterRecords)));{noformat}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8423) Japanese characters encoding issue

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8423:
--
Status: Open  (was: Triage Needed)

> Japanese characters encoding issue 
> ---
>
> Key: BEAM-8423
> URL: https://issues.apache.org/jira/browse/BEAM-8423
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Affects Versions: 2.15.0
> Environment: dataflow
>Reporter: Jyoti Aditya
>Priority: Major
>
> I am running an Apache Beam job to parse Japanese HTML pages. While running 
> the job, the Stackdriver log shows the Japanese characters properly, but the 
> same data written to the GCS bucket has an encoding issue and is getting 
> corrupted.
>  
> {noformat}
> //code
> Pipeline pipeline = Pipeline.create(options);
> CoderRegistry cr = pipeline.getCoderRegistry();
> cr.registerCoderForClass(String.class, StringUtf8Coder.of());
> cr.registerCoderForClass(Integer.class, BigEndianIntegerCoder.of());
> batchTuple = pipeline
>   .apply("Read from input files", 
> TextIO.read().from(options.getloadingBucketURL()).withCompression(Compression.GZIP)).setCoder(StringUtf8Coder.of())
>   .apply("Process input files",ParDo.of(new 
> ExtractDataFromHtmlPage(extractionConfig,beamConfig.getLoadingBucketURL())).withOutputTags(successRecord,
>  TupleTagList.of(errorRecord).and(deadLetterRecords)));{noformat}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8423) Japanese characters encoding issue

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8423:
--
Component/s: (was: beam-model)
 sdk-java-core

> Japanese characters encoding issue 
> ---
>
> Key: BEAM-8423
> URL: https://issues.apache.org/jira/browse/BEAM-8423
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
> Environment: dataflow
>Reporter: Jyoti Aditya
>Priority: Major
>
> I am running an Apache Beam job to parse Japanese HTML pages. While running 
> the job, the Stackdriver log shows the Japanese characters properly, but the 
> same data written to the GCS bucket has an encoding issue and is getting 
> corrupted.
>  
> {noformat}
> //code
> Pipeline pipeline = Pipeline.create(options);
> CoderRegistry cr = pipeline.getCoderRegistry();
> cr.registerCoderForClass(String.class, StringUtf8Coder.of());
> cr.registerCoderForClass(Integer.class, BigEndianIntegerCoder.of());
> batchTuple = pipeline
>   .apply("Read from input files", 
> TextIO.read().from(options.getloadingBucketURL()).withCompression(Compression.GZIP)).setCoder(StringUtf8Coder.of())
>   .apply("Process input files",ParDo.of(new 
> ExtractDataFromHtmlPage(extractionConfig,beamConfig.getLoadingBucketURL())).withOutputTags(successRecord,
>  TupleTagList.of(errorRecord).and(deadLetterRecords)));{noformat}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8423) Japanese characters encoding issue

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8423:
--
Component/s: (was: sdk-java-core)
 runner-dataflow

> Japanese characters encoding issue 
> ---
>
> Key: BEAM-8423
> URL: https://issues.apache.org/jira/browse/BEAM-8423
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.15.0
> Environment: dataflow
>Reporter: Jyoti Aditya
>Priority: Major
>
> I am running apache beam job to parse japanese html pages. While running the 
> job, I see in stackdriver log it is showing japanese character properly. But 
> same data written to GCS bucket has encoding issue and it is getting 
> corrupted.
>  
> {noformat}
> //code
> Pipeline pipeline = Pipeline.create(options);
> CoderRegistry cr = pipeline.getCoderRegistry();
> cr.registerCoderForClass(String.class, StringUtf8Coder.of());
> cr.registerCoderForClass(Integer.class, BigEndianIntegerCoder.of());
> batchTuple = pipeline
>   .apply("Read from input files", 
> TextIO.read().from(options.getloadingBucketURL()).withCompression(Compression.GZIP)).setCoder(StringUtf8Coder.of())
>   .apply("Process input files",ParDo.of(new 
> ExtractDataFromHtmlPage(extractionConfig,beamConfig.getLoadingBucketURL())).withOutputTags(successRecord,
>  TupleTagList.of(errorRecord).and(deadLetterRecords)));{noformat}
>  
>  
>  





[jira] [Updated] (BEAM-8268) Strengthen retry strategy in GCS Java IO

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8268:
--
Priority: Major  (was: Minor)

> Strengthen retry strategy in GCS Java IO 
> -
>
> Key: BEAM-8268
> URL: https://issues.apache.org/jira/browse/BEAM-8268
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> I have seen a report where an intermittent issue in Java GCS IO has caused 
> error messages without sufficient details; see BEAM-8173. In addition to 
> BEAM-8173, we should also check whether GCS IO attempts to retry on all 
> retriable status codes [1] and, if not, add retries. 
> https://cloud.google.com/storage/docs/json_api/v1/status-codes#http-status-and-error-codes
> cc: [~chamikara]
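The strengthening proposed here — retry on every status code the service documents as retriable, with backoff — can be sketched generically. The snippet below is an illustrative Python outline, not Beam's Java GcsUtil implementation; the status-code set follows the GCS documentation linked above:

```python
import time

# Status codes GCS documents as retriable: 408, 429, and 5xx.
RETRIABLE_STATUS_CODES = {408, 429, 500, 502, 503, 504}

def call_with_retries(request, max_attempts=5, base_delay=0.5):
    """Invoke `request` (a callable returning (status, body)), retrying
    retriable status codes with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = request()
        if status not in RETRIABLE_STATUS_CODES:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return status, body

# Example: succeed after two retriable failures.
responses = iter([(503, ""), (429, ""), (200, "ok")])
status, body = call_with_retries(lambda: next(responses), base_delay=0.01)
assert status == 200
```

The key design point is that the retry decision is driven by a complete table of retriable codes rather than an ad-hoc subset, which is exactly the gap this issue asks to audit.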





[jira] [Work logged] (BEAM-9770) Add BigQuery DeadLetter pattern to Patterns Page

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9770?focusedWorklogId=433970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433970
 ]

ASF GitHub Bot logged work on BEAM-9770:


Author: ASF GitHub Bot
Created on: 15/May/20 23:40
Start Date: 15/May/20 23:40
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11437:
URL: https://github.com/apache/beam/pull/11437#issuecomment-629549790


   Run Java_Examples_Dataflow PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433970)
Time Spent: 4h 10m  (was: 4h)

> Add BigQuery DeadLetter pattern to Patterns Page
> 
>
> Key: BEAM-9770
> URL: https://issues.apache.org/jira/browse/BEAM-9770
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Reza ardeshir rokni
>Priority: Trivial
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-8268) Strengthen retry strategy in GCS Java IO

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8268:
--
Status: Open  (was: Triage Needed)

> Strengthen retry strategy in GCS Java IO 
> -
>
> Key: BEAM-8268
> URL: https://issues.apache.org/jira/browse/BEAM-8268
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Valentyn Tymofieiev
>Priority: Minor
>
> I have seen a report where an intermittent issue in Java GCS IO has caused 
> error messages without sufficient details; see BEAM-8173. In addition to 
> BEAM-8173, we should also check whether GCS IO attempts to retry on all 
> retriable status codes [1] and, if not, add retries. 
> https://cloud.google.com/storage/docs/json_api/v1/status-codes#http-status-and-error-codes
> cc: [~chamikara]





[jira] [Work logged] (BEAM-9770) Add BigQuery DeadLetter pattern to Patterns Page

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9770?focusedWorklogId=433968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433968
 ]

ASF GitHub Bot logged work on BEAM-9770:


Author: ASF GitHub Bot
Created on: 15/May/20 23:39
Start Date: 15/May/20 23:39
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11437:
URL: https://github.com/apache/beam/pull/11437#issuecomment-629549669


   Run Java_Examples_Dataflow PreCommit





Issue Time Tracking
---

Worklog Id: (was: 433968)
Time Spent: 4h  (was: 3h 50m)

> Add BigQuery DeadLetter pattern to Patterns Page
> 
>
> Key: BEAM-9770
> URL: https://issues.apache.org/jira/browse/BEAM-9770
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Reza ardeshir rokni
>Priority: Trivial
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-8239) Docker options in --environment_config

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108759#comment-17108759
 ] 

Kenneth Knowles commented on BEAM-8239:
---

[~mxm] [~ibzib]

> Docker options in --environment_config
> --
>
> Key: BEAM-8239
> URL: https://issues.apache.org/jira/browse/BEAM-8239
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.15.0
>Reporter: Benjamin Tan
>Priority: Major
>
> {{I'm trying to mount a directory by providing additional arguments via 
> --environment_config in the PipelineOptions:}}
>  
> {{pipeline_options = PipelineOptions(["--runner=PortableRunner",
> "--job_endpoint=localhost:8099",
> "--environment_config=-v /tmp:/tmp benjamintan-docker-apache.bintray.io/beam/python3:latest"],
> pipeline_type_check=True)}}
>  
> However,  the command fails with the following:
>  
>  
> {{RuntimeError: Pipeline 
> BeamApp-benjamintan-091616-839e633f_994659f0-7da9-412e-91e2-f32dd4f24b5c 
> failed in state FAILED: java.io.IOException: Received exit code 125 for 
> command 'docker run -d --mount 
> type=bind,src=/home/benjamintan/.config/gcloud,dst=/root/.config/gcloud 
> --network=host --env=DOCKER_MAC_CONTAINER=null -v /tmp:/tmp 
> benjamintan-docker-apache.bintray.io/beam/python3:latest --id=7-1 
> --logging_endpoint=localhost:41835 --artifact_endpoint=localhost:40063 
> --provision_endpoint=localhost:39827 --control_endpoint=localhost:45355'. 
> stderr: unknown flag: --id. See 'docker run --help'.}}
>  
> However, if I were to copy and paste the `docker run ...` command, it seems 
> OK (no syntax errors).
>  
> This seems related to BEAM-5440. It isn't clear whether there is a "right" 
> way to pass in additional Docker run arguments.
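One plausible explanation for the `unknown flag: --id` error is argument ordering: `docker run` treats everything after the image name as arguments to the container's entrypoint, so user-supplied Docker options must be spliced in before the image, and harness flags after it. A hypothetical sketch of how a launcher might assemble the command (this is not Beam's actual Docker launcher code; the image name and flags are illustrative):

```python
def build_docker_run(image, docker_options, harness_args):
    """Assemble a `docker run` argv.

    Everything before `image` is parsed by Docker itself; everything after
    it is handed to the container entrypoint. Injecting user options in the
    wrong position produces errors like "unknown flag: --id".
    """
    return ["docker", "run", "-d", *docker_options, image, *harness_args]

cmd = build_docker_run(
    "beam/python3:latest",                  # hypothetical image name
    ["-v", "/tmp:/tmp", "--network=host"],  # user-supplied Docker options
    ["--id=7-1", "--control_endpoint=localhost:45355"],  # harness flags
)

# Docker options stay left of the image; harness flags stay right of it.
assert cmd.index("--network=host") < cmd.index("beam/python3:latest")
assert cmd.index("--id=7-1") > cmd.index("beam/python3:latest")
```

Under this model, the open design question in the issue is where `--environment_config` content should be spliced into that argv, which is exactly what BEAM-5440 discusses.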





[jira] [Updated] (BEAM-8239) Docker options in --environment_config

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8239:
--
Priority: Major  (was: Minor)

> Docker options in --environment_config
> --
>
> Key: BEAM-8239
> URL: https://issues.apache.org/jira/browse/BEAM-8239
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.15.0
>Reporter: Benjamin Tan
>Priority: Major
>
> {{I'm trying to mount a directory by providing additional arguments via 
> --environment_config in the PipelineOptions:}}
>  
> {{pipeline_options = PipelineOptions(["--runner=PortableRunner",
> "--job_endpoint=localhost:8099",
> "--environment_config=-v /tmp:/tmp benjamintan-docker-apache.bintray.io/beam/python3:latest"],
> pipeline_type_check=True)}}
>  
> However,  the command fails with the following:
>  
>  
> {{RuntimeError: Pipeline 
> BeamApp-benjamintan-091616-839e633f_994659f0-7da9-412e-91e2-f32dd4f24b5c 
> failed in state FAILED: java.io.IOException: Received exit code 125 for 
> command 'docker run -d --mount 
> type=bind,src=/home/benjamintan/.config/gcloud,dst=/root/.config/gcloud 
> --network=host --env=DOCKER_MAC_CONTAINER=null -v /tmp:/tmp 
> benjamintan-docker-apache.bintray.io/beam/python3:latest --id=7-1 
> --logging_endpoint=localhost:41835 --artifact_endpoint=localhost:40063 
> --provision_endpoint=localhost:39827 --control_endpoint=localhost:45355'. 
> stderr: unknown flag: --id. See 'docker run --help'.}}
>  
> However, if I were to copy and paste the `docker run ...` command, it seems 
> OK (no syntax errors).
>  
> This seems related to BEAM-5440. It isn't clear whether there is a "right" 
> way to pass in additional Docker run arguments.





[jira] [Updated] (BEAM-3797) Add reference entry for Filter

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-3797:
--
Issue Type: Improvement  (was: Task)

> Add reference entry for Filter
> --
>
> Key: BEAM-3797
> URL: https://issues.apache.org/jira/browse/BEAM-3797
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rafael Fernandez
>Priority: Major
>  Labels: easyfix, reference
>
> Add a reference entry for Filter.





[jira] [Updated] (BEAM-8239) Docker options in --environment_config

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8239:
--
Status: Open  (was: Triage Needed)

> Docker options in --environment_config
> --
>
> Key: BEAM-8239
> URL: https://issues.apache.org/jira/browse/BEAM-8239
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.15.0
>Reporter: Benjamin Tan
>Priority: Minor
>
> {{I'm trying to mount a directory by providing additional arguments via 
> --environment_config in the PipelineOptions:}}
>  
> {{pipeline_options = PipelineOptions(["--runner=PortableRunner",
> "--job_endpoint=localhost:8099",
> "--environment_config=-v /tmp:/tmp benjamintan-docker-apache.bintray.io/beam/python3:latest"],
> pipeline_type_check=True)}}
>  
> However,  the command fails with the following:
>  
>  
> {{RuntimeError: Pipeline 
> BeamApp-benjamintan-091616-839e633f_994659f0-7da9-412e-91e2-f32dd4f24b5c 
> failed in state FAILED: java.io.IOException: Received exit code 125 for 
> command 'docker run -d --mount 
> type=bind,src=/home/benjamintan/.config/gcloud,dst=/root/.config/gcloud 
> --network=host --env=DOCKER_MAC_CONTAINER=null -v /tmp:/tmp 
> benjamintan-docker-apache.bintray.io/beam/python3:latest --id=7-1 
> --logging_endpoint=localhost:41835 --artifact_endpoint=localhost:40063 
> --provision_endpoint=localhost:39827 --control_endpoint=localhost:45355'. 
> stderr: unknown flag: --id. See 'docker run --help'.}}
>  
> However, if I were to copy and paste the `docker run ...` command, it seems 
> OK (no syntax errors).
>  
> This seems related to BEAM-5440. It isn't clear whether there is a "right" 
> way to pass in additional Docker run arguments.





[jira] [Updated] (BEAM-3797) Add reference entry for Filter

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-3797:
--
Priority: Major  (was: Minor)

> Add reference entry for Filter
> --
>
> Key: BEAM-3797
> URL: https://issues.apache.org/jira/browse/BEAM-3797
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Rafael Fernandez
>Priority: Major
>  Labels: easyfix, reference
>
> Add a reference entry for Filter.





[jira] [Commented] (BEAM-8192) Improve version definition

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108758#comment-17108758
 ] 

Kenneth Knowles commented on BEAM-8192:
---

The suffixes are controlled by the different language ecosystems. For 
Java/maven {{-SNAPSHOT}} has special meaning. For Python I don't think there is 
a clear convention. I think having centralized scripts is probably the best way 
to have a centralized version. The scripts can know how to set and get the 
current version and validate the repo.
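As a concrete illustration of the comment above, one single source of truth could be a plain VERSION file that small scripts read and validate. This is a hypothetical sketch — the file name, layout, and suffix conventions are assumptions, not Beam's actual build tooling:

```python
import re
from pathlib import Path

# Hypothetical suffix convention: bare "X.Y.Z" is a release, while
# "X.Y.Z-SNAPSHOT" (Java/Maven style) or "X.Y.Z.devN" (Python style)
# marks a development build.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+(-SNAPSHOT|\.dev\d*)?")

def read_version(repo_root="."):
    """Read and validate the single VERSION file at the repo root."""
    raw = Path(repo_root, "VERSION").read_text().strip()
    if not VERSION_RE.fullmatch(raw):
        raise ValueError(f"unrecognized version string: {raw!r}")
    return raw

def is_dev_version(version):
    """True for development builds under the assumed convention."""
    return version.endswith("-SNAPSHOT") or ".dev" in version
```

With something like this, Gradle, Python, and Go builds could all shell out to (or reimplement) the same two functions instead of each hard-coding its own version string.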

> Improve version definition
> --
>
> Key: BEAM-8192
> URL: https://issues.apache.org/jira/browse/BEAM-8192
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Hannah Jiang
>Priority: Major
>
> Version info is defined in several different locations, and the suffixes are 
> not consistent.
> For example, the version string should make it possible to tell a development 
> version from a released version.
> Ideally, the version is defined in a single location and can be read from 
> Gradle as well as from Python, Go, and Java.





[jira] [Updated] (BEAM-3797) Add reference entry for Filter

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-3797:
--
Status: Open  (was: Triage Needed)

> Add reference entry for Filter
> --
>
> Key: BEAM-3797
> URL: https://issues.apache.org/jira/browse/BEAM-3797
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Rafael Fernandez
>Priority: Minor
>  Labels: easyfix, reference
>
> Add a reference entry for Filter.





[jira] [Updated] (BEAM-8192) Improve version definition

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8192:
--
Status: Open  (was: Triage Needed)

> Improve version definition
> --
>
> Key: BEAM-8192
> URL: https://issues.apache.org/jira/browse/BEAM-8192
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Hannah Jiang
>Priority: Major
>
> Version info is defined in several different locations, and the suffixes are 
> not consistent.
> For example, the version string should make it possible to tell a development 
> version from a released version.
> Ideally, the version is defined in a single location and can be read from 
> Gradle as well as from Python, Go, and Java.





[jira] [Commented] (BEAM-8799) Python WordCountIT benchmarks broken

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108756#comment-17108756
 ] 

Kenneth Knowles commented on BEAM-8799:
---

This test now seems mostly healthy, though it could still be a little bit 
flaky.

> Python WordCountIT benchmarks broken
> 
>
> Key: BEAM-8799
> URL: https://issues.apache.org/jira/browse/BEAM-8799
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Luke Cwik
>Priority: Major
> Fix For: Not applicable
>
>
> This impacts beam_PerformanceTests_WordCountIT_Py27, 
> beam_PerformanceTests_WordCountIT_Py35, 
> beam_PerformanceTests_WordCountIT_Py36, beam_PerformanceTests_WordCountIT_Py37
> Example run: 
> [https://builds.apache.org/job/beam_PerformanceTests_WordCountIT_Py27/762/console]
>  
> {code:java}
> 04:25:28 2019-11-21 12:25:28,226 fda01793 MainThread 
> beam_integration_benchmark(1/1) ERROR Exception running benchmark
> 04:25:28 Traceback (most recent call last):
> 04:25:28   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/PerfKitBenchmarker/perfkitbenchmarker/pkb.py",
>  line 995, in RunBenchmarkTask
> 04:25:28 RunBenchmark(spec, collector)
> 04:25:28   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/PerfKitBenchmarker/perfkitbenchmarker/pkb.py",
>  line 846, in RunBenchmark
> 04:25:28 DoRunPhase(spec, collector, detailed_timer)
> 04:25:28   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/PerfKitBenchmarker/perfkitbenchmarker/pkb.py",
>  line 689, in DoRunPhase
> 04:25:28 samples = spec.BenchmarkRun(spec)
> 04:25:28   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/PerfKitBenchmarker/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py",
>  line 161, in Run
> 04:25:28 job_type=job_type)
> 04:25:28 TypeError: SubmitJob() takes at least 3 arguments (5 given)
> 04:25:28 2019-11-21 12:25:28,226 fda01793 MainThread 
> beam_integration_benchmark(1/1) ERROR Benchmark 1/1 
> beam_integration_benchmark (UID: beam_integration_benchmark0) failed. 
> {code}
>  
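The `TypeError` above is a caller/callee signature mismatch: the benchmark passes `job_type` (and other arguments) to a `SubmitJob()` that does not accept them. A Python 3 sketch of the same failure mode, using a hypothetical stub rather than the real PerfKitBenchmarker class:

```python
# Illustrative reproduction only: the caller passes more arguments than the
# callee's signature accepts. `DataflowService` is a made-up stub, not the
# actual PerfKitBenchmarker service class.
class DataflowService:
    def SubmitJob(self, jarfile, job_arguments):  # accepts 2 args (+ self)
        return "submitted"

svc = DataflowService()
try:
    # A newer benchmark script passing extra keyword arguments.
    svc.SubmitJob("job.jar", ["--foo"], job_type="python", classname="X")
except TypeError as e:
    # Python 3 phrases this differently from the Python 2 message in the
    # log ("takes at least 3 arguments (5 given)"), but the root cause is
    # the same signature mismatch.
    print(type(e).__name__)  # TypeError
```

The fix direction implied by the log is to bring the two sides back in sync: either the benchmark stops passing `job_type`, or `SubmitJob()` grows a parameter for it.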





[jira] [Closed] (BEAM-8799) Python WordCountIT benchmarks broken

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles closed BEAM-8799.
-
Fix Version/s: Not applicable
   Resolution: Cannot Reproduce

> Python WordCountIT benchmarks broken
> 
>
> Key: BEAM-8799
> URL: https://issues.apache.org/jira/browse/BEAM-8799
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Luke Cwik
>Priority: Major
> Fix For: Not applicable
>
>
> This impacts beam_PerformanceTests_WordCountIT_Py27, 
> beam_PerformanceTests_WordCountIT_Py35, 
> beam_PerformanceTests_WordCountIT_Py36, beam_PerformanceTests_WordCountIT_Py37
> Example run: 
> [https://builds.apache.org/job/beam_PerformanceTests_WordCountIT_Py27/762/console]
>  
> {code:java}
> 04:25:28 2019-11-21 12:25:28,226 fda01793 MainThread 
> beam_integration_benchmark(1/1) ERROR Exception running benchmark
> 04:25:28 Traceback (most recent call last):
> 04:25:28   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/PerfKitBenchmarker/perfkitbenchmarker/pkb.py",
>  line 995, in RunBenchmarkTask
> 04:25:28 RunBenchmark(spec, collector)
> 04:25:28   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/PerfKitBenchmarker/perfkitbenchmarker/pkb.py",
>  line 846, in RunBenchmark
> 04:25:28 DoRunPhase(spec, collector, detailed_timer)
> 04:25:28   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/PerfKitBenchmarker/perfkitbenchmarker/pkb.py",
>  line 689, in DoRunPhase
> 04:25:28 samples = spec.BenchmarkRun(spec)
> 04:25:28   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_WordCountIT_Py27/PerfKitBenchmarker/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py",
>  line 161, in Run
> 04:25:28 job_type=job_type)
> 04:25:28 TypeError: SubmitJob() takes at least 3 arguments (5 given)
> 04:25:28 2019-11-21 12:25:28,226 fda01793 MainThread 
> beam_integration_benchmark(1/1) ERROR Benchmark 1/1 
> beam_integration_benchmark (UID: beam_integration_benchmark0) failed. 
> {code}
>  





[jira] [Commented] (BEAM-9950) cannot find symbol javax.annotation.Generated

2020-05-15 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108755#comment-17108755
 ] 

Kyle Weaver commented on BEAM-9950:
---

[~iemejia] suggested this was because there's Java 11 somewhere, and I think 
that's likely to be the problem, but I haven't had time to look into it yet.

> cannot find symbol javax.annotation.Generated
> -
>
> Key: BEAM-9950
> URL: https://issues.apache.org/jira/browse/BEAM-9950
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: Minor
>
> This happens when I run through IntelliJ but not when I run the same command 
> on the command line, so it is presumably an issue with my IntelliJ setup. I 
> am using IntelliJ 2019.2.4.
> ./gradlew :runners:flink:1.10:test --tests 
> "org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest.testWatermarkEmission[*]"
> .../apache/beam/model/pipeline/build/generated/source/proto/main/grpc/org/apache/beam/model/pipeline/v1/TestStreamServiceGrpc.java:20:
>  error: cannot find symbol
> @javax.annotation.Generated(
>  ^
>   symbol:   class Generated
>   location: package javax.annotation
> 1 error





[jira] [Updated] (BEAM-9984) Support BIT_OR aggregation function in Beam SQL

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9984:
--
Status: Open  (was: Triage Needed)

> Support BIT_OR aggregation function in Beam SQL
> ---
>
> Key: BEAM-9984
> URL: https://issues.apache.org/jira/browse/BEAM-9984
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: starter
>
> Performs a bitwise OR operation on expression and returns the result.
> Supported Argument Types: INT64
> Returned Data Types: INT64
> Examples
> {code:sql}
> SELECT BIT_OR(c) as bit_and FROM UNNEST([0xF001, 0x00A1]) as c;
> +---------+
> | bit_and |
> +---------+
> | 61601   |
> +---------+
> {code}
> What is expected: should include both Calcite and ZetaSQL dialects.
> How to test: unit tests
> Reference: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#bit_or
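The expected semantics can be sanity-checked outside a SQL engine: bitwise-OR-ing the example inputs in Python gives the value the aggregation should return.

```python
from functools import reduce
import operator

# BIT_OR over INT64 values, mirroring the SQL example's inputs.
values = [0xF001, 0x00A1]
bit_or = reduce(operator.or_, values, 0)

print(hex(bit_or), bit_or)  # 0xf0a1 61601
```

Unit tests for the Calcite and ZetaSQL implementations can assert against exactly this reference value.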





[jira] [Updated] (BEAM-9422) Missing comma in worker_pool_main causing syntax error whenever worker is started

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9422:
--
Status: Open  (was: Triage Needed)

> Missing comma in worker_pool_main causing syntax error whenever worker is 
> started
> -
>
> Key: BEAM-9422
> URL: https://issues.apache.org/jira/browse/BEAM-9422
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Frederik B
>Priority: Critical
>  Labels: newbie
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> A missing comma in worker_pool_main causes a syntax error when the Python 
> worker is started, resulting in: 
> "Starting worker with command ['python', '-c', 'from 
> apache_beam.runners.worker.sdk_worker import SdkHarness; 
> SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=-1).run()']
>  File "<string>", line 1
>  from apache_beam.runners.worker.sdk_worker import SdkHarness; 
> SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=-1).run()
>  ^
> SyntaxError: invalid syntax"
>  
> Should be trivial to fix; add comma to line 116 in worker_pool_main.py.
>  
> Note: I am not a Beam contributor, merely a user. I would appreciate it if 
> anyone could fix this. Thank you!
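The reported failure is reproducible with plain Python: the argument run `state_cache_size=0data_buffer_time_limit_ms=-1` is invalid syntax, and a single comma fixes it. A minimal check, with no Beam import needed since `compile()` only parses the string:

```python
# The broken argument string from the bug report vs. the comma-fixed one.
broken = 'SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=-1)'
fixed = 'SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0,data_buffer_time_limit_ms=-1)'

try:
    compile(broken, "<cmd>", "eval")
    broken_is_valid = True
except SyntaxError:
    broken_is_valid = False

# The fixed string parses fine (SdkHarness need not be defined to *parse*).
compile(fixed, "<cmd>", "eval")

print(broken_is_valid)  # False: the missing comma is a SyntaxError
```

This also explains why the worker dies immediately: the command string is rejected by the parser before any Beam code runs.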





[jira] [Updated] (BEAM-9321) BigQuery avro write logical type support

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9321:
--
Status: Open  (was: Triage Needed)

> BigQuery avro write logical type support
> 
>
> Key: BEAM-9321
> URL: https://issues.apache.org/jira/browse/BEAM-9321
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Filipe Regadas
>Assignee: Filipe Regadas
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing 
> does not respect the Avro <-> BigQuery data type conversion 
> ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]);
>  we need to set the useAvroLogicalTypes option.





[jira] [Resolved] (BEAM-9422) Missing comma in worker_pool_main causing syntax error whenever worker is started

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-9422.
---
Fix Version/s: Not applicable
 Assignee: Kyle Weaver
   Resolution: Fixed

> Missing comma in worker_pool_main causing syntax error whenever worker is 
> started
> -
>
> Key: BEAM-9422
> URL: https://issues.apache.org/jira/browse/BEAM-9422
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Frederik B
>Assignee: Kyle Weaver
>Priority: Critical
>  Labels: newbie
> Fix For: Not applicable
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> A missing comma in worker_pool_main causes a syntax error when the Python 
> worker is started, resulting in: 
> "Starting worker with command ['python', '-c', 'from 
> apache_beam.runners.worker.sdk_worker import SdkHarness; 
> SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=-1).run()']
>  File "<string>", line 1
>  from apache_beam.runners.worker.sdk_worker import SdkHarness; 
> SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=-1).run()
>  ^
> SyntaxError: invalid syntax"
>  
> Should be trivial to fix; add comma to line 116 in worker_pool_main.py.
>  
> Note: I am not a Beam contributor, merely a user. I would appreciate it if 
> anyone could fix this. Thank you!





[jira] [Assigned] (BEAM-9321) BigQuery avro write logical type support

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-9321:
-

Assignee: Filipe Regadas

> BigQuery avro write logical type support
> 
>
> Key: BEAM-9321
> URL: https://issues.apache.org/jira/browse/BEAM-9321
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Filipe Regadas
>Assignee: Filipe Regadas
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing 
> does not respect the Avro <-> BigQuery data type conversion 
> ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]);
>  we need to set the useAvroLogicalTypes option.





[jira] [Commented] (BEAM-8328) remove :beam-test-infra-metrics:test from build target.

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108753#comment-17108753
 ] 

Kenneth Knowles commented on BEAM-8328:
---

[~Ardagan]


> remove :beam-test-infra-metrics:test from build target.
> ---
>
> Key: BEAM-8328
> URL: https://issues.apache.org/jira/browse/BEAM-8328
> Project: Beam
>  Issue Type: Bug
>  Components: community-metrics, project-management
>Reporter: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-8328) remove :beam-test-infra-metrics:test from build target.

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108752#comment-17108752
 ] 

Kenneth Knowles commented on BEAM-8328:
---

Ping? Does the attached PR resolve this Jira or no?

> remove :beam-test-infra-metrics:test from build target.
> ---
>
> Key: BEAM-8328
> URL: https://issues.apache.org/jira/browse/BEAM-8328
> Project: Beam
>  Issue Type: Bug
>  Components: community-metrics, project-management
>Reporter: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-9355) Python typehints: support NewType

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108751#comment-17108751
 ] 

Kenneth Knowles commented on BEAM-9355:
---

PR says "basic support". Is this still needing further work?

> Python typehints: support NewType
> -
>
> Key: BEAM-9355
> URL: https://issues.apache.org/jira/browse/BEAM-9355
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://docs.python.org/3/library/typing.html#newtype
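For context, `typing.NewType` creates a distinct name that static type checkers treat as a subtype, while at runtime it is essentially an identity callable — which is what a typehints layer has to unwrap. A small sketch of the runtime behavior:

```python
from typing import NewType

UserId = NewType("UserId", int)

uid = UserId(42)

# At runtime NewType is just a callable returning its argument unchanged;
# the distinct type exists only for static checkers.
assert uid == 42
assert type(uid) is int

# The wrapped ("supertype") type is recoverable via __supertype__, which is
# what a typehints layer can normalize NewType annotations down to.
assert UserId.__supertype__ is int
```

So "support NewType" largely means mapping `UserId` hints to their `__supertype__` when converting typing annotations into Beam's internal typehints.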





[jira] [Updated] (BEAM-9221) Python precommit tests should catch errors when Python SDK dependency chain has conflicting dependencies.

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9221:
--
Status: Open  (was: Triage Needed)

> Python precommit tests should catch errors when Python SDK dependency chain 
> has conflicting dependencies. 
> --
>
> Key: BEAM-9221
> URL: https://issues.apache.org/jira/browse/BEAM-9221
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Minor
>






[jira] [Updated] (BEAM-9355) Python typehints: support NewType

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9355:
--
Status: Open  (was: Triage Needed)

> Python typehints: support NewType
> -
>
> Key: BEAM-9355
> URL: https://issues.apache.org/jira/browse/BEAM-9355
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://docs.python.org/3/library/typing.html#newtype





[jira] [Commented] (BEAM-9321) BigQuery avro write logical type support

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108750#comment-17108750
 ] 

Kenneth Knowles commented on BEAM-9321:
---

I see the PR is merged. Has this fully resolved your issue?

> BigQuery avro write logical type support
> 
>
> Key: BEAM-9321
> URL: https://issues.apache.org/jira/browse/BEAM-9321
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Filipe Regadas
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing 
> does not respect the Avro <-> BigQuery data type conversion 
> ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]);
>  we need to set the useAvroLogicalTypes option.





[jira] [Commented] (BEAM-9239) Dependency conflict with Spark using aws io

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108749#comment-17108749
 ] 

Kenneth Knowles commented on BEAM-9239:
---

Does Spark 2.4.4 have the option to put your dependencies/uber jar first on 
the classpath? That could provide a workaround.
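
For reference, a sketch of what that workaround could look like with spark-submit. The `userClassPathFirst` settings are real (experimental) Spark options, but the class and jar names here are placeholders:

```shell
# Hypothetical invocation; com.example.MyBeamJob and my-beam-uber.jar are
# placeholders. userClassPathFirst is experimental: verify it is honored by
# your Spark 2.4.4 deployment mode before relying on it.
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class com.example.MyBeamJob \
  my-beam-uber.jar
```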

> Dependency conflict with Spark using aws io
> ---
>
> Key: BEAM-9239
> URL: https://issues.apache.org/jira/browse/BEAM-9239
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws, runner-spark
>Affects Versions: 2.17.0
>Reporter: David McIntosh
>Priority: Critical
>
> Starting with beam 2.17.0 I get this error in the Spark 2.4.4 driver when aws 
> io is also used:
> {noformat}
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.databind.jsontype.TypeSerializer.typeId(Ljava/lang/Object;Lcom/fasterxml/jackson/core/JsonToken;)Lcom/fasterxml/jackson/core/type/WritableTypeId;
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:163)
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:134)
>   at 
> com.fasterxml.jackson.databind.ser.impl.TypeWrappedSerializer.serialize(TypeWrappedSerializer.java:32)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.ensureSerializable(ProxyInvocationHandler.java:721)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:647)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:635)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.serializeToJson(SerializablePipelineOptions.java:67)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.(SerializablePipelineOptions.java:43)
>   at 
> org.apache.beam.runners.spark.translation.EvaluationContext.(EvaluationContext.java:71)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:215)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:90)
> {noformat}
> The cause seems to be that the Spark driver environment uses an older version 
> of Jackson. I tried to update jackson on the Spark cluster but that led to 
> several other errors. 
> The change that started causing this was:
> https://github.com/apache/beam/commit/b68d70a47b68ad84efcd9405c1799002739bd116
> After reverting that change I was able to successfully run my job.





[jira] [Updated] (BEAM-9222) Add distributed read pattern

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9222:
--
Status: Open  (was: Triage Needed)

> Add distributed read pattern 
> -
>
> Key: BEAM-9222
> URL: https://issues.apache.org/jira/browse/BEAM-9222
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Reza ardeshir rokni
>Priority: Major
>
> A nice pattern that would be great to add to the Apache Beam patterns page:
> [https://nl.devoteam.com/en/blog-post/querying-jdbc-database-parallel-google-dataflow-apache-beam/]
>  
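
The linked pattern boils down to splitting a key range into sub-ranges and issuing one bounded query per sub-range. A Beam-free sketch of the range-splitting step (table and column names invented for illustration):

```python
def split_key_range(min_key, max_key, num_splits):
    """Split [min_key, max_key] into at most num_splits contiguous,
    non-overlapping sub-ranges, one parallel JDBC query each."""
    span = max_key - min_key + 1
    step = -(-span // num_splits)  # ceiling division
    ranges = []
    lo = min_key
    while lo <= max_key:
        hi = min(lo + step - 1, max_key)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# Each sub-range becomes one bounded query, fanned out to workers.
queries = [
    "SELECT * FROM orders WHERE id BETWEEN %d AND %d" % r
    for r in split_key_range(1, 1000, 4)
]
```

In a Beam pipeline the ranges would typically be created in one transform and the per-range queries executed in a downstream ParDo.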





[jira] [Updated] (BEAM-9239) Dependency conflict with Spark using aws io

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9239:
--
Status: Open  (was: Triage Needed)

> Dependency conflict with Spark using aws io
> ---
>
> Key: BEAM-9239
> URL: https://issues.apache.org/jira/browse/BEAM-9239
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws, runner-spark
>Affects Versions: 2.17.0
>Reporter: David McIntosh
>Priority: Major
>
> Starting with beam 2.17.0 I get this error in the Spark 2.4.4 driver when aws 
> io is also used:
> {noformat}
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.databind.jsontype.TypeSerializer.typeId(Ljava/lang/Object;Lcom/fasterxml/jackson/core/JsonToken;)Lcom/fasterxml/jackson/core/type/WritableTypeId;
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:163)
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:134)
>   at 
> com.fasterxml.jackson.databind.ser.impl.TypeWrappedSerializer.serialize(TypeWrappedSerializer.java:32)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.ensureSerializable(ProxyInvocationHandler.java:721)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:647)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:635)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.serializeToJson(SerializablePipelineOptions.java:67)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.(SerializablePipelineOptions.java:43)
>   at 
> org.apache.beam.runners.spark.translation.EvaluationContext.(EvaluationContext.java:71)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:215)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:90)
> {noformat}
> The cause seems to be that the Spark driver environment uses an older version 
> of Jackson. I tried to update jackson on the Spark cluster but that led to 
> several other errors. 
> The change that started causing this was:
> https://github.com/apache/beam/commit/b68d70a47b68ad84efcd9405c1799002739bd116
> After reverting that change I was able to successfully run my job.





[jira] [Updated] (BEAM-9239) Dependency conflict with Spark using aws io

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9239:
--
Priority: Critical  (was: Major)

> Dependency conflict with Spark using aws io
> ---
>
> Key: BEAM-9239
> URL: https://issues.apache.org/jira/browse/BEAM-9239
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws, runner-spark
>Affects Versions: 2.17.0
>Reporter: David McIntosh
>Priority: Critical
>
> Starting with beam 2.17.0 I get this error in the Spark 2.4.4 driver when aws 
> io is also used:
> {noformat}
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.databind.jsontype.TypeSerializer.typeId(Ljava/lang/Object;Lcom/fasterxml/jackson/core/JsonToken;)Lcom/fasterxml/jackson/core/type/WritableTypeId;
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:163)
>   at 
> org.apache.beam.sdk.io.aws.options.AwsModule$AWSCredentialsProviderSerializer.serializeWithType(AwsModule.java:134)
>   at 
> com.fasterxml.jackson.databind.ser.impl.TypeWrappedSerializer.serialize(TypeWrappedSerializer.java:32)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.ensureSerializable(ProxyInvocationHandler.java:721)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:647)
>   at 
> org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:635)
>   at 
> com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:130)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3559)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2927)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.serializeToJson(SerializablePipelineOptions.java:67)
>   at 
> org.apache.beam.runners.core.construction.SerializablePipelineOptions.(SerializablePipelineOptions.java:43)
>   at 
> org.apache.beam.runners.spark.translation.EvaluationContext.(EvaluationContext.java:71)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:215)
>   at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:90)
> {noformat}
> The cause seems to be that the Spark driver environment uses an older version 
> of Jackson. I tried to update jackson on the Spark cluster but that led to 
> several other errors. 
> The change that started causing this was:
> https://github.com/apache/beam/commit/b68d70a47b68ad84efcd9405c1799002739bd116
> After reverting that change I was able to successfully run my job.





[jira] [Commented] (BEAM-9417) Unable to Read from BigQuery and File system in same pipeline

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108748#comment-17108748
 ] 

Kenneth Knowles commented on BEAM-9417:
---

[~chamikara] [~pabloem] [~robertwb]

> Unable to Read from BigQuery and File system in same pipeline
> -
>
> Key: BEAM-9417
> URL: https://issues.apache.org/jira/browse/BEAM-9417
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
> Environment: macbook pro catalina, python3.7, 
> apache-beam[gcp]===2.19.0
>Reporter: Deepak Verma
>Priority: Critical
>  Labels: bigquery, multiplexing
>
> I am trying to read from BigQuery and the local file system in my apache 
> beam[gcp] pipeline.
> {code:java}
> pipeline_options = PipelineOptions()
> options = pipeline_options.view_as(PreProcessOptions)
> options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=options)
> apn_query = "select * from `{bq_project}.config.apn` where 
> customer='{customer}'"\
>  .format(bq_project=options.bq_project, customer=options.customer)
> file_path = "mycsv.csv.gz"
> apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, 
> use_standard_sql=True))
> preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())
> {code}
>  
> When I am running this job, I am getting below error
>  
> {code:java}
> Traceback (most recent call last):
>  File "/etl/dataflow/etlTXLPreprocessor.py", line 125, in 
>  run()
>  File "/etl/dataflow/etlTXLPreprocessor.py", line 120, in run
>  p.run().wait_until_finish()
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 461, in run
>  self._options).run(False)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 474, in run
>  return self.runner.run_pipeline(self, self._options)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 182, in run_pipeline
>  return runner.run_pipeline(pipeline, options)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 413, in run_pipeline
>  pipeline.replace_all(_get_transform_overrides(options))
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 443, in replace_all
>  self._replace(override)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 340, in _replace
>  self.visit(TransformUpdater(self))
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 503, in visit
>  self._root_transform().visit(visitor, self, visited)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 939, in visit
>  part.visit(visitor, pipeline, visited)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 939, in visit
>  part.visit(visitor, pipeline, visited)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 939, in visit
>  part.visit(visitor, pipeline, visited)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 942, in visit
>  visitor.visit_transform(self)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 338, in visit_transform
>  self._replace_if_needed(transform_node)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 301, in _replace_if_needed
>  new_output = replacement_transform.expand(input_node)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/sdf_direct_runner.py",
>  line 87, in expand
>  invoker = DoFnInvoker.create_invoker(signature, process_invocation=False)
>  File "apache_beam/runners/common.py", line 360, in 
> apache_beam.runners.common.DoFnInvoker.create_invoker
> TypeError: create_invoker() takes at least 2 positional arguments (1 
> given){code}
>  
> But If I run my code like this
> {code:java}
>  
> pipeline_options = PipelineOptions()
> options = pipeline_options.view_as(PreProcessOptions)
> options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=options)
> file_path = "mycsv.csv.gz"
> preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())
> {code}
>  
> or like this
> {code:java}
>  
> pipeline_options = PipelineOptions()
> options = pipeline_options.view_as(PreProcessOptions)
> options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=options)
> apn_query = "select * from `{bq_project}.config.apn` where 
> customer='{customer}'"\
>  .format(bq_project=options.bq_project, customer=options.customer)
> apn = p | 

[jira] [Updated] (BEAM-8919) Move JAVA_11_HOME and JAVA_8_HOME variables to Jenkins envs.

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8919:
--
Status: Open  (was: Triage Needed)

> Move JAVA_11_HOME and JAVA_8_HOME variables to Jenkins envs.
> 
>
> Key: BEAM-8919
> URL: https://issues.apache.org/jira/browse/BEAM-8919
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> Some tests that use different java versions rely on the following paths to 
> java home:
> final String JAVA_11_HOME = '/usr/lib/jvm/java-11-openjdk-amd64'
> final String JAVA_8_HOME = '/usr/lib/jvm/java-8-openjdk-amd64'
>  
> The paths themselves should be held as Jenkins env variables. Benefits: 
>  - easier to reuse
>  - no room for typos in the path
>  
>  





[jira] [Updated] (BEAM-9417) Unable to Read from BigQuery and File system in same pipeline

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9417:
--
Status: Open  (was: Triage Needed)

> Unable to Read from BigQuery and File system in same pipeline
> -
>
> Key: BEAM-9417
> URL: https://issues.apache.org/jira/browse/BEAM-9417
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
> Environment: macbook pro catalina, python3.7, 
> apache-beam[gcp]===2.19.0
>Reporter: Deepak Verma
>Priority: Critical
>  Labels: bigquery, multiplexing
>
> I am trying to read from BigQuery and the local file system in my apache 
> beam[gcp] pipeline.
> {code:java}
> pipeline_options = PipelineOptions()
> options = pipeline_options.view_as(PreProcessOptions)
> options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=options)
> apn_query = "select * from `{bq_project}.config.apn` where 
> customer='{customer}'"\
>  .format(bq_project=options.bq_project, customer=options.customer)
> file_path = "mycsv.csv.gz"
> apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, 
> use_standard_sql=True))
> preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())
> {code}
>  
> When I am running this job, I am getting below error
>  
> {code:java}
> Traceback (most recent call last):
>  File "/etl/dataflow/etlTXLPreprocessor.py", line 125, in 
>  run()
>  File "/etl/dataflow/etlTXLPreprocessor.py", line 120, in run
>  p.run().wait_until_finish()
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 461, in run
>  self._options).run(False)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 474, in run
>  return self.runner.run_pipeline(self, self._options)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 182, in run_pipeline
>  return runner.run_pipeline(pipeline, options)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 413, in run_pipeline
>  pipeline.replace_all(_get_transform_overrides(options))
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 443, in replace_all
>  self._replace(override)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 340, in _replace
>  self.visit(TransformUpdater(self))
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 503, in visit
>  self._root_transform().visit(visitor, self, visited)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 939, in visit
>  part.visit(visitor, pipeline, visited)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 939, in visit
>  part.visit(visitor, pipeline, visited)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 939, in visit
>  part.visit(visitor, pipeline, visited)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 942, in visit
>  visitor.visit_transform(self)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 338, in visit_transform
>  self._replace_if_needed(transform_node)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", 
> line 301, in _replace_if_needed
>  new_output = replacement_transform.expand(input_node)
>  File 
> "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/sdf_direct_runner.py",
>  line 87, in expand
>  invoker = DoFnInvoker.create_invoker(signature, process_invocation=False)
>  File "apache_beam/runners/common.py", line 360, in 
> apache_beam.runners.common.DoFnInvoker.create_invoker
> TypeError: create_invoker() takes at least 2 positional arguments (1 
> given){code}
>  
> But If I run my code like this
> {code:java}
>  
> pipeline_options = PipelineOptions()
> options = pipeline_options.view_as(PreProcessOptions)
> options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=options)
> file_path = "mycsv.csv.gz"
> preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())
> {code}
>  
> or like this
> {code:java}
>  
> pipeline_options = PipelineOptions()
> options = pipeline_options.view_as(PreProcessOptions)
> options.view_as(SetupOptions).save_main_session = True
> p = beam.Pipeline(options=options)
> apn_query = "select * from `{bq_project}.config.apn` where 
> customer='{customer}'"\
>  .format(bq_project=options.bq_project, customer=options.customer)
> apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, 
> 

[jira] [Updated] (BEAM-8485) Python Beam job is successful regardless of the result of the BigQuery load job (for the BigQuery File Loads)

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8485:
--
Status: Open  (was: Triage Needed)

> Python Beam job is successful regardless of the result of the BigQuery load 
> job (for the BigQuery File Loads)
> -
>
> Key: BEAM-8485
> URL: https://issues.apache.org/jira/browse/BEAM-8485
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, sdk-py-core
>Affects Versions: 2.16.0
>Reporter: Robbie Gruener
>Priority: Major
>
> We are using the beam python sdk 2.16.0 and have noticed since upgrading to 
> 2.16.0 that beam jobs are exiting successfully even though the BQ load job 
> fails. We can see the load job failing in the BigQuery UI while the Beam jobs 
> running on either the DirectRunner or the DataflowRunner were successful.
> Note we are using the BigQuery File Loads (which is enabled via 
> use_beam_bq_sink)
> I have verified that downgrading to 2.15.0 fixes this issue so it must be a 
> newly introduced bug.





[jira] [Updated] (BEAM-9431) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not supported with streaming engine

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9431:
--
Status: Open  (was: Triage Needed)

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not 
> supported with streaming engine
> -
>
> Key: BEAM-9431
> URL: https://issues.apache.org/jira/browse/BEAM-9431
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is failing on 
> Dataflow V2 because ReadFromPubSub/Read-out0-ElementCount is not implemented 
> with streaming engine.





[jira] [Commented] (BEAM-9431) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not supported with streaming engine

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108747#comment-17108747
 ] 

Kenneth Knowles commented on BEAM-9431:
---

Is this already addressed by the linked PR?

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not 
> supported with streaming engine
> -
>
> Key: BEAM-9431
> URL: https://issues.apache.org/jira/browse/BEAM-9431
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is failing on 
> Dataflow V2 because ReadFromPubSub/Read-out0-ElementCount is not implemented 
> with streaming engine.





[jira] [Commented] (BEAM-9567) GCS needs to be testable

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108746#comment-17108746
 ] 

Kenneth Knowles commented on BEAM-9567:
---

In practice, our tests use Mockito to directly create a mock GcsUtil: 
https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/extensions/google-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilTest.java#L150

What you are describing, I think, would more typically be known as a "fake": a 
functional but not full-scale implementation of a service's API. Fakes are 
usually better for testing than mocks.
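
To make the distinction concrete, a generic Python sketch (using unittest.mock for the mock; `FakeStorage` is a hypothetical dict-backed fake, not Beam's actual GcsUtil API):

```python
from unittest import mock

# A mock: records calls and returns canned answers; no behavior of its own.
mock_storage = mock.Mock()
mock_storage.read.return_value = b"canned"
assert mock_storage.read("gs://bucket/obj") == b"canned"

# A fake: a small working implementation backed by a dict, so tests can
# exercise real read-after-write behavior without touching the network.
class FakeStorage:
    def __init__(self):
        self._blobs = {}

    def write(self, path, data):
        self._blobs[path] = data

    def read(self, path):
        return self._blobs[path]

fake = FakeStorage()
fake.write("gs://bucket/obj", b"hello")
assert fake.read("gs://bucket/obj") == b"hello"
```

The fake returns exactly what was written, so a test failure points at the code under test rather than at a mis-specified canned response.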

> GCS needs to be testable
> 
>
> Key: BEAM-9567
> URL: https://issues.apache.org/jira/browse/BEAM-9567
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: LCID Fire
>Priority: Minor
>
> Currently AFAIK, there is no way of testing `saveAsTextFile` on GCS without 
> actually connecting.
> For testing it would be good to have a way of using a mock GCS implementation.
> Wonder whether `LocalStorageHelper` might be a base for a FileSystem 
> implementation.





[jira] [Updated] (BEAM-9487) GBKs on unbounded pcolls with global windows and no triggers should fail

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9487:
--
Status: Open  (was: Triage Needed)

> GBKs on unbounded pcolls with global windows and no triggers should fail
> 
>
> Key: BEAM-9487
> URL: https://issues.apache.org/jira/browse/BEAM-9487
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Priority: Major
>  Labels: EaseOfUse, starter
>
> This, according to "4.2.2.1 GroupByKey and unbounded PCollections" in 
> https://beam.apache.org/documentation/programming-guide/.
> bq. If you do apply GroupByKey or CoGroupByKey to a group of unbounded 
> PCollections without setting either a non-global windowing strategy, a 
> trigger strategy, or both for each collection, Beam generates an 
> IllegalStateException error at pipeline construction time.
> Example where this doesn't happen in Python SDK: 
> https://stackoverflow.com/questions/60623246/merge-pcollection-with-apache-beam
> I also believe that this unit test should fail, since test_stream is 
> unbounded, uses global window, and has no triggers.
> {code}
>   def test_global_window_gbk_fail(self):
> with TestPipeline() as p:
>   test_stream = TestStream()
>   _ = p | test_stream | GroupByKey()
> {code}





[jira] [Updated] (BEAM-9567) GCS needs to be testable

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9567:
--
Status: Open  (was: Triage Needed)

> GCS needs to be testable
> 
>
> Key: BEAM-9567
> URL: https://issues.apache.org/jira/browse/BEAM-9567
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: LCID Fire
>Priority: Minor
>
> Currently AFAIK, there is no way of testing `saveAsTextFile` on GCS without 
> actually connecting.
> For testing it would be good to have a way of using a mock GCS implementation.
> Wonder whether `LocalStorageHelper` might be a base for a FileSystem 
> implementation.





[jira] [Updated] (BEAM-9567) GCS needs to be testable

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9567:
--
Status: Triage Needed  (was: Open)

> GCS needs to be testable
> 
>
> Key: BEAM-9567
> URL: https://issues.apache.org/jira/browse/BEAM-9567
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: LCID Fire
>Priority: Minor
>
> Currently AFAIK, there is no way of testing `saveAsTextFile` on GCS without 
> actually connecting.
> For testing it would be good to have a way of using a mock GCS implementation.
> Wonder whether `LocalStorageHelper` might be a base for a FileSystem 
> implementation.





[jira] [Updated] (BEAM-9597) Autodetect Avro schema from Avro file

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9597:
--
Priority: Major  (was: Minor)

> Autodetect Avro schema from Avro file
> -
>
> Key: BEAM-9597
> URL: https://issues.apache.org/jira/browse/BEAM-9597
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-avro
>Reporter: Lars Almgren
>Priority: Major
>
> I'm building a pipeline where I'm reading a set of Avro files, all with the 
> same schema. These Avro files contain the schema definition. To get 
> GenericRecords, today I have to specify the schema for the files in my 
> pipeline. It would be very neat to use the fact that my Avro files already 
> contain the schema definition. Something like 
>  AvroIO.readFilesGenericRecordsParseSchemaFromSource().
>  





[jira] [Updated] (BEAM-9597) Autodetect Avro schema from Avro file

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9597:
--
Status: Open  (was: Triage Needed)

> Autodetect Avro schema from Avro file
> -
>
> Key: BEAM-9597
> URL: https://issues.apache.org/jira/browse/BEAM-9597
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-avro
>Reporter: Lars Almgren
>Priority: Major
>
> I'm building a pipeline where I'm reading a set of Avro files, all with the 
> same schema. These Avro files contain the schema definition. To get 
> GenericRecords, I currently have to specify the schema for the files in my 
> pipeline. It would be very neat to use the fact that my Avro files already 
> contain the schema definition. Something like 
>  AvroIO.readFilesGenericRecordsParseSchemaFromSource().
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108744#comment-17108744
 ] 

Kenneth Knowles commented on BEAM-9361:
---

CC [~reuvenlax] [~bhulette] 

> NPE When putting Avro record with enum through SqlTransform
> ---
>
> Key: BEAM-9361
> URL: https://issues.apache.org/jira/browse/BEAM-9361
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.19.0
>Reporter: Niels Basjes
>Priority: Major
>
> I ran into this problem when trying to put my Avro records through the 
> SqlTransform.
> I was able to reduce the reproduction path to the code below.
> This code fails on my machine (using Beam 2.19.0) with the following 
> NullPointerException
> {code:java}
>  org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse 
> query SELECT name, direction FROM InputStreamat 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:175)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:125)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:83)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:539)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:490)
>   at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:261)
>   at com.bol.analytics.m2.TestAvro2SQL.testAvro2SQL(TestAvro2SQL.java:99)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException:
>  java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:217)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:144)
>   ... 31 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:45)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:280)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:287)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.lambda$toCalciteRowType$0(CalciteUtils.java:261)
>   at 
> java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
>   at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:581)
>   at 
> 

[jira] [Commented] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108743#comment-17108743
 ] 

Kenneth Knowles commented on BEAM-9361:
---

I like string

> NPE When putting Avro record with enum through SqlTransform
> ---
>
> Key: BEAM-9361
> URL: https://issues.apache.org/jira/browse/BEAM-9361
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.19.0
>Reporter: Niels Basjes
>Priority: Major
>
> I ran into this problem when trying to put my Avro records through the 
> SqlTransform.
> I was able to reduce the reproduction path to the code below.
> This code fails on my machine (using Beam 2.19.0) with the following 
> NullPointerException
> {code:java}
>  org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse 
> query SELECT name, direction FROM InputStreamat 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:175)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:125)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:83)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:539)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:490)
>   at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:261)
>   at com.bol.analytics.m2.TestAvro2SQL.testAvro2SQL(TestAvro2SQL.java:99)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException:
>  java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:217)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:144)
>   ... 31 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:45)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:280)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:287)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.lambda$toCalciteRowType$0(CalciteUtils.java:261)
>   at 
> java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
>   at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:581)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toCalciteRowType(CalciteUtils.java:258)
> 

[jira] [Work logged] (BEAM-10015) output timestamp not properly propagated through the Dataflow runner

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10015?focusedWorklogId=433966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433966
 ]

ASF GitHub Bot logged work on BEAM-10015:
-

Author: ASF GitHub Bot
Created on: 15/May/20 23:19
Start Date: 15/May/20 23:19
Worklog Time Spent: 10m 
  Work Description: rehmanmuradali commented on pull request #11725:
URL: https://github.com/apache/beam/pull/11725#issuecomment-629545286


   LGTM. Thanks for taking care of it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433966)
Time Spent: 0.5h  (was: 20m)

> output timestamp not properly propagated through the Dataflow runner
> 
>
> Key: BEAM-10015
> URL: https://issues.apache.org/jira/browse/BEAM-10015
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Reuven Lax
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Dataflow runner does not propagate the output timestamp into timer firings, 
> resulting in incorrect default timestamps when outputting from a processTimer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9361) NPE When putting Avro record with enum through SqlTransform

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9361:
--
Status: Open  (was: Triage Needed)

> NPE When putting Avro record with enum through SqlTransform
> ---
>
> Key: BEAM-9361
> URL: https://issues.apache.org/jira/browse/BEAM-9361
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.19.0
>Reporter: Niels Basjes
>Priority: Major
>
> I ran into this problem when trying to put my Avro records through the 
> SqlTransform.
> I was able to reduce the reproduction path to the code below.
> This code fails on my machine (using Beam 2.19.0) with the following 
> NullPointerException
> {code:java}
>  org.apache.beam.sdk.extensions.sql.impl.ParseException: Unable to parse 
> query SELECT name, direction FROM InputStreamat 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:175)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:125)
>   at 
> org.apache.beam.sdk.extensions.sql.SqlTransform.expand(SqlTransform.java:83)
>   at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:539)
>   at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:490)
>   at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:261)
>   at com.bol.analytics.m2.TestAvro2SQL.testAvro2SQL(TestAvro2SQL.java:99)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>   at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>   at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
> Caused by: 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.ValidationException:
>  java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:217)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner.convertToBeamRel(CalciteQueryPlanner.java:144)
>   ... 31 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:45)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:280)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toRelDataType(CalciteUtils.java:287)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.lambda$toCalciteRowType$0(CalciteUtils.java:261)
>   at 
> java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
>   at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:581)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils.toCalciteRowType(CalciteUtils.java:258)
>   at 
> 

[jira] [Updated] (BEAM-9568) Move Beam SQL to use the schema join transforms

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9568:
--
Status: Open  (was: Triage Needed)

> Move Beam SQL to use the schema join transforms
> ---
>
> Key: BEAM-9568
> URL: https://issues.apache.org/jira/browse/BEAM-9568
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9568) Move Beam SQL to use the schema join transforms

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9568:
--
Component/s: (was: sdk-java-core)
 dsl-sql

> Move Beam SQL to use the schema join transforms
> ---
>
> Key: BEAM-9568
> URL: https://issues.apache.org/jira/browse/BEAM-9568
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Reuven Lax
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9568) Move Beam SQL to use the schema join transforms

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9568:
--
Parent: (was: BEAM-4076)
Issue Type: Improvement  (was: Sub-task)

> Move Beam SQL to use the schema join transforms
> ---
>
> Key: BEAM-9568
> URL: https://issues.apache.org/jira/browse/BEAM-9568
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Reuven Lax
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9939) Make TupleCombineFn play well with NamedTuples

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9939:
--
Status: Open  (was: Triage Needed)

> Make TupleCombineFn play well with NamedTuples
> --
>
> Key: BEAM-9939
> URL: https://issues.apache.org/jira/browse/BEAM-9939
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>
> Given an input of NamedTuples, it should produce an output of a 
> parameterizable type. 
> (We could consider supporting data classes as well.)
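A rough sketch of the requested behavior, using only the standard library: one combiner per field of a NamedTuple input, with the output built as the same NamedTuple type. `combine_named` and `Stats` are illustrative names, not Beam APIs.

```python
from typing import Iterable, NamedTuple

class Stats(NamedTuple):
    total: int
    biggest: int

# Hypothetical sketch: apply one combiner per field of a NamedTuple,
# producing an output of the same (parameterizable) NamedTuple type.
def combine_named(cls, combiners, rows: Iterable):
    columns = zip(*rows)  # split rows into per-field columns
    return cls(*(fn(col) for fn, col in zip(combiners, columns)))

result = combine_named(Stats, (sum, max), [Stats(1, 5), Stats(2, 9), Stats(3, 7)])
assert result == Stats(total=6, biggest=9)
```

Supporting dataclasses would only require swapping the column extraction for `dataclasses.fields`.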



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9950) cannot find symbol javax.annotation.Generated

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108740#comment-17108740
 ] 

Kenneth Knowles commented on BEAM-9950:
---

JDK version?

> cannot find symbol javax.annotation.Generated
> -
>
> Key: BEAM-9950
> URL: https://issues.apache.org/jira/browse/BEAM-9950
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: Minor
>
> This happens when I run through IntelliJ but not when I run the same command 
> on the command line, so it is presumably an issue with my IntelliJ setup. I 
> am using IntelliJ 2019.2.4.
> ./gradlew :runners:flink:1.10:test --tests 
> "org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest.testWatermarkEmission[*]"
> .../apache/beam/model/pipeline/build/generated/source/proto/main/grpc/org/apache/beam/model/pipeline/v1/TestStreamServiceGrpc.java:20:
>  error: cannot find symbol
> @javax.annotation.Generated(
>  ^
>   symbol:   class Generated
>   location: package javax.annotation
> 1 error



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9955) Cleanup conditional dependence on grpc.

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9955:
--
Status: Open  (was: Triage Needed)

> Cleanup conditional dependence on grpc.
> ---
>
> Key: BEAM-9955
> URL: https://issues.apache.org/jira/browse/BEAM-9955
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9917) BigQueryBatchFileLoads dynamic destination

2020-05-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108737#comment-17108737
 ] 

Kenneth Knowles commented on BEAM-9917:
---

CC [~chamikara]

> BigQueryBatchFileLoads dynamic destination
> --
>
> Key: BEAM-9917
> URL: https://issues.apache.org/jira/browse/BEAM-9917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.17.0
>Reporter: Tord Sætren
>Priority: Critical
>  Labels: beginner, documentation
>
> I am trying to use BigQueryBatchFileLoads to upload data from Pub/Sub. It 
> works fine for a single table, but when I try to use a dynamic destination 
> such as
> destination=lambda elem: "my_project:my_dataset." + elem["sensor_key"],
> it just creates a new table each time the triggering_frequency fires. I 
> know it makes temporary tables before loading them all into one, but it 
> never loads them. It just creates more and more tables.
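For illustration, the behavior the reporter expects can be sketched as routing each element through the destination callable and batching per destination, so each destination receives one consolidated load rather than a fresh table per trigger firing. This is a hypothetical stand-alone sketch, not the BigQueryBatchFileLoads implementation.

```python
from collections import defaultdict

# Hypothetical sketch: a destination callable routes each element, and all
# elements for the same destination are batched into ONE load per destination,
# rather than a new table per triggering_frequency firing.
def group_by_destination(elements, destination):
    batches = defaultdict(list)
    for elem in elements:
        batches[destination(elem)].append(elem)
    return dict(batches)

elems = [{"sensor_key": "t1", "v": 1}, {"sensor_key": "t2", "v": 2},
         {"sensor_key": "t1", "v": 3}]
batches = group_by_destination(
    elems, lambda e: "my_project:my_dataset." + e["sensor_key"])
assert sorted(batches) == ["my_project:my_dataset.t1", "my_project:my_dataset.t2"]
assert len(batches["my_project:my_dataset.t1"]) == 2
```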



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9917) BigQueryBatchFileLoads dynamic destination

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9917:
--
Priority: Critical  (was: Minor)

> BigQueryBatchFileLoads dynamic destination
> --
>
> Key: BEAM-9917
> URL: https://issues.apache.org/jira/browse/BEAM-9917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.17.0
>Reporter: Tord Sætren
>Priority: Critical
>  Labels: beginner, documentation
>
> I am trying to use BigQueryBatchFileLoads to upload data from Pub/Sub. It 
> works fine for a single table, but when I try to use a dynamic destination 
> such as
> destination=lambda elem: "my_project:my_dataset." + elem["sensor_key"],
> it just creates a new table each time the triggering_frequency fires. I 
> know it makes temporary tables before loading them all into one, but it 
> never loads them. It just creates more and more tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9917) BigQueryBatchFileLoads dynamic destination

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9917:
--
Component/s: (was: sdk-py-core)

> BigQueryBatchFileLoads dynamic destination
> --
>
> Key: BEAM-9917
> URL: https://issues.apache.org/jira/browse/BEAM-9917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.17.0
>Reporter: Tord Sætren
>Priority: Minor
>  Labels: beginner, documentation
>
> I am trying to use BigQueryBatchFileLoads to upload data from Pub/Sub. It 
> works fine for a single table, but when I try to use a dynamic destination 
> such as
> destination=lambda elem: "my_project:my_dataset." + elem["sensor_key"],
> it just creates a new table each time the triggering_frequency fires. I 
> know it makes temporary tables before loading them all into one, but it 
> never loads them. It just creates more and more tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9917) BigQueryBatchFileLoads dynamic destination

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9917:
--
Component/s: io-py-gcp

> BigQueryBatchFileLoads dynamic destination
> --
>
> Key: BEAM-9917
> URL: https://issues.apache.org/jira/browse/BEAM-9917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, sdk-py-core
>Affects Versions: 2.17.0
>Reporter: Tord Sætren
>Priority: Minor
>  Labels: beginner, documentation
>
> I am trying to use BigQueryBatchFileLoads to upload data from Pub/Sub. It 
> works fine for a single table, but when I try to use a dynamic destination 
> such as
> destination=lambda elem: "my_project:my_dataset." + elem["sensor_key"],
> it just creates a new table each time the triggering_frequency fires. I 
> know it makes temporary tables before loading them all into one, but it 
> never loads them. It just creates more and more tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9917) BigQueryBatchFileLoads dynamic destination

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9917:
--
Status: Open  (was: Triage Needed)

> BigQueryBatchFileLoads dynamic destination
> --
>
> Key: BEAM-9917
> URL: https://issues.apache.org/jira/browse/BEAM-9917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.17.0
>Reporter: Tord Sætren
>Priority: Critical
>  Labels: beginner, documentation
>
> I am trying to use BigQueryBatchFileLoads to upload data from Pub/Sub. It 
> works fine for a single table, but when I try to use a dynamic destination 
> such as
> destination=lambda elem: "my_project:my_dataset." + elem["sensor_key"],
> it just creates a new table each time the triggering_frequency fires. I 
> know it makes temporary tables before loading them all into one, but it 
> never loads them. It just creates more and more tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9965) Explain direct runner running modes

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9965:
--
Status: Open  (was: Triage Needed)

> Explain direct runner running modes
> ---
>
> Key: BEAM-9965
> URL: https://issues.apache.org/jira/browse/BEAM-9965
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Priority: Major
>
> We should explain which direct_running_mode a user should choose, and why.
> https://beam.apache.org/documentation/runners/direct/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9965) Explain direct runner running modes

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9965:
--
Priority: Major  (was: Minor)

> Explain direct runner running modes
> ---
>
> Key: BEAM-9965
> URL: https://issues.apache.org/jira/browse/BEAM-9965
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Kyle Weaver
>Priority: Major
>
> We should explain which direct_running_mode a user should choose, and why.
> https://beam.apache.org/documentation/runners/direct/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9970) beam_PostRelease_NightlySnapshot failing

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9970:
--
Status: Open  (was: Triage Needed)

> beam_PostRelease_NightlySnapshot failing
> 
>
> Key: BEAM-9970
> URL: https://issues.apache.org/jira/browse/BEAM-9970
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Priority: Critical
>  Labels: currently-failing
>
> 07:44:55 gcloud dataflow jobs cancel $(gcloud dataflow jobs list | grep 
> leaderboard-validation-1588332285874-332 | grep Running | cut -d' ' -f1)
> 07:44:59 ERROR: (gcloud.dataflow.jobs.cancel) argument JOB_ID [JOB_ID ...]: 
> Must be specified.
> 07:44:59 Usage: gcloud dataflow jobs cancel JOB_ID [JOB_ID ...] [optional 
> flags]
> 07:44:59   optional flags may be  --help | --region
> 07:44:59 
> 07:44:59 For detailed information on this command and its flags, run:
> 07:44:59   gcloud dataflow jobs cancel --help
> 07:44:59 ERROR: (gcloud.dataflow.jobs.cancel) argument JOB_ID [JOB_ID ...]: 
> Must be specified.
> 07:44:59 Usage: gcloud dataflow jobs cancel JOB_ID [JOB_ID ...] [optional 
> flags]
> 07:44:59   optional flags may be  --help | --region
> 07:44:59 
> 07:44:59 For detailed information on this command and its flags, run:
> 07:44:59   gcloud dataflow jobs cancel --help
> 07:45:00 [ERROR] Failed command
> 07:45:00 
> 07:45:00 > Task 
> :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow FAILED
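A possible hardening of the failing snippet, sketched with a stub in place of the real `gcloud dataflow jobs list | grep ... | cut` pipeline: only call cancel when a job id was actually found, which avoids the "argument JOB_ID [JOB_ID ...]: Must be specified" error.

```shell
#!/bin/sh
# find_job_id stands in for the real lookup pipeline; here it simulates the
# "no matching running job" case by printing nothing.
find_job_id() {
  printf ''
}

JOB_ID=$(find_job_id)
if [ -n "$JOB_ID" ]; then
  # Only invoke cancel when there is something to cancel.
  echo "would run: gcloud dataflow jobs cancel $JOB_ID"
else
  echo "no running job to cancel"
fi
```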



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9760) KafkaIO supports consumer group?

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9760:
--
Status: Open  (was: Triage Needed)

> KafkaIO supports consumer group?
> 
>
> Key: BEAM-9760
> URL: https://issues.apache.org/jira/browse/BEAM-9760
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Ka Wah WONG
>Priority: Major
>
> It seems that only the assign method of the Kafka Consumer class is called in 
> the org.apache.beam.sdk.io.kafka.ConsumerSpEL class. According to the 
> documentation of org.apache.kafka.clients.consumer.KafkaConsumer, manual 
> topic assignment through the assign method does not use the consumer's group 
> management functionality.
> May I ask whether KafkaIO will be enhanced to support the consumer's group 
> management by using the Kafka consumer's subscribe method?
>  
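The assign/subscribe distinction the reporter describes can be illustrated with a small simulation: under group management (subscribe) the broker spreads partitions across the group's members, whereas assign() pins a fixed partition set per consumer. This is a conceptual sketch, not Kafka client code.

```python
# Conceptual sketch of group management: partitions are rebalanced across a
# consumer group's members, roughly what Kafka's round-robin assignor does.
# With assign(), by contrast, each consumer keeps a caller-chosen fixed set.
def group_assignment(partitions, members):
    out = {m: [] for m in members}
    for i, p in enumerate(partitions):
        out[members[i % len(members)]].append(p)
    return out

assignment = group_assignment([0, 1, 2, 3], ["consumer-a", "consumer-b"])
assert assignment == {"consumer-a": [0, 2], "consumer-b": [1, 3]}
```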



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9760) KafkaIO supports consumer group?

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9760:
--
Priority: Major  (was: Minor)

> KafkaIO supports consumer group?
> 
>
> Key: BEAM-9760
> URL: https://issues.apache.org/jira/browse/BEAM-9760
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Ka Wah WONG
>Priority: Major
>
> It seems that only the assign method of the Kafka Consumer class is called in 
> the org.apache.beam.sdk.io.kafka.ConsumerSpEL class. According to the 
> documentation of org.apache.kafka.clients.consumer.KafkaConsumer, manual 
> topic assignment through the assign method does not use the consumer's group 
> management functionality.
> May I ask whether KafkaIO will be enhanced to support the consumer's group 
> management by using the Kafka consumer's subscribe method?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9973) beam_PostCommit_Py_ValCont flakes (before running tests)

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9973:
--
Status: Open  (was: Triage Needed)

> beam_PostCommit_Py_ValCont flakes (before running tests)
> 
>
> Key: BEAM-9973
> URL: https://issues.apache.org/jira/browse/BEAM-9973
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Priority: Critical
>  Labels: flake
>
> 14:05:15 - Last  20 lines from daemon log file - daemon-2143.out.log -
> 14:05:15 INFO:gen_protos:Writing urn stubs: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/portability/api/metrics_pb2_urns.py
> 14:05:15 INFO:gen_protos:Writing urn stubs: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/portability/api/beam_artifact_api_pb2_urns.py
> 14:05:15 INFO:gen_protos:Writing urn stubs: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/portability/api/standard_window_fns_pb2_urns.py
> 14:05:15 INFO:gen_protos:Writing urn stubs: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/portability/api/beam_fn_api_pb2_urns.py
> 14:05:15 INFO:gen_protos:Writing urn stubs: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/portability/api/beam_job_api_pb2_urns.py
> 14:05:15 INFO:gen_protos:Writing urn stubs: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/apache_beam/portability/api/beam_runner_api_pb2_urns.py
> 14:05:15 warning: no files found matching 'README.md'
> 14:05:15 warning: no files found matching 'NOTICE'
> 14:05:15 warning: no files found matching 'LICENSE'
> 14:05:15 warning: sdist: standard file not found: should have one of README, 
> README.rst, README.txt, README.md
> 14:05:15 
> 14:05:15 Create distribution tar file apache-beam.tar.gz in 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Py_ValCont/src/sdks/python/build
> 14:05:15 :sdks:python:sdist (Thread[Execution worker for ':',5,main]) 
> completed. Took 7.453 secs.
> 14:05:15 :sdks:python:container:py2:copyDockerfileDependencies 
> (Thread[Execution worker for ':',5,main]) started.
> 14:05:15 Build cache key for task 
> ':sdks:python:container:py2:copyDockerfileDependencies' is 
> ea7f5d2ce156f4297b3b2c5c3f21611d
> 14:05:15 Caching disabled for task 
> ':sdks:python:container:py2:copyDockerfileDependencies': Caching has not been 
> enabled for the task
> 14:05:15 Task ':sdks:python:container:py2:copyDockerfileDependencies' is not 
> up-to-date because:
> 14:05:15   No history is available.
> 14:05:15 :sdks:python:container:py2:copyDockerfileDependencies 
> (Thread[Execution worker for ':',5,main]) completed. Took 0.012 secs.
> 14:05:15 Daemon vm is shutting down... The daemon has exited normally or was 
> terminated in response to a user interrupt.
> 14:05:15 - End of the daemon log -
> 14:05:15 
> 14:05:15 
> 14:05:15 FAILURE: Build failed with an exception.
> 14:05:15 
> 14:05:15 * What went wrong:
> 14:05:15 Gradle build daemon disappeared unexpectedly (it may have been 
> killed or may have crashed)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9970) beam_PostRelease_NightlySnapshot failing

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9970:
--
Priority: Critical  (was: Major)

> beam_PostRelease_NightlySnapshot failing
> 
>
> Key: BEAM-9970
> URL: https://issues.apache.org/jira/browse/BEAM-9970
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Priority: Critical
>  Labels: currently-failing
>
> 07:44:55 gcloud dataflow jobs cancel $(gcloud dataflow jobs list | grep 
> leaderboard-validation-1588332285874-332 | grep Running | cut -d' ' -f1)
> 07:44:59 ERROR: (gcloud.dataflow.jobs.cancel) argument JOB_ID [JOB_ID ...]: 
> Must be specified.
> 07:44:59 Usage: gcloud dataflow jobs cancel JOB_ID [JOB_ID ...] [optional 
> flags]
> 07:44:59   optional flags may be  --help | --region
> 07:44:59 
> 07:44:59 For detailed information on this command and its flags, run:
> 07:44:59   gcloud dataflow jobs cancel --help
> 07:44:59 ERROR: (gcloud.dataflow.jobs.cancel) argument JOB_ID [JOB_ID ...]: 
> Must be specified.
> 07:44:59 Usage: gcloud dataflow jobs cancel JOB_ID [JOB_ID ...] [optional 
> flags]
> 07:44:59   optional flags may be  --help | --region
> 07:44:59 
> 07:44:59 For detailed information on this command and its flags, run:
> 07:44:59   gcloud dataflow jobs cancel --help
> 07:45:00 [ERROR] Failed command
> 07:45:00 
> 07:45:00 > Task 
> :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow FAILED
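The error above is the cancel command being invoked with an empty JOB_ID because the `grep`/`cut` pipeline matched no running job. A minimal sketch of a guard (the helper name `cancel_if_present` and the echoed messages are illustrative, not part of the actual release scripts; the real gcloud call is left commented out):

```shell
#!/bin/sh
# Hypothetical guard: only call the cancel command when a job id was found.
cancel_if_present() {
  job_id="$1"
  if [ -z "$job_id" ]; then
    # The grep/cut pipeline matched nothing; avoid calling gcloud with no args.
    echo "no matching job; skipping cancel"
    return 0
  fi
  echo "cancelling $job_id"
  # gcloud dataflow jobs cancel "$job_id"   # real call, commented out in this sketch
}

cancel_if_present ""
cancel_if_present "2020-05-15_07_44_55-example"
```

Wrapping the pipeline's result in a check like this would turn the hard failure into a no-op when the job has already finished.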



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9972) Test pure SQL UDF for all data types

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9972:
--
Status: Open  (was: Triage Needed)

> Test pure SQL UDF for all data types
> 
>
> Key: BEAM-9972
> URL: https://issues.apache.org/jira/browse/BEAM-9972
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Major
>
> The following are the data types that UDFs will support:
> * ARRAY
> * BOOL
> * BYTES
> * DATE
> * TIME
> * FLOAT64
> * INT64
> * NUMERIC
> * STRING
> * STRUCT
> * TIMESTAMP
> We should write unit tests against these types.
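One such per-type test case might look like the following sketch (the function name `increment` and the literal are illustrative only; the actual scaffolding would live in the ZetaSQL dialect test suites):

```sql
-- Hypothetical pure SQL UDF over INT64; a similar function and a query
-- exercising it would be needed for each type in the list above.
CREATE FUNCTION increment(x INT64) AS (x + 1);

SELECT increment(41);  -- expected: 42
```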



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

