[jira] [Created] (BEAM-7998) MatchFiles or MatchAll seems to return the same element several times

2019-08-16 Thread Jerome MASSOT (JIRA)
Jerome MASSOT created BEAM-7998:
---

 Summary: MatchFiles or MatchAll seems to return the same element 
several times
 Key: BEAM-7998
 URL: https://issues.apache.org/jira/browse/BEAM-7998
 Project: Beam
  Issue Type: Bug
  Components: beam-model
Affects Versions: 2.14.0
 Environment: GCP for storage, DirectRunner and DataflowRunner both 
have the problem. PyCharm on Win10 for IDE and dev environment.
Reporter: Jerome MASSOT


Hi team,

when I use MatchFiles with a wildcard on files located in a GCP bucket, the 
MatchFiles transform returns the same file several times (at least twice).

I have tried to follow the stack, and I can see that MatchAll is called 
twice when I run the pipeline against a debug project where a single element 
is present in the bucket.

But I am not able to say more than that. Sorry.

Best regards

Jerome
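Until the duplication is diagnosed, a user-side workaround is to deduplicate the matched results by file path before processing them. The idea can be shown in plain Python (the paths below are hypothetical, and this is not Beam code; in a pipeline the same effect could be achieved with a distinct/deduplication transform keyed on the match metadata's path):

```python
def dedup_matches(paths):
    """Drop repeated file paths while preserving first-seen order."""
    seen = set()
    unique = []
    for p in paths:
        if p not in seen:
            seen.add(p)
            unique.append(p)
    return unique

# MatchFiles reportedly yields the same element more than once:
matches = ["gs://bucket/data/a.csv", "gs://bucket/data/a.csv"]
print(dedup_matches(matches))  # ['gs://bucket/data/a.csv']
```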



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element to earlier time in Python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296685
 ]

ASF GitHub Bot logged work on BEAM-7995:


Author: ASF GitHub Bot
Created on: 17/Aug/19 01:00
Start Date: 17/Aug/19 01:00
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9364: BEAM-7995: 
python PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364#discussion_r314927714
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -886,7 +886,7 @@ def output_key(self, wkey, accumulator):
 if windows is 0:
   self.output(_globally_windowed_value.with_value((key, value)))
 else:
-  self.output(WindowedValue((key, value), windows[0].end, windows))
+  self.output(WindowedValue((key, value), windows[0].max_timestamp(), 
windows))
 
 Review comment:
   You are right. I read this as just 'max_timestamp()' and did not realize it 
is the max timestamp for the window. Your change looks correct.
   
   Referencing your email 
(https://lists.apache.org/thread.html/ad42c55ed0212cb18b2b29bfc3dddfb47b8cb9f3358583775d0da37b@%3Cdev.beam.apache.org%3E),
 this needs to be a shared definition between Python and Java. @lukecwik could 
you confirm that an element in a window should have a timestamp < window.end or 
could it be <= window.end ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296685)
Time Spent: 50m  (was: 40m)

> IllegalStateException: TimestampCombiner moved element to earlier time 
> in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I'm looking into a bug I found internally when using the Beam portable API 
> (Python) on our own Samza runner. 
>  
> The pipeline looks something like this:
>  
>     (p
>      | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>      | 'transform' >> beam.Map(lambda event: process_event(event))
>      | 'window' >> beam.WindowInto(FixedWindows(15))
>      | 'group' >> beam.CombinePerKey(beam.combiners.CountCombineFn())
>      ...
>  
> The problem comes from the combiners which cause the following exception on 
> Java side:
>  
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
> from 2019-08-15T03:34:45.000Z to earlier time 2019-08-15T03:34:44.999Z 
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:45.000Z)
>     at 
> org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
>     at 
> org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
>  
> The exception happens here 
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116]
>  when we check that the shifted timestamp is not earlier than the element's timestamp.
>  
>     if (shifted.isBefore(timestamp)) {
>       throw new IllegalStateException(
>           String.format(
>               "TimestampCombiner moved element from %s to earlier time %s for 
> window %s",
>               BoundedWindow.formatTimestamp(timestamp),
>               BoundedWindow.formatTimestamp(shifted),
>               window));
>     }
>  
> As you can see from the exception, the "shifted" is "XXX 44.999" while the 
> "timestamp" is "XXX 45.000". The "44.999" is coming from 
> [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]:
>  
>     @Override
>     public Instant merge(BoundedWindow intoWindow, Iterable<? extends Instant> mergingTimestamps) {
>       return intoWindow.maxTimestamp();
>     }
>  
> where intoWindow.maxTimestamp() is:
>  
>   /** Returns the largest times
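The window-timestamp relationship behind this bug can be sketched in plain Python. This is a simplified model, not Beam code: a fixed window covers [start, end), the largest timestamp an element in the window may carry is one millisecond before the exclusive end (what maxTimestamp() returns), and the hold check rejects any shift to an earlier time:

```python
from datetime import datetime, timedelta

MILLI = timedelta(milliseconds=1)

class FixedWindow:
    """Simplified model of a Beam fixed window [start, end)."""
    def __init__(self, start, size_secs):
        self.start = start
        self.end = start + timedelta(seconds=size_secs)

    def max_timestamp(self):
        # Largest timestamp an element in this window may carry:
        # one millisecond before the (exclusive) window end.
        return self.end - MILLI

def check_hold(timestamp, shifted, window):
    # Mirrors the WatermarkHold.shift check: moving an element to an
    # earlier time than its own timestamp is illegal.
    if shifted < timestamp:
        raise ValueError(
            "TimestampCombiner moved element from %s to earlier time %s "
            "for window [%s..%s)"
            % (timestamp, shifted, window.start, window.end))

w = FixedWindow(datetime(2019, 8, 15, 3, 34, 30), 15)

# Buggy behavior: the Python SDK emitted at window.end (..45.000), then the
# END_OF_WINDOW combiner shifted it to max_timestamp() (..44.999) -> error.
try:
    check_hold(w.end, w.max_timestamp(), w)
except ValueError as e:
    print("raises:", e)

# Fixed behavior: emit at max_timestamp() in the first place -> no error.
check_hold(w.max_timestamp(), w.max_timestamp(), w)
```

This matches the timestamps in the stack trace above: the element sat at 03:34:45.000 (the window end) while the combiner's merge result was 03:34:44.999 (the window's max timestamp).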

[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296683
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:33
Start Date: 17/Aug/19 00:33
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-522188388
 
 
   run python 3.6 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296683)
Time Spent: 11h 50m  (was: 11h 40m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Existing [typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] 
> heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6 (see, for 
> example, https://issues.apache.org/jira/browse/BEAM-6877), which makes 
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is 
> stale code and has other known issues.
> - Attempt to use off-the-shelf libraries for supporting type annotations, 
> such as Pytype, Mypy, or PyAnnotate.
> With respect to this decision, we also need to plan immediate next steps to 
> unblock adoption of Beam for Python 3.6+ users. One potential option may be 
> to have the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].
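What "Py3 annotation support" means in practice can be illustrated with the standard typing machinery any such implementation would build on. This is plain Python, not Beam's typehints module; the function name is made up:

```python
import typing

def process(element: str) -> typing.Iterable[int]:
    """A hypothetical DoFn-style function with Python 3 annotations."""
    yield len(element)

# Python 3 exposes the annotations an SDK could translate into Beam typehints,
# instead of relying on CPython internals as the in-house implementation does.
hints = typing.get_type_hints(process)
print(hints['element'])  # <class 'str'>
print(hints['return'])   # typing.Iterable[int]
```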



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=296679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296679
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:22
Start Date: 17/Aug/19 00:22
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #7949: [BEAM-3713] Add pytest 
testing infrastructure
URL: https://github.com/apache/beam/pull/7949#issuecomment-522187314
 
 
   Still working
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296679)
Time Spent: 3h 50m  (was: 3h 40m)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=296680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296680
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:22
Start Date: 17/Aug/19 00:22
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #7949: [BEAM-3713] Add pytest 
testing infrastructure
URL: https://github.com/apache/beam/pull/7949#issuecomment-522187314
 
 
   no disassemble
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296680)
Time Spent: 4h  (was: 3h 50m)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296678
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:15
Start Date: 17/Aug/19 00:15
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r314924560
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = 
"beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
+// instance of Schema from schema.proto.
+//
+// A row is encoded as the concatenation of:
+//   - The number of attributes in the schema, encoded with
+// beam:coder:varint:v1. This is useful for detecting supported schema
+// changes (column additions/deletions).
 
 Review comment:
   Is the new explanation in the [latest 
commit](https://github.com/apache/beam/pull/9188/commits/25dcc50a8de9c607069a8efc80a6da67a6e8b0ca)
 better?
   
   My intention was to indicate that "column additions/deletions" are the 
supported schema changes
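The varint attribute-count prefix discussed above can be sketched in plain Python. This is an illustration of the beam:coder:varint:v1 idea (unsigned base-128 varint), not the actual row coder implementation:

```python
def encode_varint(n: int) -> bytes:
    """Unsigned base-128 varint, least-significant 7-bit group first."""
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(bits)
            return bytes(out)

def encode_row_header(field_values) -> bytes:
    # A row encoding starts with the number of attributes in the schema,
    # so a reader can detect column additions/deletions.
    return encode_varint(len(field_values))

print(encode_varint(3))    # b'\x03'
print(encode_varint(300))  # b'\xac\x02'
```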
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296678)
Time Spent: 3h 40m  (was: 3.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296677
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:12
Start Date: 17/Aug/19 00:12
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314924082
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions.go
 ##
 @@ -40,6 +40,8 @@ func FunctionName(fn interface{}) string {
 // points to a valid function implementation.
 func LoadFunction(ptr uintptr, t reflect.Type) interface{} {
v := reflect.New(t).Elem()
-   *(*uintptr)(unsafe.Pointer(v.Addr().Pointer())) = 
(uintptr)(unsafe.Pointer(&ptr))
+   p := new(uintptr)
+   *p = ptr
+   *(*unsafe.Pointer)(unsafe.Pointer(v.Addr().Pointer())) = 
unsafe.Pointer(p)
 
 Review comment:
   Thanks for the nice summary!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296677)
Time Spent: 2h  (was: 1h 50m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296674
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:04
Start Date: 17/Aug/19 00:04
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314923417
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go
 ##
 @@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package reflectx
+
+import (
+   "reflect"
+   "testing"
+)
+
+func testFunction() int {
+   return 42
+}
+
+func TestXxx(t *testing.T) {
+   val := reflect.ValueOf(testFunction)
+   fi := uintptr(val.Pointer())
+   typ := val.Type()
+
+   callable := LoadFunction(fi, typ)
+
+   cv := reflect.ValueOf(callable)
+   out := cv.Call(nil)
+   if len(out) != 1 {
+   t.Errorf("got %d return values, wanted 1.", len(out))
+   }
+   // TODO: check type?
+   if out[0].Int() != 42 {
+   t.Errorf("got %d, wanted 42", out[0].Int())
+   }
 
 Review comment:
   Good point, will change.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296674)
Time Spent: 1h 50m  (was: 1h 40m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=296675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296675
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:04
Start Date: 17/Aug/19 00:04
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9365: [BEAM-7924] fix 
pipeline options error with main session
URL: https://github.com/apache/beam/pull/9365#issuecomment-522185355
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296675)
Time Spent: 4h 10m  (was: 4h)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console
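One way to read the traceback above is that the attribute lookup happened on an options *instance* rather than the options *module*, which is the kind of name rebinding save_main_session can cause. The failure mode can be reproduced in miniature (hypothetical sketch, not Beam's actual code; only the error-message format mirrors pipeline_options.py):

```python
class PipelineOptions:
    def __getattr__(self, name):
        # Mirrors the error message format raised in pipeline_options.py.
        raise AttributeError("'%s' object has no attribute '%s'"
                             % (type(self).__name__, name))

# If the name `pipeline_options` ends up bound to an options instance
# instead of the pipeline_options module, the lookup fails as in the log:
pipeline_options = PipelineOptions()
try:
    pipeline_options.ProfilingOptions
except AttributeError as e:
    print(e)  # 'PipelineOptions' object has no attribute 'ProfilingOptions'
```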



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296673
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:04
Start Date: 17/Aug/19 00:04
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314923396
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions.go
 ##
 @@ -40,6 +40,8 @@ func FunctionName(fn interface{}) string {
 // points to a valid function implementation.
 func LoadFunction(ptr uintptr, t reflect.Type) interface{} {
v := reflect.New(t).Elem()
-   *(*uintptr)(unsafe.Pointer(v.Addr().Pointer())) = 
(uintptr)(unsafe.Pointer(&ptr))
+   p := new(uintptr)
+   *p = ptr
+   *(*unsafe.Pointer)(unsafe.Pointer(v.Addr().Pointer())) = 
unsafe.Pointer(p)
 
 Review comment:
   The problem was that the original code relied on the behavior that when &ptr 
is evaluated, the resulting anonymous value will be on the heap. The new 
compiler is able to avoid putting it on the heap, and does so, which causes 
this code to crash under Go 1.13.
   
   This change makes the expected behavior explicit by allocating p on the heap 
and then assigning ptr to its pointed-to value (*p = ptr); p then plays the 
role &ptr did before, and the remaining code works as before.
   
   The unit test exercises this function. I verified that the old code crashed 
under Go 1.13 (and worked in Go 1.12) and the new code works in both versions.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296673)
Time Spent: 1h 40m  (was: 1.5h)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7970) Regenerate Go SDK proto files in correct version

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7970?focusedWorklogId=296672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296672
 ]

ASF GitHub Bot logged work on BEAM-7970:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:01
Start Date: 17/Aug/19 00:01
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9367: [BEAM-7970] 
Regenerate Go SDK protos in correct version.
URL: https://github.com/apache/beam/pull/9367
 
 
   Regenerating the protos so that they are actually generated as proto3,
   which is how they should have been for a while.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296671
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:00
Start Date: 17/Aug/19 00:00
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314922876
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go
 ##
 @@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package reflectx
+
+import (
+   "reflect"
+   "testing"
+)
+
+func testFunction() int {
+   return 42
+}
+
+func TestXxx(t *testing.T) {
 
 Review comment:
   I thought the last commit fixed this, but clearly not. Will fix.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296671)
Time Spent: 1.5h  (was: 1h 20m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=296668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296668
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:45
Start Date: 16/Aug/19 23:45
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9351: [BEAM-7909] 
support customized container for Python
URL: https://github.com/apache/beam/pull/9351#discussion_r314920513
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -84,16 +84,10 @@ def __init__(self):
   @staticmethod
   def default_docker_image():
 if 'USER' in os.environ:
-  if sys.version_info[0] == 2:
-version_suffix = ''
-  elif sys.version_info[0:2] == (3, 5):
-version_suffix = '3'
-  else:
-version_suffix = '3'
-# TODO(BEAM-7474): Use an image which has correct Python minor version.
-logging.warning('Make sure that locally built Python SDK docker image '
-'has Python %d.%d interpreter. See also: BEAM-7474.' % 
(
-sys.version_info[0], sys.version_info[1]))
+  version_suffix = ''.join([str(i) for i in sys.version_info[0:2]])
 
 Review comment:
   This changes the behavior for Python 2. I agree with the change; let's make 
sure that we test it.
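   For reference, the replacement one-liner in the diff derives the image suffix from the interpreter's major and minor version. A minimal sketch of that expression's behavior (the standalone function is hypothetical, not Beam's API); note that on Python 2 it now yields '27' where the old code returned an empty suffix, which is the behavior change flagged above:

```python
import sys

def version_suffix(version_info=sys.version_info):
    # Join the major and minor version numbers, e.g. (3, 5, ...) -> '35'.
    return ''.join(str(i) for i in version_info[0:2])

version_suffix((2, 7, 16))  # -> '27' (the old code produced '' here)
version_suffix((3, 5, 3))   # -> '35'
```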
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296668)
Time Spent: 2h 10m  (was: 2h)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=296667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296667
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:45
Start Date: 16/Aug/19 23:45
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9351: [BEAM-7909] 
support customized container for Python
URL: https://github.com/apache/beam/pull/9351#discussion_r314921003
 
 

 ##
 File path: sdks/python/container/base_image_requirements.txt
 ##
 @@ -67,7 +67,7 @@ pandas==0.23.4
 protorpc==0.11.1
 python-gflags==3.0.6
 setuptools<=39.1.0 # requirement for Tensorflow.
-tensorflow==1.11.0
+tensorflow==1.13.1
 
 Review comment:
   Could you upgrade these dependencies to match (a) setup.py (latest in the 
range if possible) and (b) 
https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies
   
   We should probably figure out a way to keep these versions in sync. This was 
recently discussed on the mailing list; for ideas, see that thread 
(https://lists.apache.org/thread.html/e174967bab6c224db40f66af74e7d07466812cc6b78189e55da88cb0@%3Cdev.beam.apache.org%3E)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296667)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=29&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-29
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:45
Start Date: 16/Aug/19 23:45
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9351: [BEAM-7909] 
support customized container for Python
URL: https://github.com/apache/beam/pull/9351#discussion_r314920538
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -84,16 +84,10 @@ def __init__(self):
   @staticmethod
   def default_docker_image():
 if 'USER' in os.environ:
-  if sys.version_info[0] == 2:
-version_suffix = ''
-  elif sys.version_info[0:2] == (3, 5):
-version_suffix = '3'
-  else:
-version_suffix = '3'
-# TODO(BEAM-7474): Use an image which has correct Python minor version.
 
 Review comment:
   You can probably close this issue as well.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 29)
Time Spent: 2h  (was: 1h 50m)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work started] (BEAM-7970) Regenerate Go SDK proto files in correct version

2019-08-16 Thread Daniel Oliveira (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7970 started by Daniel Oliveira.
-
> Regenerate Go SDK proto files in correct version
> 
>
> Key: BEAM-7970
> URL: https://issues.apache.org/jira/browse/BEAM-7970
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>
> Generated proto files in the Go SDK currently include this bit:
> {{// This is a compile-time assertion to ensure that this generated file}}
> {{// is compatible with the proto package it is being compiled against.}}
> {{// A compilation error at this line likely means your copy of the}}
> {{// proto package needs to be updated.}}
> {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}}
>  
> This indicates that the protos are being generated as proto v2 for whatever 
> reason. Most likely, as mentioned in this post by someone with a similar 
> issue, the proto generation binary needs to be rebuilt before 
> generating the files again: 
> [https://github.com/golang/protobuf/issues/449#issuecomment-340884839]
> This hasn't caused any errors so far, but might eventually cause errors if we 
> hit version differences between the v2 and v3 protos.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=296664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296664
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:43
Start Date: 16/Aug/19 23:43
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-522181883
 
 
   Here's some info on where I am with this.  I could really use some help to 
push this over the finish line.
   
   The expansion service runs correctly, sends the expanded transforms back to 
python, but the job fails inside Java on Flink because it's trying to use the 
incorrect serializer.  There's a good chance that I'm overlooking something 
very obvious.
   
   Here's the stack trace:
   
   ```
   2019-08-16 16:04:58,297 INFO  org.apache.flink.runtime.taskmanager.Task  
   - Source: 
PubSubInflow/ExternalTransform(beam:external:java:pubsub:read:v1)/PubsubUnboundedSource/Read(PubsubSource)
 (1/1) (93e578d8ae737c7502685cd978413a98) switched from RUNNING to FAILED.
   java.lang.RuntimeException: org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage 
cannot be cast to [B
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:305)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:394)
at 
org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.emitElement(UnboundedSourceWrapper.java:342)
at 
org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:268)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at 
org.apache.flink.streaming.runtime.tasks.StoppableSourceStreamTask.run(StoppableSourceStreamTask.java:45)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
   Caused by: java.lang.ClassCastException: 
org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage cannot be cast to [B
at 
org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
at 
org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
at 
org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:105)
at 
org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:81)
at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:578)
at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:569)
at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:529)
at 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.serialize(CoderTypeSerializer.java:87)
at 
org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:175)
at 
org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:46)
at 
org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:54)
at 
org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.serializeRecord(SpanningRecordSerializer.java:78)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:160)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
... 15 more
   ```
   
   Here's the serializer that `CoderTypeSerialize
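
   The ClassCastException in the trace above is the usual symptom of a coder mismatch: a length-prefixed byte-array coder is handed a PubsubMessage object instead of the raw bytes it expects. A minimal Python sketch of the same failure mode (these classes are simplified stand-ins, not Beam's coders):

```python
class BytesCoder:
    """Stand-in for a length-prefixing byte-array coder."""
    def encode(self, value):
        if not isinstance(value, bytes):
            # Analogous to "PubsubMessage cannot be cast to [B" in the trace.
            raise TypeError("expected bytes, got %s" % type(value).__name__)
        # Length-prefix the payload, as LengthPrefixCoder would.
        return len(value).to_bytes(4, "big") + value

class PubsubMessage:
    """Stand-in for the element type actually flowing through the graph."""
    def __init__(self, data, attributes=None):
        self.data = data
        self.attributes = attributes or {}

coder = BytesCoder()
encoded = coder.encode(b"payload")       # fine: raw bytes
# coder.encode(PubsubMessage(b"x"))      # raises TypeError: wrong element type
```

   This suggests the expanded pipeline is wiring a byte-array coder where the external transform's output type requires a PubsubMessage-aware coder.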

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=296657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296657
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:37
Start Date: 16/Aug/19 23:37
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296657)
Time Spent: 48h 50m  (was: 48h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 48h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296654
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:34
Start Date: 16/Aug/19 23:34
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-522181291
 
 
   run python 3.5 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296654)
Time Spent: 11h 40m  (was: 11.5h)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> Existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/]
> heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6; see, for 
> example, https://issues.apache.org/jira/browse/BEAM-6877, which makes 
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is 
> stale code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT this decision, we also need to plan immediate next steps to unblock 
> adoption of Beam for Python 3.6+ users. One potential option may be to have 
> the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].
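The PR attached to this issue infers typehints from native Python 3 annotations. The raw material it inspects is ordinary `typing` machinery, sketched here outside of Beam (this is not Beam's implementation):

```python
import typing

def process(element: str) -> typing.Iterable[int]:
    # An annotated function of the kind a DoFn's process method would be.
    yield len(element)

# Resolve the annotations into concrete type objects.
hints = typing.get_type_hints(process)
# hints == {'element': str, 'return': typing.Iterable[int]}
```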



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=296650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296650
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:25
Start Date: 16/Aug/19 23:25
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9333: [BEAM-5820] release 
vendor calcite
URL: https://github.com/apache/beam/pull/9333#issuecomment-522180056
 
 
   Thanks Luke! Could you help with releasing the 0.1 version? Are there some 
instructions I can help with?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296650)
Time Spent: 4h 50m  (was: 4h 40m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296649
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:24
Start Date: 16/Aug/19 23:24
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-522179920
 
 
   Run Python 2 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 296649)
Time Spent: 11.5h  (was: 11h 20m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> Existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] 
> heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6; see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes 
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support in-house implementation, which at this point is a stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like Pytype, Mypy, or PyAnnotate.
> WRT this decision, we also need to plan immediate next steps to unblock 
> adoption of Beam for Python 3.6+ users. One potential option may be to have 
> the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].
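For context, the "typehint annotations" in question are Python 3 function annotations, which are inspectable at runtime; that runtime visibility is what any annotation-based typehint support would build on. A minimal sketch of the mechanism (illustrative only, not Beam's implementation):

```python
import typing

def do_fn(element: int) -> typing.Iterable[str]:
    yield str(element)

# Annotations survive to runtime, which is what lets an SDK derive
# input/output typehints from them instead of from decorators.
hints = typing.get_type_hints(do_fn)
print(hints["element"])  # <class 'int'>
print(hints["return"])   # typing.Iterable[str]
```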



--


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296648
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:18
Start Date: 16/Aug/19 23:18
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r314917601
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = 
"beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
 
 Review comment:
   done
 



Issue Time Tracking
---

Worklog Id: (was: 296648)
Time Spent: 3.5h  (was: 3h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296645
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:17
Start Date: 16/Aug/19 23:17
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r314917511
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = 
"beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
+// instance of Schema from schema.proto.
+//
+// A row is encoded as the concatenation of:
+//   - The number of attributes in the schema, encoded with
+// beam:coder:varint:v1. This is useful for detecting supported schema
+// changes (column additions/deletions).
+//   - A packed bitset indicating null fields (a 1 indicating a null)
 
 Review comment:
   Good suggestion, done.
 



Issue Time Tracking
---

Worklog Id: (was: 296645)
Time Spent: 3h 10m  (was: 3h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296646
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:17
Start Date: 16/Aug/19 23:17
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r314917564
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = 
"beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
+// instance of Schema from schema.proto.
+//
+// A row is encoded as the concatenation of:
+//   - The number of attributes in the schema, encoded with
+// beam:coder:varint:v1. This is useful for detecting supported schema
+// changes (column additions/deletions).
+//   - A packed bitset indicating null fields (a 1 indicating a null)
+// encoded with beam:coder:bytes:v1. If there are no nulls an empty 
byte
+// array is encoded.
+//   - An encoding for each non-null field, concatenated together.
+//
+// Schema types are mapped to coders as follows:
+//   AtomicType:
+// BYTE:  not yet a standard coder
 
 Review comment:
   done
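The row encoding quoted in the proto comment above can be illustrated with a small Python sketch. This is not the SDK's RowCoder: the varint here is the standard unsigned base-128 form, and the bit order inside the null bitset is an assumption for illustration.

```python
def encode_varint(n):
    # beam:coder:varint:v1-style unsigned base-128 varint
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_row(values, field_encoders):
    # 1. number of attributes in the schema, as a varint
    parts = [encode_varint(len(values))]
    # 2. packed null bitset, nested via the bytes coder (length-prefixed);
    #    an empty byte array if there are no nulls
    if any(v is None for v in values):
        bitset = bytearray((len(values) + 7) // 8)
        for i, v in enumerate(values):
            if v is None:
                bitset[i // 8] |= 1 << (i % 8)  # bit order assumed here
        parts.append(encode_varint(len(bitset)) + bytes(bitset))
    else:
        parts.append(encode_varint(0))
    # 3. each non-null field's encoding, concatenated
    parts.extend(enc(v) for v, enc in zip(values, field_encoders) if v is not None)
    return b"".join(parts)

print(encode_row([7, None], [encode_varint, encode_varint]).hex())  # 02010207
```

The printed bytes decompose as: field count 2, bitset length 1, bitset 0b10 (second field null), then the single non-null field's varint encoding.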
 



Issue Time Tracking
---

Worklog Id: (was: 296646)
Time Spent: 3h 20m  (was: 3h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--


[jira] [Updated] (BEAM-7888) Test Multi Process Direct Runner With Largish Data

2019-08-16 Thread Hannah Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-7888:
---
Issue Type: Task  (was: Test)

> Test Multi Process Direct Runner With Largish Data
> --
>
> Key: BEAM-7888
> URL: https://issues.apache.org/jira/browse/BEAM-7888
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Hannah Jiang
>Priority: Major
>
> Filing this as a tracker.
> We can test the multiprocess runner with a largish amount of data, to the 
> extent that we can do this on Jenkins. This will serve two purposes:
> - Find issues related to multiprocessing. It is easier to find rare issues 
> when running over non-trivial data.
> - Serve as a baseline (if not a benchmark) for understanding the limits of 
> the multiprocess runner.



--


[jira] [Updated] (BEAM-7888) Test Multi Process Direct Runner With Largish Data

2019-08-16 Thread Hannah Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-7888:
---
Issue Type: Test  (was: Sub-task)
Parent: (was: BEAM-3645)

> Test Multi Process Direct Runner With Largish Data
> --
>
> Key: BEAM-7888
> URL: https://issues.apache.org/jira/browse/BEAM-7888
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Hannah Jiang
>Priority: Major
>
> Filing this as a tracker.
> We can test the multiprocess runner with a largish amount of data, to the 
> extent that we can do this on Jenkins. This will serve two purposes:
> - Find issues related to multiprocessing. It is easier to find rare issues 
> when running over non-trivial data.
> - Serve as a baseline (if not a benchmark) for understanding the limits of 
> the multiprocess runner.



--


[jira] [Created] (BEAM-7997) Create a single docker container for a pipeline instead of creating one for each worker.

2019-08-16 Thread Hannah Jiang (JIRA)
Hannah Jiang created BEAM-7997:
--

 Summary: Create a single docker container for a pipeline instead 
of creating one for each worker.
 Key: BEAM-7997
 URL: https://issues.apache.org/jira/browse/BEAM-7997
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Hannah Jiang
Assignee: Hannah Jiang


With the --direct_num_workers option, the current implementation creates a 
container for each worker. We should change this behavior to create one 
container per pipeline and handle multiple workers inside the container.



--


[jira] [Work logged] (BEAM-7462) Add Sampled Byte Count Metric to the Java SDK

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7462?focusedWorklogId=296641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296641
 ]

ASF GitHub Bot logged work on BEAM-7462:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:07
Start Date: 16/Aug/19 23:07
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #8809: [WIP] 
[BEAM-7462] Update java SDK to report the SampledByteCount counter.
URL: https://github.com/apache/beam/pull/8809
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 296641)
Time Spent: 1h  (was: 50m)

> Add Sampled Byte Count Metric to the Java SDK
> -
>
> Key: BEAM-7462
> URL: https://issues.apache.org/jira/browse/BEAM-7462
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portable-metrics-bugs
> Attachments: bundle_descriptor_dump.txt
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--


[jira] [Work logged] (BEAM-7462) Add Sampled Byte Count Metric to the Java SDK

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7462?focusedWorklogId=296640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296640
 ]

ASF GitHub Bot logged work on BEAM-7462:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:07
Start Date: 16/Aug/19 23:07
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #8809: [WIP] [BEAM-7462] 
Update java SDK to report the SampledByteCount counter.
URL: https://github.com/apache/beam/pull/8809#issuecomment-522177266
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   
 



Issue Time Tracking
---

Worklog Id: (was: 296640)
Time Spent: 50m  (was: 40m)

> Add Sampled Byte Count Metric to the Java SDK
> -
>
> Key: BEAM-7462
> URL: https://issues.apache.org/jira/browse/BEAM-7462
> Project: Beam
>  Issue Type: New Feature
>  Components: java-fn-execution
>Reporter: Alex Amato
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portable-metrics-bugs
> Attachments: bundle_descriptor_dump.txt
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--


[jira] [Updated] (BEAM-7997) Create a single docker container for a pipeline instead of creating one for each worker.

2019-08-16 Thread Hannah Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-7997:
---
Status: Open  (was: Triage Needed)

> Create a single docker container for a pipeline instead of creating one for 
> each worker.
> 
>
> Key: BEAM-7997
> URL: https://issues.apache.org/jira/browse/BEAM-7997
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Labels: portability
>
> With the --direct_num_workers option, the current implementation creates a 
> container for each worker. We should change this behavior to create one 
> container per pipeline and handle multiple workers inside the container.



--


[jira] [Assigned] (BEAM-7981) ParDo function wrapper doesn't support Iterable output types

2019-08-16 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reassigned BEAM-7981:
---

Assignee: Udi Meiri

> ParDo function wrapper doesn't support Iterable output types
> 
>
> Key: BEAM-7981
> URL: https://issues.apache.org/jira/browse/BEAM-7981
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> I believe the bug is in CallableWrapperDoFn.default_type_hints, which 
> converts Iterable[str] to str.
> This test will be included (commented out) in 
> https://github.com/apache/beam/pull/9283
> {code}
>   def test_typed_callable_iterable_output(self):
> @typehints.with_input_types(int)
> @typehints.with_output_types(typehints.Iterable[str])
> def do_fn(element):
>   return [[str(element)] * 2]
> result = [1, 2] | beam.ParDo(do_fn)
> self.assertEqual([['1', '1'], ['2', '2']], sorted(result))
> {code}
> Result:
> {code}
> ==
> ERROR: test_typed_callable_iterable_output 
> (apache_beam.typehints.typed_pipeline_test.MainInputTest)
> --
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py",
>  line 104, in test_typed_callable_iterable_output
> result = [1, 2] | beam.ParDo(do_fn)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 519, in __ror__
> p.run().wait_until_finish()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pipeline.py", 
> line 406, in run
> self._options).run(False)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pipeline.py", 
> line 419, in run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 129, in run_pipeline
> return runner.run_pipeline(pipeline, options)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 366, in run_pipeline
> default_environment=self._default_environment))
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 373, in run_via_runner_api
> return self.run_stages(stage_context, stages)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 455, in run_stages
> stage_context.safe_coders)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 733, in _run_stage
> result, splits = bundle_manager.process_bundle(data_input, data_output)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1663, in process_bundle
> part, expected_outputs), part_inputs):
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in 
> result_iterator
> yield fs.pop().result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
> return self.__get_result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in 
> __get_result
> raise self._exception
>   File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
> result = self.fn(*self.args, **self.kwargs)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1663, in 
> part, expected_outputs), part_inputs):
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1601, in process_bundle
> result_future = self._worker_handler.control_conn.push(process_bundle_req)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1080, in push
> response = self.worker.do_instruction(request)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 343, in do_instruction
> request.instruction_id)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
>  line 369, in process_bundle
> bundle_processor.process_bundle(instruction_id))
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 593, in process_bundle
> data.ptransform_id].process_encoded(data.data)
>   File 
> "/usr/local/google/home/ehudm/src/beam/

[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=296637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296637
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 16/Aug/19 22:41
Start Date: 16/Aug/19 22:41
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9365: [BEAM-7924] fix 
pipeline options error with main session
URL: https://github.com/apache/beam/pull/9365#issuecomment-522172714
 
 
   It turns out the issue was that I had saved `pipeline_options` as a 
global variable in my main session, and it was shadowing the 
`pipeline_options` module in the SDK worker.
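The shadowing failure mode described in this comment can be reproduced with any module. A minimal stdlib analogue, using `json` as a stand-in for the SDK's `pipeline_options` module (names here are illustrative):

```python
import json  # stands in for apache_beam.options.pipeline_options

# A module-level variable with the same name rebinds it. With
# save_main_session, the pickled __main__ namespace is restored on the
# worker, so worker-side code looking up the name gets this object
# instead of the module.
json = {"runner": "DirectRunner"}

try:
    json.dumps({})  # code that expects the *module*
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'dumps'
```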
 



Issue Time Tracking
---

Worklog Id: (was: 296637)
Time Spent: 4h  (was: 3h 50m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--


[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296636
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 16/Aug/19 22:37
Start Date: 16/Aug/19 22:37
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314909460
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go
 ##
 @@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package reflectx
+
+import (
+   "reflect"
+   "testing"
+)
+
+func testFunction() int {
+   return 42
+}
+
+func TestXxx(t *testing.T) {
 
 Review comment:
   Yeah, the name here seems wrong.
 



Issue Time Tracking
---

Worklog Id: (was: 296636)
Time Spent: 1h 20m  (was: 1h 10m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--


[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296632
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 16/Aug/19 22:36
Start Date: 16/Aug/19 22:36
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314909234
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions.go
 ##
 @@ -40,6 +40,8 @@ func FunctionName(fn interface{}) string {
 // points to a valid function implementation.
 func LoadFunction(ptr uintptr, t reflect.Type) interface{} {
v := reflect.New(t).Elem()
-   *(*uintptr)(unsafe.Pointer(v.Addr().Pointer())) = 
(uintptr)(unsafe.Pointer(&ptr))
+   p := new(uintptr)
+   *p = ptr
+   *(*unsafe.Pointer)(unsafe.Pointer(v.Addr().Pointer())) = 
unsafe.Pointer(p)
 
 Review comment:
   I had to read up a bit on how reflect and unsafe work to understand what was 
happening here, but I still don't understand exactly why the previous version 
was invalid and this one valid. (I'm using 
https://golang.org/pkg/unsafe/#Pointer as a guideline).
   
   Is it because in the old version we simply put in the address to `ptr`, and 
in the new one we actually create a new copy?
   
   Or is it because in the old version we're reassigning to `v` as a `uintptr` 
instead of as an `unsafe.Pointer`?
   
   Those are the only real differences I could figure out.
 



Issue Time Tracking
---

Worklog Id: (was: 296632)
Time Spent: 50m  (was: 40m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--


[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296635
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 16/Aug/19 22:36
Start Date: 16/Aug/19 22:36
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314910880
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go
 ##
 @@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package reflectx
+
+import (
+   "reflect"
+   "testing"
+)
+
+func testFunction() int {
+   return 42
+}
+
+func TestXxx(t *testing.T) {
+   val := reflect.ValueOf(testFunction)
+   fi := uintptr(val.Pointer())
+   typ := val.Type()
+
+   callable := LoadFunction(fi, typ)
+
+   cv := reflect.ValueOf(callable)
+   out := cv.Call(nil)
+   if len(out) != 1 {
+   t.Errorf("got %d return values, wanted 1.", len(out))
+   }
+   // TODO: check type?
+   if out[0].Int() != 42 {
+   t.Errorf("got %d, wanted 42", out[0].Int())
+   }
 
 Review comment:
   Nit: Instead of comparing to 42 directly, could you do something like:
   
   ```
   expected := testFunction()
   if out[0].Int() != expected {
t.Errorf("got %d, wanted %d", out[0].Int(), expected)
   }
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 296635)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296634
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 16/Aug/19 22:36
Start Date: 16/Aug/19 22:36
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314909460
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go
 ##
 @@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package reflectx
+
+import (
+   "reflect"
+   "testing"
+)
+
+func testFunction() int {
+   return 42
+}
+
+func TestXxx(t *testing.T) {
 
 Review comment:
   Yeah, the name here seems wrong.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296634)
Time Spent: 1h 10m  (was: 1h)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296633
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 16/Aug/19 22:36
Start Date: 16/Aug/19 22:36
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314910919
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go
 ##
 @@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package reflectx
+
+import (
+   "reflect"
+   "testing"
+)
+
+func testFunction() int {
+   return 42
+}
+
+func TestXxx(t *testing.T) {
 
 Review comment:
   +1, name seems off.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296633)
Time Spent: 1h  (was: 50m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7996) Add support for remaining data types in python RowCoder

2019-08-16 Thread Brian Hulette (JIRA)
Brian Hulette created BEAM-7996:
---

 Summary: Add support for remaining data types in python RowCoder 
 Key: BEAM-7996
 URL: https://issues.apache.org/jira/browse/BEAM-7996
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py-core
Reporter: Brian Hulette


In the initial [python RowCoder 
implementation|https://github.com/apache/beam/pull/9188] we only added support 
for the data types that already had coders in the Python SDK. We should add 
support for the remaining data types that are not currently supported:

* INT8 (ByteCoder in Java)
* INT16 (BigEndianShortCoder in Java)
* FLOAT (FloatCoder in Java)
* BOOLEAN (BooleanCoder in Java)
* Map (MapCoder in Java)

We might consider making those coders standard so they can be tested 
independently from RowCoder in standard_coders.yaml. Or, if we don't do that we 
should probably add a more robust testing framework for RowCoder itself, 
because it will be challenging to test all of these types as part of the 
RowCoder tests in standard_coders.yaml.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=296624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296624
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 16/Aug/19 22:08
Start Date: 16/Aug/19 22:08
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9365: [BEAM-7924] fix 
pipeline options error with main session
URL: https://github.com/apache/beam/pull/9365#issuecomment-522166406
 
 
   I made this change to fix a future test, which I have not uploaded yet.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296624)
Time Spent: 3h 50m  (was: 3h 40m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296619
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:55
Start Date: 16/Aug/19 21:55
Worklog Time Spent: 10m 
  Work Description: dsnet commented on issue #9362: [BEAM-7994] Fixing 
unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#issuecomment-522163621
 
 
   More background can be found here: golang/go#33662
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296619)
Time Spent: 40m  (was: 0.5h)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296620
 ]

ASF GitHub Bot logged work on BEAM-7995:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:55
Start Date: 16/Aug/19 21:55
Worklog Time Spent: 10m 
  Work Description: lhaiesp commented on pull request #9364: BEAM-7995: 
python PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364#discussion_r314902937
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -886,7 +886,7 @@ def output_key(self, wkey, accumulator):
 if windows is 0:
   self.output(_globally_windowed_value.with_value((key, value)))
 else:
-  self.output(WindowedValue((key, value), windows[0].end, windows))
+  self.output(WindowedValue((key, value), windows[0].max_timestamp(), 
windows))
 
 Review comment:
   Why is it any more incorrect than "end"?  max_timestamp = end - 0.001
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296620)
Time Spent: 40m  (was: 0.5h)

> IllegalStateException: TimestampCombiner moved element from to earlier time 
> in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I'm looking into a bug I found internally when using Beam portable API 
> (Python) on our own Samza runner. 
>  
> The pipeline looks something like this:
>  
>     (p
>      | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>      | 'transform' >> beam.Map(lambda event: process_event(event))
>      | 'window' >> beam.WindowInto(FixedWindows(15))
>      | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())*
>      ...
>  
> The problem comes from the combiners which cause the following exception on 
> Java side:
>  
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
> from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z 
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z)
>     at 
> org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
>     at 
> org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
>  
> The exception happens here 
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116]
>  when we check the shifted timestamp to ensure it's before the timestamp.
>  
>     if (shifted.isBefore(timestamp)) {
>       throw new IllegalStateException(
>           String.format(
>               "TimestampCombiner moved element from %s to earlier time %s for 
> window %s",
>               BoundedWindow.formatTimestamp(timestamp),
>               BoundedWindow.formatTimestamp(shifted),
>               window));
>     }
>  
> As you can see from the exception, the "shifted" is "XXX 44.999" while the 
> "timestamp" is "XXX 45.000". The "44.999" is coming from 
> [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]:
>  
>     @Override
>     public Instant merge(BoundedWindow intoWindow, Iterable Instant> mergingTimestamps) {
>       return intoWindow.maxTimestamp();
>     }
>  
> where intoWindow.maxTimestamp() is:
>  
>   /** Returns the largest timestamp that can be included in this window. */
>   @Override
>   public Instant maxTimestamp() {
>     *// end not inclusive*
>     return *end.minus(1)*;
>   }
>  
> Hence, the "44.*999*". 
>  
> And the "45.000" comes from the Python side when the combiner output results 
> as pre GBK operation: 
> [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/

[jira] [Commented] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-16 Thread Kyle Weaver (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909435#comment-16909435
 ] 

Kyle Weaver commented on BEAM-7924:
---

Update: I've identified the issue; see [GitHub Pull Request 
#9365|https://github.com/apache/beam/pull/9365]

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=296617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296617
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:52
Start Date: 16/Aug/19 21:52
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9365: [BEAM-7924] fix 
pipeline options error with main session
URL: https://github.com/apache/beam/pull/9365
 
 
   When the `save_main_session` option is enabled, the Python SDK harness 
currently fails with the following message:
   
   ```
   AttributeError: 'PipelineOptions' object has no attribute 'ProfilingOptions'
   ```
   
   I'm not sure exactly what's going on here, so it'd be great if anyone could 
help explain why this bandaid fix works, and if we might need a deeper fix.
   
   R: @robertwb @aaltay 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/b

[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296614
 ]

ASF GitHub Bot logged work on BEAM-7995:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:49
Start Date: 16/Aug/19 21:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9364: BEAM-7995: 
python PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364#discussion_r314901491
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -886,7 +886,7 @@ def output_key(self, wkey, accumulator):
 if windows is 0:
   self.output(_globally_windowed_value.with_value((key, value)))
 else:
-  self.output(WindowedValue((key, value), windows[0].end, windows))
+  self.output(WindowedValue((key, value), windows[0].max_timestamp(), 
windows))
 
 Review comment:
   Would not this make the timestamp for the element incorrect?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296614)
Time Spent: 0.5h  (was: 20m)

> IllegalStateException: TimestampCombiner moved element from to earlier time 
> in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I'm looking into a bug I found internally when using Beam portable API 
> (Python) on our own Samza runner. 
>  
> The pipeline looks something like this:
>  
>     (p
>      | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>      | 'transform' >> beam.Map(lambda event: process_event(event))
>      | 'window' >> beam.WindowInto(FixedWindows(15))
>      | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())*
>      ...
>  
> The problem comes from the combiners which cause the following exception on 
> Java side:
>  
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
> from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z 
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z)
>     at 
> org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
>     at 
> org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
>  
> The exception happens here 
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116]
>  when we check the shifted timestamp to ensure it's before the timestamp.
>  
>     if (shifted.isBefore(timestamp)) {
>       throw new IllegalStateException(
>           String.format(
>               "TimestampCombiner moved element from %s to earlier time %s for 
> window %s",
>               BoundedWindow.formatTimestamp(timestamp),
>               BoundedWindow.formatTimestamp(shifted),
>               window));
>     }
>  
> As you can see from the exception, the "shifted" is "XXX 44.999" while the 
> "timestamp" is "XXX 45.000". The "44.999" is coming from 
> [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]:
>  
>     @Override
>     public Instant merge(BoundedWindow intoWindow, Iterable Instant> mergingTimestamps) {
>       return intoWindow.maxTimestamp();
>     }
>  
> where intoWindow.maxTimestamp() is:
>  
>   /** Returns the largest timestamp that can be included in this window. */
>   @Override
>   public Instant maxTimestamp() {
>     *// end not inclusive*
>     return *end.minus(1)*;
>   }
>  
> Hence, the "44.*999*". 
>  
> And the "45.000" comes from the Python side when the combiner output results 
> as pre GBK operation: 
> [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/w

[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296609
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:46
Start Date: 16/Aug/19 21:46
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-522161595
 
 
   run python 3.7 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296609)
Time Spent: 11h 20m  (was: 11h 10m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Existing [Typehints implementaiton in 
> Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/
> ] heavily relies on internal details of CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support in-house implementation, which at this point is a stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT to this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296608
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:46
Start Date: 16/Aug/19 21:46
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-522161571
 
 
   run python 2 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296608)
Time Spent: 11h 10m  (was: 11h)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] 
> heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6; see for 
> example https://issues.apache.org/jira/browse/BEAM-6877, which makes 
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is 
> stale code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like Pytype, Mypy, or PyAnnotate.
> WRT this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for Python 3.6+ users. One potential option may be to have 
> the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].
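For illustration, a minimal plain-Python sketch (not Beam API; the function name is hypothetical) of the Python 3 annotation mechanism this issue targets: annotations live on the function object, so an annotation-based typehints implementation could read them instead of relying on decorator internals.

```python
from typing import List

def split_words(line: str) -> List[str]:
    # Python 3 stores the hints in __annotations__ at definition time,
    # so a framework can inspect them without touching CPython internals.
    return line.split()

# The hints are ordinary runtime metadata:
hints = split_words.__annotations__
print(hints['line'], hints['return'])  # <class 'str'> typing.List[str]
```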





[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296604
 ]

ASF GitHub Bot logged work on BEAM-7819:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:42
Start Date: 16/Aug/19 21:42
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9232: [BEAM-7819] Python - 
parse PubSub message_id into attributes property
URL: https://github.com/apache/beam/pull/9232#issuecomment-522160484
 
 
   > Is that external to the Beam project?
   
   Yes
 



Issue Time Tracking
---

Worklog Id: (was: 296604)
Time Spent: 6h 50m  (was: 6h 40m)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """
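A rough sketch of the reporter's proposed fix; `FakeProtoMessage` below is an illustrative stand-in for the real `pubsub_pb2.PubsubMessage`, used only to show how the non-attribute fields would be folded into the attributes dict on parse.

```python
from dataclasses import dataclass, field

@dataclass
class FakeProtoMessage:
    # Stand-in for the protobuf message; field names mirror the proto
    # (data, attributes, message_id, publish_time) but this is not the real type.
    data: bytes = b''
    attributes: dict = field(default_factory=dict)
    message_id: str = ''
    publish_time: str = ''

def parse_with_extra_fields(msg):
    # Copy the attribute map, then fold in the non-attribute fields, as
    # suggested above; this is only needed on parse, not when publishing.
    attributes = dict(msg.attributes)
    attributes.update({'message_id': msg.message_id,
                       'publish_time': msg.publish_time})
    return msg.data, attributes

data, attrs = parse_with_extra_fields(
    FakeProtoMessage(b'payload', {'k': 'v'}, 'id-1', '2019-08-16T00:00:00Z'))
```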





[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296597
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:32
Start Date: 16/Aug/19 21:32
Worklog Time Spent: 10m 
  Work Description: dsnet commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314897093
 
 

 ##
 File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go
 ##
 @@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package reflectx
+
+import (
+   "reflect"
+   "testing"
+)
+
+func testFunction() int {
+   return 42
+}
+
+func TestXxx(t *testing.T) {
 
 Review comment:
   `TestLoadFunction`?
 



Issue Time Tracking
---

Worklog Id: (was: 296597)
Time Spent: 0.5h  (was: 20m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.





[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296594
 ]

ASF GitHub Bot logged work on BEAM-7995:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:23
Start Date: 16/Aug/19 21:23
Worklog Time Spent: 10m 
  Work Description: lhaiesp commented on issue #9364: BEAM-7995: python 
PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364#issuecomment-522155765
 
 
   @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 296594)
Time Spent: 20m  (was: 10m)

> IllegalStateException: TimestampCombiner moved element from to earlier time 
> in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'm looking into a bug I found internally when using Beam portable API 
> (Python) on our own Samza runner. 
>  
> The pipeline looks something like this:
>  
>     (p
>      | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>      | 'transform' >> beam.Map(lambda event: process_event(event))
>      | 'window' >> beam.WindowInto(FixedWindows(15))
>      | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())*
>      ...
>  
> The problem comes from the combiners which cause the following exception on 
> Java side:
>  
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
> from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z 
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z)
>     at 
> org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
>     at 
> org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
>  
> The exception happens here 
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116]
>  when we check the shifted timestamp to ensure it's before the timestamp.
>  
>     if (shifted.isBefore(timestamp)) {
>       throw new IllegalStateException(
>           String.format(
>               "TimestampCombiner moved element from %s to earlier time %s for 
> window %s",
>               BoundedWindow.formatTimestamp(timestamp),
>               BoundedWindow.formatTimestamp(shifted),
>               window));
>     }
>  
> As you can see from the exception, the "shifted" is "XXX 44.999" while the 
> "timestamp" is "XXX 45.000". The "44.999" is coming from 
> [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]:
>  
>     @Override
>     public Instant merge(BoundedWindow intoWindow, Iterable<Instant> mergingTimestamps) {
>       return intoWindow.maxTimestamp();
>     }
>  
> where intoWindow.maxTimestamp() is:
>  
>   /** Returns the largest timestamp that can be included in this window. */
>   @Override
>   public Instant maxTimestamp() {
>     *// end not inclusive*
>     return *end.minus(1)*;
>   }
>  
> Hence, the "44.*999*". 
>  
> And the "45.000" comes from the Python side when the combiner output results 
> as pre GBK operation: 
> [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py#L889]
>  
>     if windows is 0:
>       self.output(_globally_windowed_value.with_value((key, value)))
>     else:
>       self.output(WindowedValue((key, value), *windows[0].end*, windows))
>  
> Here when we generate the window value, the timestamp is assigned to the 
> closed interval end (45.000) as opposed to open interval end (44.999)
>  
> Clearly the "end of window" definition is a bit inconsistent across Python 
> and Java. I'm yet to try this on other runners, so I'm not sure whether this 
> is only an issue for our Samza runner. I tend to think this is a bug but 
> would like to confirm with you.

[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296584
 ]

ASF GitHub Bot logged work on BEAM-7995:


Author: ASF GitHub Bot
Created on: 16/Aug/19 21:13
Start Date: 16/Aug/19 21:13
Worklog Time Spent: 10m 
  Work Description: lhaiesp commented on pull request #9364: BEAM-7995: 
python PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364
 
 
   Python PGBKCVOperation is using the closed interval end, which is not 
acceptable for window calculation. Change it to use window.max_timestamp(), 
which uses the open interval end of the window.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build

[jira] [Updated] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-16 Thread Hai Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hai Lu updated BEAM-7995:
-
Description: 
I'm looking into a bug I found internally when using Beam portable API (Python) 
on our own Samza runner. 
 
The pipeline looks something like this:
 
    (p
     | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
     | 'transform' >> beam.Map(lambda event: process_event(event))
     | 'window' >> beam.WindowInto(FixedWindows(15))
     | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())*
     ...
 
The problem comes from the combiners which cause the following exception on 
Java side:
 
Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z for 
window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z)
    at org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
    at 
org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
    at 
org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
    at 
org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
    at 
org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
    at 
org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
 
The exception happens here 
[https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116]
 when we check the shifted timestamp to ensure it's before the timestamp.
 
    if (shifted.isBefore(timestamp)) {
      throw new IllegalStateException(
          String.format(
              "TimestampCombiner moved element from %s to earlier time %s for 
window %s",
              BoundedWindow.formatTimestamp(timestamp),
              BoundedWindow.formatTimestamp(shifted),
              window));
    }
 
As you can see from the exception, the "shifted" is "XXX 44.999" while the 
"timestamp" is "XXX 45.000". The "44.999" is coming from 
[TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]:
 
    @Override
    public Instant merge(BoundedWindow intoWindow, Iterable<Instant> 
mergingTimestamps) {
      return intoWindow.maxTimestamp();
    }
 
where intoWindow.maxTimestamp() is:
 
  /** Returns the largest timestamp that can be included in this window. */
  @Override
  public Instant maxTimestamp() {
    *// end not inclusive*
    return *end.minus(1)*;
  }
 
Hence, the "44.*999*". 
 
And the "45.000" comes from the Python side when the combiner output results as 
pre GBK operation: 
[operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py#L889]
 
    if windows is 0:
      self.output(_globally_windowed_value.with_value((key, value)))
    else:
      self.output(WindowedValue((key, value), *windows[0].end*, windows))
 
Here when we generate the window value, the timestamp is assigned to the closed 
interval end (45.000) as opposed to open interval end (44.999)
 
Clearly the "end of window" definition is a bit inconsistent across Python and 
Java. I'm yet to try this on other runners, so I'm not sure whether this is only 
an issue for our Samza runner. I tend to think this is a bug but would like to 
confirm with you. If this has not been an issue for other runners, where did I 
potentially go wrong?
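The one-millisecond mismatch can be reproduced with a toy window model (an illustrative class, not the real apache_beam.transforms.window.IntervalWindow):

```python
class FakeIntervalWindow:
    # Toy model of a fixed window; `end` is the exclusive upper bound (seconds).
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def max_timestamp(self):
        # Largest timestamp that can be included: one millisecond before the
        # exclusive end, mirroring Java's BoundedWindow.maxTimestamp() (end.minus(1)).
        return self.end - 0.001

w = FakeIntervalWindow(30.0, 45.0)
buggy_ts = w.end              # 45.000: what PGBKCVOperation emitted
fixed_ts = w.max_timestamp()  # 44.999: what TimestampCombiner.END_OF_WINDOW computes

# WatermarkHold.shift() throws when the shifted (combined) timestamp (44.999)
# is earlier than the element timestamp (45.000), producing the exception above.
assert fixed_ts < buggy_ts
```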

> IllegalStateException: TimestampCombiner moved element from to earlier time 
> in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>
> I'm looking into a bug I found internally when using Beam portable API 
> (Python) on our own Samza runner. 
>  
> The pipeline looks something like this:
>  
>     (p
>      | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>      | 'transform' >> beam.Map(lambda event: process_event(event))
>      | 'window' >> beam.WindowInto(FixedWindows(15))
>      | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())*
>      ...
>  
> The problem comes from the combiners which cause the following exception on 
> Java side:
>  
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
> from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z 
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z)
>     at 
> org.apache.beam.runners.core.WatermarkHold.

[jira] [Created] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-16 Thread Hai Lu (JIRA)
Hai Lu created BEAM-7995:


 Summary: IllegalStateException: TimestampCombiner moved element 
from to earlier time in Python
 Key: BEAM-7995
 URL: https://issues.apache.org/jira/browse/BEAM-7995
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Hai Lu
Assignee: Hai Lu








[jira] [Work logged] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7049?focusedWorklogId=296566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296566
 ]

ASF GitHub Bot logged work on BEAM-7049:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:51
Start Date: 16/Aug/19 20:51
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9358: 
(WIP-BEAM-7049) Changes made to make a simple case of three-way union work
URL: https://github.com/apache/beam/pull/9358#discussion_r314885625
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java
 ##
 @@ -88,15 +92,20 @@ public BeamSetOperatorRelBase(BeamRelNode beamRelNode, 
OpType opType, boolean al
 leftRows.apply(
 "CreateLeftIndex",
 MapElements.via(new 
BeamSetOperatorsTransforms.BeamSqlRow2KvFn(
-.and(
-rightTag,
-rightRows.apply(
-"CreateRightIndex",
-MapElements.via(new 
BeamSetOperatorsTransforms.BeamSqlRow2KvFn(
-.apply(CoGroupByKey.create());
+.and(
 
 Review comment:
   Note that to make this general, a for loop would be needed to create one 
tag per input.
 



Issue Time Tracking
---

Worklog Id: (was: 296566)
Time Spent: 20m  (was: 10m)

> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> BeamUnionRel assumes inputs are two and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If 
> BeamUnionRel can handle multiple inputs, we will have only one shuffle.
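The single-shuffle idea can be sketched over a toy expression tree (plain Python, not the actual Calcite/Beam SQL planner code): nested binary unions are flattened into one n-ary union, which a runner can then implement with a single shuffle.

```python
def flatten_union(node):
    # `node` is either a leaf relation name or a tuple ('union', left, right).
    if isinstance(node, tuple) and node[0] == 'union':
        return flatten_union(node[1]) + flatten_union(node[2])
    return [node]

# a UNION b UNION c parsed as a binary tree: UNION(a, UNION(b, c))
tree = ('union', 'a', ('union', 'b', 'c'))
inputs = flatten_union(tree)  # one n-ary union over all three inputs
```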





[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296561
 ]

ASF GitHub Bot logged work on BEAM-7819:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:40
Start Date: 16/Aug/19 20:40
Worklog Time Spent: 10m 
  Work Description: matt-darwin commented on issue #9232: [BEAM-7819] 
Python - parse PubSub message_id into attributes property
URL: https://github.com/apache/beam/pull/9232#issuecomment-522144491
 
 
   Run Python Dataflow ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 296561)
Time Spent: 6h 40m  (was: 6.5h)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """





[jira] [Work logged] (BEAM-7989) SparkRunner CacheVisitor counts PCollections from SideInputs

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7989?focusedWorklogId=296560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296560
 ]

ASF GitHub Bot logged work on BEAM-7989:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:40
Start Date: 16/Aug/19 20:40
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9357: [BEAM-7989] 
Remove side inputs from CacheVisitor calculation
URL: https://github.com/apache/beam/pull/9357
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 296560)
Time Spent: 1h 10m  (was: 1h)

> SparkRunner CacheVisitor counts PCollections from SideInputs
> 
>
> Key: BEAM-7989
> URL: https://issues.apache.org/jira/browse/BEAM-7989
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.14.0
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The SparkRunner's CacheVisitor looks at all inputs for a 
> TransformHierarchy.Node. Those inputs include the PCollections from the 
> PCollectionViews that are supplied as sideInputs.
> The SparkRunner should not count these instances of sideInputs as the 
> PCollections are not actually accessed. They are only accessed when the 
> CreatePCollectionView Transform is processed.
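A toy model of the counting rule described above (illustrative structures, not the real SparkRunner visitor): only main-input consumptions count toward the caching decision, so side-input views do not inflate the count.

```python
from collections import Counter

def cache_candidates(transforms):
    # Each transform is a dict with 'main_inputs' and 'side_inputs' lists of
    # PCollection names; a PCollection read more than once is worth caching.
    uses = Counter()
    for t in transforms:
        for pc in t['main_inputs']:
            uses[pc] += 1
        # Side inputs are deliberately not counted: their backing PCollections
        # are only read when the CreatePCollectionView transform runs.
    return {pc for pc, n in uses.items() if n > 1}

pipeline = [
    {'main_inputs': ['pc1'], 'side_inputs': []},
    {'main_inputs': ['pc1'], 'side_inputs': ['pc2']},
    {'main_inputs': ['pc2'], 'side_inputs': []},
]
```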





[jira] [Resolved] (BEAM-7989) SparkRunner CacheVisitor counts PCollections from SideInputs

2019-08-16 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-7989.

   Resolution: Fixed
Fix Version/s: 2.16.0

> SparkRunner CacheVisitor counts PCollections from SideInputs
> 
>
> Key: BEAM-7989
> URL: https://issues.apache.org/jira/browse/BEAM-7989
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.14.0
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The SparkRunner's CacheVisitor looks at all inputs for a 
> TransformHierarchy.Node. Those inputs include the PCollections from the 
> PCollectionViews that are supplied as sideInputs.
> The SparkRunner should not count these instances of sideInputs as the 
> PCollections are not actually accessed. They are only accessed when the 
> CreatePCollectionView Transform is processed.





[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296559
 ]

ASF GitHub Bot logged work on BEAM-7819:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:39
Start Date: 16/Aug/19 20:39
Worklog Time Spent: 10m 
  Work Description: matt-darwin commented on issue #9232: [BEAM-7819] 
Python - parse PubSub message_id into attributes property
URL: https://github.com/apache/beam/pull/9232#issuecomment-522144229
 
 
   > FYI, support for message_id and publish_time in Dataflow should be 
available in a future update of Dataflow. Until then, those fields will appear 
unset or blank.
   
   Is that external to the Beam project?
 



Issue Time Tracking
---

Worklog Id: (was: 296559)
Time Spent: 6.5h  (was: 6h 20m)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
>   """Construct from serialized form of ``PubsubMessage``.
>   Args:
>     proto_msg: String containing a serialized protobuf of type
>       https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
>   Returns:
>     A new PubsubMessage object.
>   """
>   msg = pubsub.types.pubsub_pb2.PubsubMessage()
>   msg.ParseFromString(proto_msg)
>   # Convert ScalarMapContainer to dict.
>   attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
>   return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """
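The merge proposed in the report above can be sketched without the Pub/Sub client library by using a stand-in message object. `FakeProtoMessage` and `parse_attributes` below are hypothetical names for illustration, not Beam or Pub/Sub APIs; only the attribute-merging logic mirrors the suggestion:

```python
# Minimal sketch of the proposed fix: fold the top-level protobuf fields
# message_id and publish_time into the attributes dict. FakeProtoMessage
# is a hypothetical stand-in for pubsub.types.pubsub_pb2.PubsubMessage.

class FakeProtoMessage:
    def __init__(self, data, attributes, message_id, publish_time):
        self.data = data
        self.attributes = attributes
        self.message_id = message_id
        self.publish_time = publish_time


def parse_attributes(msg):
    # Convert the attribute map to a plain dict (as _from_proto_str does),
    # then add the non-attribute fields so downstream code can see them.
    attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
    attributes.update({
        'message_id': msg.message_id,
        'publish_time': msg.publish_time,
    })
    return attributes


msg = FakeProtoMessage(b'payload', {'k': 'v'}, 'id-1', '2019-08-16T00:00:00Z')
print(parse_attributes(msg))
```

As the report notes, this merge would only be needed on the parsing side (`_from_proto_str`), not when publishing.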





[jira] [Resolved] (BEAM-7802) Expose a method to make an Schema coder from an Avro coder

2019-08-16 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-7802.

   Resolution: Fixed
Fix Version/s: 2.16.0

> Expose a method to make an Schema coder from an Avro coder
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method to easily get a Schema Coder from 
> Avro objects or a Schema, so the user can set the Schema-based coder when they 
> need the PCollection to behave like a Schema-based one.





[jira] [Updated] (BEAM-7802) Expose a method to make an Schema coder from an Avro coder

2019-08-16 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7802:
---
Description: Avro can infer the Schema for an Avro-based PCollection by 
using the `withBeamSchemas` method; however, if the user created a PCollection 
with Avro objects or IndexedRecord/GenericRecord, they need to manually set the 
schema (or coder). The idea is to expose a method to easily get a Schema Coder 
from Avro objects or a Schema, so the user can set the Schema-based coder when 
they need the PCollection to behave like a Schema-based one.  (was: Avro can 
infer the Schema for an Avro based PCollection by using the `withBeamSchemas` 
method, however if the user created a PCollection with Avro objects or 
IndexedRecord/GenericRecord, he needs to manually set the schema (or coder). 
The idea is to expose a method in schema.AvroUtils to ease this.)

> Expose a method to make an Schema coder from an Avro coder
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method to easily get a Schema Coder from 
> Avro objects or a Schema, so the user can set the Schema-based coder when they 
> need the PCollection to behave like a Schema-based one.





[jira] [Updated] (BEAM-7802) Expose a method to make an Schema coder from an Avro coder

2019-08-16 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7802:
---
Summary: Expose a method to make an Schema coder from an Avro coder  (was: 
Expose a method to make an Avro-based PCollection into an Schema-based one)

> Expose a method to make an Schema coder from an Avro coder
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro based PCollection by using the 
> `withBeamSchemas` method, however if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, he needs to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.





[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296551
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:31
Start Date: 16/Aug/19 20:31
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 296551)
Time Spent: 6.5h  (was: 6h 20m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro based PCollection by using the 
> `withBeamSchemas` method, however if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, he needs to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.





[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296550
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:30
Start Date: 16/Aug/19 20:30
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a 
method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#issuecomment-522141914
 
 
   Merging since it is green now. Thanks for the review @kanterov and 
@RyanSkraba 
 



Issue Time Tracking
---

Worklog Id: (was: 296550)
Time Spent: 6h 20m  (was: 6h 10m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro based PCollection by using the 
> `withBeamSchemas` method, however if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, he needs to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.





[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296549
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:30
Start Date: 16/Aug/19 20:30
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a 
method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#issuecomment-522124180
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 296549)
Time Spent: 6h 10m  (was: 6h)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro based PCollection by using the 
> `withBeamSchemas` method, however if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, he needs to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.





[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296546
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:25
Start Date: 16/Aug/19 20:25
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on issue #9362: [BEAM-7994] Fixing 
unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#issuecomment-522140404
 
 
   R: @lostluck @lukecwik 
 



Issue Time Tracking
---

Worklog Id: (was: 296546)
Time Spent: 20m  (was: 10m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.





[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296544
 ]

ASF GitHub Bot logged work on BEAM-7994:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:24
Start Date: 16/Aug/19 20:24
Worklog Time Spent: 10m 
  Work Description: wcn3 commented on pull request #9362: [BEAM-7994] 
Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362
 
 
   The Go team identified a problem in the Beam SDK that appears due to runtime 
changes in Go 1.13, which is upcoming. This is the backwards compatible fix the 
team recommended.
 



Issue Time Tracking
---

Worklog Id: (was: 296544)
Time Spent: 10m
Remaining Estimate: 0h

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Bill Neubauer
>Assignee: Bill Neubauer
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime 
> changes in Go 1.13, which is upcoming. There is a backwards compatible fix 
> the team recommended.





[jira] [Created] (BEAM-7994) BEAM SDK has compatibility problems with go1.13

2019-08-16 Thread Bill Neubauer (JIRA)
Bill Neubauer created BEAM-7994:
---

 Summary: BEAM SDK has compatibility problems with go1.13
 Key: BEAM-7994
 URL: https://issues.apache.org/jira/browse/BEAM-7994
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Reporter: Bill Neubauer
Assignee: Bill Neubauer


The Go team identified a problem in the Beam SDK that appears due to runtime 
changes in Go 1.13, which is upcoming. There is a backwards compatible fix the 
team recommended.





[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296536
 ]

ASF GitHub Bot logged work on BEAM-7819:


Author: ASF GitHub Bot
Created on: 16/Aug/19 20:17
Start Date: 16/Aug/19 20:17
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9232: [BEAM-7819] Python - 
parse PubSub message_id into attributes property
URL: https://github.com/apache/beam/pull/9232#issuecomment-522138174
 
 
   FYI, support for message_id and publish_time in Dataflow should be available 
in a future update of Dataflow. Until then, those fields will appear unset or 
blank.
 



Issue Time Tracking
---

Worklog Id: (was: 296536)
Time Spent: 6h 20m  (was: 6h 10m)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
>   """Construct from serialized form of ``PubsubMessage``.
>   Args:
>     proto_msg: String containing a serialized protobuf of type
>       https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
>   Returns:
>     A new PubsubMessage object.
>   """
>   msg = pubsub.types.pubsub_pb2.PubsubMessage()
>   msg.ParseFromString(proto_msg)
>   # Convert ScalarMapContainer to dict.
>   attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
>   return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are seperate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """





[jira] [Commented] (BEAM-7993) portable python precommit is flaky

2019-08-16 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909362#comment-16909362
 ] 

Udi Meiri commented on BEAM-7993:
-

Similar failures:
https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5505/consoleFull
https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5508/console

[~tvalentyn], [~robertwb]: Could either of you forward this to someone for 
triage?

> portable python precommit is flaky
> --
>
> Key: BEAM-7993
> URL: https://issues.apache.org/jira/browse/BEAM-7993
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures, testing
>Reporter: Udi Meiri
>Priority: Major
>
> I'm not sure what the root cause is here.
> Example log where 
> :sdks:python:test-suites:portable:py35:portableWordCountBatch failed:
> {code}
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap 
> (FlatMap at ExtractOutput[0]) (2/2)] ERROR 
> org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
> MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at 
> ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap 
> (FlatMap at ExtractOutput[0]) (1/2)] ERROR 
> org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
> MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at 
> ExtractOutput[0]) (1/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at 
> [2]write/Write/WriteImpl/DoOnce/{FlatMap(), 
> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR 
> org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
> MapPartition (MapPartition at 
> [2]write/Write/WriteImpl/DoOnce/{FlatMap(), 
> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at 
> [2]write/Write/WriteImpl/DoOnce/{FlatMap(), 
> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR 
> org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
> MapPartition (MapPartition at 
> [2]write/Write/WriteImpl/DoOnce/{FlatMap(), 
> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 java.lang.Exception: The user defined 'open()' method caused an 
> exception: java.io.IOException: Received exit code 1 for command 'docker 
> inspect -f {{.State.Running}} 
> 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: 
> Error: No such object: 
> 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22  at 
> org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> 11:51:22  at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> 11:51:22  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> 11:51:22  at java.lang.Thread.run(Thread.java:748)
> 11:51:22 Caused by: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.io.IOException: Received exit code 1 for command 'docker inspect -f 
> {{.State.Running}} 
> 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: 
> Error: No such object: 
> 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22  at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
> 11:51:22  at 
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:211)
> 11:51:22  at 
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:202)
> 11:51:22  at 
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
> 11:51:22  at 
> org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
> 11:51:22  at 
> org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
> 11:51:22  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
> 11:51:22  at 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 11:51:22  at 
> org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
> 11:51:22  ... 3 more
> {code}
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull





[jira] [Created] (BEAM-7993) portable python precommit is flaky

2019-08-16 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-7993:
---

 Summary: portable python precommit is flaky
 Key: BEAM-7993
 URL: https://issues.apache.org/jira/browse/BEAM-7993
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core, test-failures, testing
Reporter: Udi Meiri


I'm not sure what the root cause is here.

Example log where :sdks:python:test-suites:portable:py35:portableWordCountBatch 
failed:
{code}
11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap 
(FlatMap at ExtractOutput[0]) (2/2)] ERROR 
org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at 
ExtractOutput[0]) (2/2)
11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap 
(FlatMap at ExtractOutput[0]) (1/2)] ERROR 
org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at 
ExtractOutput[0]) (1/2)
11:51:22 [CHAIN MapPartition (MapPartition at 
[2]write/Write/WriteImpl/DoOnce/{FlatMap(), 
Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR 
org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
11:51:22 [CHAIN MapPartition (MapPartition at 
[2]write/Write/WriteImpl/DoOnce/{FlatMap(), 
Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR 
org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
11:51:22 java.lang.Exception: The user defined 'open()' method caused an 
exception: java.io.IOException: Received exit code 1 for command 'docker 
inspect -f {{.State.Running}} 
642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: 
Error: No such object: 
642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
11:51:22  at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
11:51:22  at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
11:51:22  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
11:51:22  at java.lang.Thread.run(Thread.java:748)
11:51:22 Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
11:51:22  at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
11:51:22  at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:211)
11:51:22  at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:202)
11:51:22  at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
11:51:22  at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
11:51:22  at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
11:51:22  at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
11:51:22  at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
11:51:22  at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
11:51:22  ... 3 more
{code}

https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull





[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=296532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296532
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 16/Aug/19 19:56
Start Date: 16/Aug/19 19:56
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] 
Remove call to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-522132527
 
 
   One other change that we can think about making now that we no longer need 
to preserve the partitioner. Just to clean things up and eliminate a possible 
source of confusion:
   ```
   @@ -58,14 +58,9 @@ public class GroupCombineFunctions {
JavaPairRDD> groupedRDD =
(partitioner != null) ? pairRDD.groupByKey(partitioner) : 
pairRDD.groupByKey();

   -// using mapPartitions allows to preserve the partitioner
   -// and avoid unnecessary shuffle downstream.
return groupedRDD
   -.mapPartitionsToPair(
   -TranslationUtils.pairFunctionToPairFlatMapFunction(
   -CoderHelpers.fromByteFunctionIterable(keyCoder, wvCoder)),
   -true)
   -.mapPartitions(TranslationUtils.fromPairFlatMapFunction(), true);
   +.mapToPair(CoderHelpers.fromByteFunctionIterable(keyCoder, wvCoder))
   +.map(new TranslationUtils.FromPairFunction<>());
  }
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 296532)
Time Spent: 6h 10m  (was: 6h)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming
> Fix For: 2.16.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode. There is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles but it still causes 2 encode/decode cycles.





[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296530
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 16/Aug/19 19:52
Start Date: 16/Aug/19 19:52
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-522131428
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 296530)
Time Spent: 11h  (was: 10h 50m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> Existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] 
> heavily relies on internal details of CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support in-house implementation, which at this point is a stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT to this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].
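For context, a Python 3 annotation of the kind the PR above wires into type
hints looks like this (a minimal generic sketch using only the standard
`typing` module; the function and names are hypothetical, not Beam's API):

```python
from typing import Iterable, Tuple

# Python 3 function annotations carry the element types that the
# in-house typehints decorators previously had to declare explicitly.
def split_words(line: str) -> Iterable[Tuple[str, int]]:
    for word in line.split():
        yield (word, len(word))

pairs = list(split_words('a bb'))  # [('a', 1), ('bb', 2)]
```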



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296529
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 16/Aug/19 19:50
Start Date: 16/Aug/19 19:50
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-522130866
 
 
   run python 2 postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 296529)
Time Spent: 10h 50m  (was: 10h 40m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/]
> heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support in-house implementation, which at this point is a stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT to this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].





[jira] [Commented] (BEAM-7938) ProtoCoder throws NoSuchMethodException: com.google.protobuf.Message.getDefaultInstance()

2019-08-16 Thread Martin Fahy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909352#comment-16909352
 ] 

Martin Fahy commented on BEAM-7938:
---

Hi [~paliendroom], could you provide an example snippet of code 
which reproduces the issue? Thank you!

> ProtoCoder  throws NoSuchMethodException: 
> com.google.protobuf.Message.getDefaultInstance()
> -
>
> Key: BEAM-7938
> URL: https://issues.apache.org/jira/browse/BEAM-7938
> Project: Beam
>  Issue Type: Bug
>  Components: extensions-java-protobuf, io-java-files, io-java-gcp, 
> io-java-parquet
>Affects Versions: 2.14.0
>Reporter: Lien Michiels
>Priority: Major
>  Labels: newbie, starter
>
> h3. Context
> I have a beam pipeline running on DataFlow using the Java SDK that pulls 
> Proto wrapper messages from a PubSub subscription, I partition these by the 
> OneOf-value and then apply a MapElements to extract the underlying Proto 
> message, so that I end up with a PCollectionList. I then 
> do some more processing and try to write them to different sinks. BigQueryIO 
> works absolutely fine. However when I try to use the PubsubIO or ParquetIO, I 
> end up with this error when using FileIO (for Parquet): 
>  
> {code:java}
> java.lang.IllegalArgumentException: java.lang.NoSuchMethodException: 
> com.google.protobuf.Message.getDefaultInstance() 
> org.apache.beam.sdk.extensions.protobuf.ProtoCoder.getParser(ProtoCoder.java:288)
>  
> org.apache.beam.sdk.extensions.protobuf.ProtoCoder.decode(ProtoCoder.java:192)
>  
> org.apache.beam.sdk.extensions.protobuf.ProtoCoder.decode(ProtoCoder.java:108)
>  
> org.apache.beam.runners.dataflow.worker.WindmillKeyedWorkItem.lambda$elementsIterable$2(WindmillKeyedWorkItem.java:107)
>  
> org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterators$7.transform(Iterators.java:750)
>  
> org.apache.beam.vendor.guava.v20_0.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
>  java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) 
> java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>  
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>  
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>  
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
>  
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  
> java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
>  
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner$LateDataFilter.filter(LateDataDroppingDoFnRunner.java:128)
>  
> org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:76)
>  
> org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:134)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>  
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1287)
>  
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
>  
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024)
>  
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  java.base/java.lang.Thread.run(Thread.java:834)
> {code}
>  
> and this for PubsubIO:
>  
> {code:java}
> java.lang.IllegalArgumentException: java.lang.NoSuchMethodException: 
> com.google.protobuf.Message.getDefaultInstance() 
> org.apache.beam.sdk.extensions.protobuf.ProtoCoder.getParser(ProtoCoder.java:288)
>  
> org.apache.beam.sdk.extensions.protobuf.ProtoCoder.decode(ProtoCoder.java:192)
>  
> org.apache.beam.sdk.extensions.protobuf.ProtoCoder.decode(ProtoCoder.java:108)
>  
> org.apache.beam.runners.dataflow.worker.WindmillKeyedWorkItem.lambda$elementsIterable$2(WindmillKeyed

[jira] [Commented] (BEAM-7164) Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-08-16 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909353#comment-16909353
 ] 

Udi Meiri commented on BEAM-7164:
-

This seems not to reproduce when --no-parallel is added to the setupVirtualenv tasks.
I suspect the issue is overloading files.pythonhosted.org with parallel requests.

> Python precommit failing on  Java PRs. dataflow:setupVirtualenv
> ---
>
> Key: BEAM-7164
> URL: https://issues.apache.org/jira/browse/BEAM-7164
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Alex Amato
>Priority: Major
>
> *https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull*
> *18:05:44* >
>  *Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv*
> *18:05:44* New python executable in 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python2.7*18:05:44*
>  Also creating executable in 
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python*18:05:44*
>  Installing setuptools, pkg_resources, pip, wheel...done.*18:05:44* Running 
> virtualenv with interpreter /usr/bin/python2.7*18:05:44* DEPRECATION: Python 
> 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your 
> Python as Python 2.7 won't be maintained after that date. A future version of 
> pip will drop support for Python 2.7.*18:05:44* Collecting 
> tox==3.0.0*18:05:44*   Using cached 
> [https://files.pythonhosted.org/packages/e6/41/4dcfd713282bf3213b0384320fa8841e4db032ddcb80bc08a540159d42a8/tox-3.0.0-py2.py3-none-any.whl]
> *18:05:44* Collecting grpcio-tools==1.3.5*18:05:44*   Using cached 
> [https://files.pythonhosted.org/packages/05/f6/0296e29b1bac6f85d2a8556d48adf825307f73109a3c2c17fb734292db0a/grpcio_tools-1.3.5-cp27-cp27mu-manylinux1_x86_64.whl]
> *18:05:44* Collecting pluggy<1.0,>=0.3.0 (from tox==3.0.0)*18:05:44*   Using 
> cached 
> [https://files.pythonhosted.org/packages/84/e8/4ddac125b5a0e84ea6ffc93cfccf1e7ee1924e88f53c64e98227f0af2a5f/pluggy-0.9.0-py2.py3-none-any.whl]
> *18:05:44* Collecting six (from tox==3.0.0)*18:05:44*   Using cached 
> [https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl]
> *18:05:44* Collecting virtualenv>=1.11.2 (from tox==3.0.0)*18:05:44*   Using 
> cached 
> [https://files.pythonhosted.org/packages/4f/ba/6f9315180501d5ac3e707f19fcb1764c26cc6a9a31af05778f7c2383eadb/virtualenv-16.5.0-py2.py3-none-any.whl]
> *18:05:44* Collecting py>=1.4.17 (from tox==3.0.0)*18:05:44*   Using cached 
> [https://files.pythonhosted.org/packages/76/bc/394ad449851729244a97857ee14d7cba61ddb268dce3db538ba2f2ba1f0f/py-1.8.0-py2.py3-none-any.whl]
> *18:05:44* Collecting grpcio>=1.3.5 (from grpcio-tools==1.3.5)*18:05:44*   
> Using cached 
> [https://files.pythonhosted.org/packages/7c/59/4da8df60a74f4af73ede9d92a75ca85c94bc2a109d5f67061496e8d496b2/grpcio-1.20.0-cp27-cp27mu-manylinux1_x86_64.whl]
> *18:05:44* Collecting protobuf>=3.2.0 (from grpcio-tools==1.3.5)*18:05:44*   
> Using cached 
> [https://files.pythonhosted.org/packages/ea/72/5eadea03b06ca1320be2433ef2236155da17806b700efc92677ee99ae119/protobuf-3.7.1-cp27-cp27mu-manylinux1_x86_64.whl]
> *18:05:44* Collecting futures>=2.2.0; python_version < "3.2" (from 
> grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:44*   ERROR: Could not find a 
> version that satisfies the requirement futures>=2.2.0; python_version < "3.2" 
> (from grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: none)*18:05:44* 
> ERROR: No matching distribution found for futures>=2.2.0; python_version < 
> "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:46* *18:05:46* >
>  *Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv*
>  FAILED*18:05:46* 
>  





[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296516
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 16/Aug/19 19:26
Start Date: 16/Aug/19 19:26
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a 
method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#issuecomment-522124180
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 296516)
Time Spent: 5h 50m  (was: 5h 40m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection via the 
> `withBeamSchemas` method; however, if the user created a PCollection of Avro 
> objects or IndexedRecord/GenericRecord, they need to set the schema 
> (or coder) manually. The idea is to expose a method in schema.AvroUtils to ease this.





[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296515
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 16/Aug/19 19:26
Start Date: 16/Aug/19 19:26
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a 
method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#issuecomment-522124042
 
 
   Run Java PreCommit
   
 



Issue Time Tracking
---

Worklog Id: (was: 296515)
Time Spent: 5h 40m  (was: 5.5h)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection via the 
> `withBeamSchemas` method; however, if the user created a PCollection of Avro 
> objects or IndexedRecord/GenericRecord, they need to set the schema 
> (or coder) manually. The idea is to expose a method in schema.AvroUtils to ease this.





[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296517
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 16/Aug/19 19:26
Start Date: 16/Aug/19 19:26
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a 
method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#issuecomment-522124042
 
 
   Run Java PreCommit
   
 



Issue Time Tracking
---

Worklog Id: (was: 296517)
Time Spent: 6h  (was: 5h 50m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection via the 
> `withBeamSchemas` method; however, if the user created a PCollection of Avro 
> objects or IndexedRecord/GenericRecord, they need to set the schema 
> (or coder) manually. The idea is to expose a method in schema.AvroUtils to ease this.





[jira] [Work logged] (BEAM-1009) Upgrade from mockito-all 1 to mockito-core 2

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1009?focusedWorklogId=296505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296505
 ]

ASF GitHub Bot logged work on BEAM-1009:


Author: ASF GitHub Bot
Created on: 16/Aug/19 19:02
Start Date: 16/Aug/19 19:02
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9145: [BEAM-1009] 
Update multiple modules to use Mockito 2
URL: https://github.com/apache/beam/pull/9145
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 296505)
Time Spent: 2h 10m  (was: 2h)

> Upgrade from mockito-all 1 to mockito-core 2
> 
>
> Key: BEAM-1009
> URL: https://issues.apache.org/jira/browse/BEAM-1009
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Ismaël Mejía
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Mockito 2 provides useful features, and the mockito-all module is no longer 
> generated.





[jira] [Commented] (BEAM-7410) Explore possibilities to lower in-use IP address quota footprint.

2019-08-16 Thread Alan Myrvold (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909323#comment-16909323
 ] 

Alan Myrvold commented on BEAM-7410:


https://builds.apache.org/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/ 
is passing
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/ is 
passing

Removed the currently-failing label

> Explore possibilities to lower in-use IP address quota footprint.
> -
>
> Key: BEAM-7410
> URL: https://issues.apache.org/jira/browse/BEAM-7410
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Alan Myrvold
>Priority: Major
>
>  
> {noformat}
> java.lang.RuntimeException: Workflow failed. Causes: Project 
> apache-beam-testing has insufficient quota(s) to execute this workflow with 1 
> instances in region us-central1. Quota summary (required/available): 1/11743 
> instances, 1/170 CPUs, 250/278399 disk GB, 0/4046 SSD disk GB, 1/188 instance 
> groups, 1/188 managed instance groups, 1/126 instance templates, 1/0 in-use 
> IP addresses.{noformat}
>  
> Sample jobs:
> [https://builds.apache.org/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/440/testReport/junit/org.apache.beam.sdk.testing/PAssertTest/testGlobalWindowContainsInAnyOrder/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_PR/67/testReport/junit/org.apache.beam.sdk.transforms/ViewTest/testEmptyIterableSideInput/]
>  





[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296496
 ]

ASF GitHub Bot logged work on BEAM-7819:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:31
Start Date: 16/Aug/19 18:31
Worklog Time Spent: 10m 
  Work Description: matt-darwin commented on pull request #9232: 
[BEAM-7819] Python - parse PubSub message_id into attributes property
URL: https://github.com/apache/beam/pull/9232#discussion_r314840925
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub.py
 ##
 @@ -61,29 +61,43 @@ class PubsubMessage(object):
 attributes: (dict) Key-value map of str to str, containing both 
user-defined
   and service generated attributes (such as id_label and
   timestamp_attribute). May be None.
+message_id: (string) Message_Id as generated by PubSub
+publish_time: (timestamp) Published time of message as generated by PubSub
+  as per google.protobuf.timestamp_pb2.Timestamp
+
+N.B. message_id and publish_time are not yet populated by Windmill service
 
 Review comment:
   Ah ok. Will update.
 



Issue Time Tracking
---

Worklog Id: (was: 296496)
Time Spent: 6h 10m  (was: 6h)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """
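The change suggested in the quoted report can be sketched without the protobuf
dependency (the helper name and stand-in values are illustrative only; the real
fields come from the parsed PubsubMessage proto):

```python
# Hypothetical helper mirroring the report's suggestion: fold the
# message_id and publish_time fields into the attributes dict
# alongside the user-defined attributes.
def merge_system_fields(attributes, message_id, publish_time):
    merged = dict(attributes)  # copy so the caller's dict is untouched
    merged.update({'message_id': message_id, 'publish_time': publish_time})
    return merged

merged = merge_system_fields({'k': 'v'}, '1234567890', 1520861821)
```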





[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296495
 ]

ASF GitHub Bot logged work on BEAM-7819:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:30
Start Date: 16/Aug/19 18:30
Worklog Time Spent: 10m 
  Work Description: matt-darwin commented on pull request #9232: 
[BEAM-7819] Python - parse PubSub message_id into attributes property
URL: https://github.com/apache/beam/pull/9232#discussion_r314840820
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub_test.py
 ##
 @@ -82,25 +83,37 @@ def test_proto_conversion(self):
 self.assertEqual(m_converted.attributes, attributes)
 
   def test_eq(self):
-a = PubsubMessage(b'abc', {1: 2, 3: 4})
-b = PubsubMessage(b'abc', {1: 2, 3: 4})
-c = PubsubMessage(b'abc', {1: 2})
+publish_time_secs = 1520861821
 
 Review comment:
   Yes no problem; I'd copied and pasted this from another test where the 
seconds and nanos were used separately, but it's not needed here.
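For reference, the seconds/nanos pair used in the test combines into one epoch
value with plain arithmetic (no protobuf dependency; values taken from the
diff above):

```python
publish_time_secs = 1520861821
publish_time_nanos = 234567000

# A protobuf Timestamp stores seconds and nanos separately; the
# combined floating-point epoch value is:
publish_time = publish_time_secs + publish_time_nanos / 1e9
```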
 



Issue Time Tracking
---

Worklog Id: (was: 296495)
Time Spent: 6h  (was: 5h 50m)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """





[jira] [Updated] (BEAM-7410) Explore possibilities to lower in-use IP address quota footprint.

2019-08-16 Thread Alan Myrvold (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Myrvold updated BEAM-7410:
---
Labels:   (was: currently-failing)

> Explore possibilities to lower in-use IP address quota footprint.
> -
>
> Key: BEAM-7410
> URL: https://issues.apache.org/jira/browse/BEAM-7410
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Alan Myrvold
>Priority: Major
>
>  
> {noformat}
> java.lang.RuntimeException: Workflow failed. Causes: Project 
> apache-beam-testing has insufficient quota(s) to execute this workflow with 1 
> instances in region us-central1. Quota summary (required/available): 1/11743 
> instances, 1/170 CPUs, 250/278399 disk GB, 0/4046 SSD disk GB, 1/188 instance 
> groups, 1/188 managed instance groups, 1/126 instance templates, 1/0 in-use 
> IP addresses.{noformat}
>  
> Sample jobs:
> [https://builds.apache.org/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/440/testReport/junit/org.apache.beam.sdk.testing/PAssertTest/testGlobalWindowContainsInAnyOrder/]
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_PR/67/testReport/junit/org.apache.beam.sdk.transforms/ViewTest/testEmptyIterableSideInput/]
>  





[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296494
 ]

ASF GitHub Bot logged work on BEAM-7819:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:30
Start Date: 16/Aug/19 18:30
Worklog Time Spent: 10m 
  Work Description: matt-darwin commented on pull request #9232: 
[BEAM-7819] Python - parse PubSub message_id into attributes property
URL: https://github.com/apache/beam/pull/9232#discussion_r314840661
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/pubsub_test.py
 ##
 @@ -82,25 +83,37 @@ def test_proto_conversion(self):
 self.assertEqual(m_converted.attributes, attributes)
 
   def test_eq(self):
-a = PubsubMessage(b'abc', {1: 2, 3: 4})
-b = PubsubMessage(b'abc', {1: 2, 3: 4})
-c = PubsubMessage(b'abc', {1: 2})
+publish_time_secs = 1520861821
+publish_time_nanos = 234567000
+publish_time = Timestamp(seconds=publish_time_secs,
+ nanos=publish_time_nanos)
+a = PubsubMessage(b'abc', {1: 2, 3: 4}, '1234567890', publish_time)
+b = PubsubMessage(b'abc', {1: 2, 3: 4}, '1234567890', publish_time)
+c = PubsubMessage(b'abc', {1: 2}, '1234567890', publish_time)
 
 Review comment:
   Yes, I will add to the hash and repr tests as well; a check to ensure that 
both a and b do not equal d/e with some differing message_id/publish_time data.
 



Issue Time Tracking
---

Worklog Id: (was: 296494)
Time Spent: 5h 50m  (was: 5h 40m)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """
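As a hedged sketch of the merge proposed in the quoted report, here is the attribute-folding step with a plain stand-in object in place of the real `google.pubsub.v1.PubsubMessage` protobuf (the real code would first call `msg.ParseFromString(proto_msg)` on the actual protobuf class; `FakeProto` and `parse_message` are illustrative names only).

```python
# Hedged sketch of the merge proposed above: copy message_id and
# publish_time into the attributes dict alongside the attributes map.
class FakeProto(object):
    """Stand-in for google.pubsub.v1.PubsubMessage (illustration only)."""
    def __init__(self, data, attributes, message_id, publish_time):
        self.data = data
        self.attributes = attributes
        self.message_id = message_id
        self.publish_time = publish_time

def parse_message(msg):
    # Convert the (Scalar)Map to a plain dict, then fold in the two
    # non-attribute fields, as the report suggests.
    attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
    attributes.update({'message_id': msg.message_id,
                       'publish_time': msg.publish_time})
    return msg.data, attributes

msg = FakeProto(b'payload', {'k': 'v'}, '42', '2019-08-16T00:00:00Z')
data, attrs = parse_message(msg)
assert attrs['message_id'] == '42'
assert attrs['k'] == 'v'
```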



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7992) Unhandled type_constraint in apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types

2019-08-16 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-7992:
---

 Summary: Unhandled type_constraint in 
apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
 Key: BEAM-7992
 URL: https://issues.apache.org/jira/browse/BEAM-7992
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri
Assignee: Udi Meiri


{code}
root: DEBUG: Unhandled type_constraint: Union[]
root: DEBUG: Unhandled type_constraint: Union[]
root: DEBUG: Unhandled type_constraint: Any
root: DEBUG: Unhandled type_constraint: Any
{code}
https://builds.apache.org/job/beam_PostCommit_Python37_PR/20/testReport/junit/apache_beam.io.gcp.bigquery_write_it_test/BigQueryWriteIntegrationTests/test_big_query_write_new_types/

These log entries are from opcode.py's _unpack_lists.
They might be pointing to a bug or missing feature.





[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296491
 ]

ASF GitHub Bot logged work on BEAM-7819:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:25
Start Date: 16/Aug/19 18:25
Worklog Time Spent: 10m 
  Work Description: matt-darwin commented on pull request #9232: 
[BEAM-7819] Python - parse PubSub message_id into attributes property
URL: https://github.com/apache/beam/pull/9232#discussion_r314838660
 
 

 ##
 File path: sdks/python/scripts/run_pylint.sh
 ##
 @@ -104,6 +104,7 @@ ISORT_EXCLUDED=(
   "model.py"
   "taxi.py"
   "process_tfma.py"
+  "pubsub_test.py"
 
 Review comment:
  It was insisting I place the protobuf import prior to the beam imports, but if 
I did that it complained about the grouping with the pubsub import. However, as 
mentioned above, I didn't realise you could group imports in a single 
try/except, so this should resolve it.
 



Issue Time Tracking
---

Worklog Id: (was: 296491)
Time Spent: 5h 40m  (was: 5.5h)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E





[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=296493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296493
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:26
Start Date: 16/Aug/19 18:26
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] 
Remove call to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-522106601
 
 
   Run Java Spark PortableValidatesRunner Batch
 



Issue Time Tracking
---

Worklog Id: (was: 296493)
Time Spent: 6h  (was: 5h 50m)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming
> Fix For: 2.16.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode, there is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles, but it still causes 2 encode/decode cycles.
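The cost described can be illustrated in plain Python (this is not Spark or Beam code; the `encode`/`decode` counter is hypothetical instrumentation): two chained stages that each round-trip elements through a coder serialize every element twice.

```python
# Illustrative sketch only: two back-to-back stages, each of which
# round-trips every element through encode/decode, cost two full
# serialization cycles per element. pickle stands in for a Beam coder.
import pickle

ENCODES = {'count': 0}

def encode(value):
    ENCODES['count'] += 1
    return pickle.dumps(value)

def decode(blob):
    return pickle.loads(blob)

def stage(values):
    # Each stage boundary serializes and deserializes every element.
    return [decode(encode(v)) for v in values]

data = [('k', 1), ('k', 2)]
stage(stage(data))            # two chained stages...
assert ENCODES['count'] == 4  # ...encode each of the 2 elements twice
```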





[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=296492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296492
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:26
Start Date: 16/Aug/19 18:26
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] 
Remove call to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-522106526
 
 
   Run Spark ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 296492)
Time Spent: 5h 50m  (was: 5h 40m)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming
> Fix For: 2.16.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode, there is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles, but it still causes 2 encode/decode cycles.





[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296490
 ]

ASF GitHub Bot logged work on BEAM-7819:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:24
Start Date: 16/Aug/19 18:24
Worklog Time Spent: 10m 
  Work Description: matt-darwin commented on pull request #9232: 
[BEAM-7819] Python - parse PubSub message_id into attributes property
URL: https://github.com/apache/beam/pull/9232#discussion_r314838329
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/tests/pubsub_matcher_test.py
 ##
 @@ -37,8 +37,15 @@
 except ImportError:
   pubsub = None
 
+try:
+  from google.protobuf.timestamp_pb2 import Timestamp
 
 Review comment:
  Ah, I didn't realise you could have multiple imports in the same try 
statement. Fixing this up, which should also sort out the isort issues below (I 
will handle it the same way in pubsub_test.py).
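The pattern being discussed, grouping several optional imports in one guarded block so isort treats them as a unit, looks roughly like this (`google.cloud` may be absent, which is why the guard exists; the fallback-to-None convention is an assumption for this sketch, not necessarily what the PR does):

```python
# Sketch of the pattern discussed: several optional imports inside a
# single try/except ImportError, falling back to None when unavailable.
try:
    from google.protobuf.timestamp_pb2 import Timestamp
    from google.cloud import pubsub
except ImportError:
    Timestamp = None
    pubsub = None

# Callers can then skip functionality when the imports are unavailable:
if Timestamp is not None:
    ts = Timestamp(seconds=1520861821, nanos=234567000)
```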
 



Issue Time Tracking
---

Worklog Id: (was: 296490)
Time Spent: 5.5h  (was: 5h 20m)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E





[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=296484&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296484
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:13
Start Date: 16/Aug/19 18:13
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9351: 
[BEAM-7909] support customized container for Python
URL: https://github.com/apache/beam/pull/9351#discussion_r314834051
 
 

 ##
 File path: sdks/python/container/base_image_requirements.txt
 ##
 @@ -67,7 +67,7 @@ pandas==0.23.4
 protorpc==0.11.1
 python-gflags==3.0.6
 setuptools<=39.1.0 # requirement for Tensorflow.
-tensorflow==1.11.0
+tensorflow==1.13.1
 
 Review comment:
  The `pyyaml` and `tensorflow` version upgrades are required for py3.7; the 
previous versions don't work with py3.7.
 



Issue Time Tracking
---

Worklog Id: (was: 296484)
Time Spent: 1h 50m  (was: 1h 40m)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7478) Remote cluster submission from Flink Runner broken due to staging issues

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7478?focusedWorklogId=296482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296482
 ]

ASF GitHub Bot logged work on BEAM-7478:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:07
Start Date: 16/Aug/19 18:07
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #8775: [BEAM-7478] Detect 
class path from the "java.class.path" property
URL: https://github.com/apache/beam/pull/8775#issuecomment-522100855
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   
 



Issue Time Tracking
---

Worklog Id: (was: 296482)
Time Spent: 1h 50m  (was: 1h 40m)

> Remote cluster submission from Flink Runner broken due to staging issues
> 
>
> Key: BEAM-7478
> URL: https://issues.apache.org/jira/browse/BEAM-7478
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-java-core
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Attachments: image-2019-08-07-15-36-35-001.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The usual way to submit pipelines with the Flink Runner is to build a fat jar 
> and use the {{bin/flink}} utility to submit the jar to a Flink cluster. This 
> works fine.
> Alternatively, the Flink Runner can use the {{flinkMaster}} pipeline option 
> to specify a remote cluster. Upon submitting an example we get the following 
> at Flink's JobManager.
> {noformat}
> Caused by: java.lang.IllegalAccessError: class 
> sun.reflect.GeneratedSerializationConstructorAccessor70 cannot access its 
> superclass sun.reflect.SerializationConstructorAccessorImpl
>   at sun.misc.Unsafe.defineClass(Native Method)
>   at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
>   at 
> sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
>   at 
> sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)
>   at 
> sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112)
>   at 
> sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:340)
>   at 
> java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1420)
>   at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
>   at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:497)
>   at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.io.ObjectStreamClass.(ObjectStreamClass.java:472)
>   at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:598)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:502)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:489)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:477)
>   at 
> org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:438)
>   at 
> org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:288)
>   at 
> org.apache.flink.runtime.jobgraph.InputFormatVertex.initia

[jira] [Work logged] (BEAM-7478) Remote cluster submission from Flink Runner broken due to staging issues

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7478?focusedWorklogId=296483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296483
 ]

ASF GitHub Bot logged work on BEAM-7478:


Author: ASF GitHub Bot
Created on: 16/Aug/19 18:07
Start Date: 16/Aug/19 18:07
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #8775: [BEAM-7478] 
Detect class path from the "java.class.path" property
URL: https://github.com/apache/beam/pull/8775
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 296483)
Time Spent: 2h  (was: 1h 50m)

> Remote cluster submission from Flink Runner broken due to staging issues
> 
>
> Key: BEAM-7478
> URL: https://issues.apache.org/jira/browse/BEAM-7478
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-java-core
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Attachments: image-2019-08-07-15-36-35-001.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The usual way to submit pipelines with the Flink Runner is to build a fat jar 
> and use the {{bin/flink}} utility to submit the jar to a Flink cluster. This 
> works fine.
> Alternatively, the Flink Runner can use the {{flinkMaster}} pipeline option 
> to specify a remote cluster. Upon submitting an example we get the following 
> at Flink's JobManager.
> {noformat}
> Caused by: java.lang.IllegalAccessError: class 
> sun.reflect.GeneratedSerializationConstructorAccessor70 cannot access its 
> superclass sun.reflect.SerializationConstructorAccessorImpl
>   at sun.misc.Unsafe.defineClass(Native Method)
>   at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
>   at 
> sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
>   at 
> sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)
>   at 
> sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112)
>   at 
> sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:340)
>   at 
> java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1420)
>   at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
>   at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:497)
>   at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.io.ObjectStreamClass.(ObjectStreamClass.java:472)
>   at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:598)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:502)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:489)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:477)
>   at 
> org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:438)
>   at 
> org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:288)
>   at 
> org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:63)
>   ... 32 more
> {noformat}
> It appears there is an issue with the staging via {{PipelineResources}}.




[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=296477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296477
 ]

ASF GitHub Bot logged work on BEAM-5519:


Author: ASF GitHub Bot
Created on: 16/Aug/19 17:57
Start Date: 16/Aug/19 17:57
Worklog Time Spent: 10m 
  Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] 
Remove call to groupByKey in Spark Streaming.
URL: https://github.com/apache/beam/pull/6511#issuecomment-522097498
 
 
   @iemejia I cleaned this up. Mainly just threw out the part where I was 
getting rid of the partitioner in batch mode, because 
[BEAM-7413](https://issues.apache.org/jira/browse/BEAM-7413) made me realize 
that approach would cause other issues.
 



Issue Time Tracking
---

Worklog Id: (was: 296477)
Time Spent: 5h 40m  (was: 5.5h)

> Spark Streaming Duplicated Encoding/Decoding Effort
> ---
>
> Key: BEAM-5519
> URL: https://issues.apache.org/jira/browse/BEAM-5519
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Winkelman
>Assignee: Kyle Winkelman
>Priority: Major
>  Labels: spark, spark-streaming
> Fix For: 2.16.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When using the SparkRunner in streaming mode, there is a call to groupByKey 
> followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this 
> used to cause 2 shuffles, but it still causes 2 encode/decode cycles.





[jira] [Commented] (BEAM-7164) Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-08-16 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909263#comment-16909263
 ] 

Udi Meiri commented on BEAM-7164:
-

Reproducible locally via:
{code}
for i in `seq 100`; do ../../gradlew clean --no-parallel && ../../gradlew 
setupVirtualenv || break; done
{code}

> Python precommit failing on  Java PRs. dataflow:setupVirtualenv
> ---
>
> Key: BEAM-7164
> URL: https://issues.apache.org/jira/browse/BEAM-7164
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Alex Amato
>Priority: Major
>
> https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull
> {noformat}
> 18:05:44 > Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv
> 18:05:44 New python executable in /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python2.7
> 18:05:44 Also creating executable in /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python
> 18:05:44 Installing setuptools, pkg_resources, pip, wheel...done.
> 18:05:44 Running virtualenv with interpreter /usr/bin/python2.7
> 18:05:44 DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
> 18:05:44 Collecting tox==3.0.0
> 18:05:44   Using cached https://files.pythonhosted.org/packages/e6/41/4dcfd713282bf3213b0384320fa8841e4db032ddcb80bc08a540159d42a8/tox-3.0.0-py2.py3-none-any.whl
> 18:05:44 Collecting grpcio-tools==1.3.5
> 18:05:44   Using cached https://files.pythonhosted.org/packages/05/f6/0296e29b1bac6f85d2a8556d48adf825307f73109a3c2c17fb734292db0a/grpcio_tools-1.3.5-cp27-cp27mu-manylinux1_x86_64.whl
> 18:05:44 Collecting pluggy<1.0,>=0.3.0 (from tox==3.0.0)
> 18:05:44   Using cached https://files.pythonhosted.org/packages/84/e8/4ddac125b5a0e84ea6ffc93cfccf1e7ee1924e88f53c64e98227f0af2a5f/pluggy-0.9.0-py2.py3-none-any.whl
> 18:05:44 Collecting six (from tox==3.0.0)
> 18:05:44   Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
> 18:05:44 Collecting virtualenv>=1.11.2 (from tox==3.0.0)
> 18:05:44   Using cached https://files.pythonhosted.org/packages/4f/ba/6f9315180501d5ac3e707f19fcb1764c26cc6a9a31af05778f7c2383eadb/virtualenv-16.5.0-py2.py3-none-any.whl
> 18:05:44 Collecting py>=1.4.17 (from tox==3.0.0)
> 18:05:44   Using cached https://files.pythonhosted.org/packages/76/bc/394ad449851729244a97857ee14d7cba61ddb268dce3db538ba2f2ba1f0f/py-1.8.0-py2.py3-none-any.whl
> 18:05:44 Collecting grpcio>=1.3.5 (from grpcio-tools==1.3.5)
> 18:05:44   Using cached https://files.pythonhosted.org/packages/7c/59/4da8df60a74f4af73ede9d92a75ca85c94bc2a109d5f67061496e8d496b2/grpcio-1.20.0-cp27-cp27mu-manylinux1_x86_64.whl
> 18:05:44 Collecting protobuf>=3.2.0 (from grpcio-tools==1.3.5)
> 18:05:44   Using cached https://files.pythonhosted.org/packages/ea/72/5eadea03b06ca1320be2433ef2236155da17806b700efc92677ee99ae119/protobuf-3.7.1-cp27-cp27mu-manylinux1_x86_64.whl
> 18:05:44 Collecting futures>=2.2.0; python_version < "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5)
> 18:05:44   ERROR: Could not find a version that satisfies the requirement futures>=2.2.0; python_version < "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: none)
> 18:05:44 ERROR: No matching distribution found for futures>=2.2.0; python_version < "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5)
> 18:05:46
> 18:05:46 > Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv FAILED
> 18:05:46
> {noformat}





[jira] [Closed] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem

2019-08-16 Thread Boyuan Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyuan Zhang closed BEAM-7987.
--
   Resolution: Fixed
Fix Version/s: 2.16.0

> WindowingWindmillReader should not start to read from an empty windmill 
> workitem
> 
>
> Key: BEAM-7987
> URL: https://issues.apache.org/jira/browse/BEAM-7987
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> WindowingWindmillReader should expect that a windmill workitem has either 
> timers or elements.
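A minimal sketch of the guard the fix describes, using a hypothetical `WorkItem` stand-in rather than the actual Windmill types: a reader should only start processing a work item that carries timers or elements.

```python
# Hedged sketch (not Dataflow's actual reader): guard against starting a
# read when a work item carries neither timers nor elements, as the fix
# describes. WorkItem is an illustrative stand-in type.
class WorkItem(object):
    def __init__(self, timers=None, elements=None):
        self.timers = timers or []
        self.elements = elements or []

def should_read(work_item):
    # Only start reading when there is something to process.
    return bool(work_item.timers) or bool(work_item.elements)

assert not should_read(WorkItem())             # empty work item: skip
assert should_read(WorkItem(timers=['t0']))    # has timers: read
assert should_read(WorkItem(elements=['e0']))  # has elements: read
```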





[jira] [Work logged] (BEAM-7986) Increase minimum grpcio required version

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7986?focusedWorklogId=296458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296458
 ]

ASF GitHub Bot logged work on BEAM-7986:


Author: ASF GitHub Bot
Created on: 16/Aug/19 17:43
Start Date: 16/Aug/19 17:43
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9356: [BEAM-7986] Upgrade 
grpcio
URL: https://github.com/apache/beam/pull/9356#issuecomment-522093105
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296458)
Time Spent: 0.5h  (was: 20m)

> Increase minimum grpcio required version
> 
>
> Key: BEAM-7986
> URL: https://issues.apache.org/jira/browse/BEAM-7986
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> According to this question, 1.11.0 is not new enough (1.22.0 reportedly 
> works), and we list the minimum as 1.8.
> https://stackoverflow.com/questions/57479498/beam-channel-object-has-no-attribute-close?noredirect=1#comment101446049_57479498
> Affects DirectRunner Pub/Sub client.
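A version floor like the one discussed above can be checked programmatically before relying on grpcio features. The sketch below is an illustrative assumption, not Beam's actual setup.py logic; it also assumes plain `X.Y.Z` version strings (pre-release suffixes such as `1.22.0rc1` would need a real parser like `packaging.version`).

```python
# Hedged sketch: compare an installed version string against a required
# minimum. The issue reports grpcio 1.11.0 failing and 1.22.0 working,
# while the declared floor was 1.8 -- hence the need for such a check.
def parse_version(v: str) -> tuple:
    """Turn '1.22.0' into a comparable tuple (1, 22, 0)."""
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at or above the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

In practice one would pass `grpc.__version__` as `installed`; the minimum itself is whatever the project's setup.py declares.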





[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=296457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296457
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 16/Aug/19 17:42
Start Date: 16/Aug/19 17:42
Worklog Time Spent: 10m 
  Work Description: zfraa commented on pull request #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#discussion_r314820458
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/HllCountTest.java
 ##
 @@ -0,0 +1,373 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.zetasketch;
+
+import com.google.zetasketch.HyperLogLogPlusPlus;
+import com.google.zetasketch.shaded.com.google.protobuf.ByteString;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import org.apache.beam.sdk.Pipeline.PipelineExecutionException;
+import org.apache.beam.sdk.testing.NeedsRunner;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TypeDescriptor;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.rules.ExpectedException;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link HllCount}. */
+@RunWith(JUnit4.class)
+public class HllCountTest {
+
+  @Rule public final transient TestPipeline p = TestPipeline.create();
+  @Rule public transient ExpectedException thrown = ExpectedException.none();
+
+  // Integer
+  private static final List<Integer> INTS1 = Arrays.asList(1, 2, 3, 3, 1, 4);
+  private static final byte[] INTS1_SKETCH;
+  private static final Long INTS1_ESTIMATE;
+
+  static {
+    HyperLogLogPlusPlus<Integer> hll = new HyperLogLogPlusPlus.Builder().buildForIntegers();
+INTS1.forEach(hll::add);
+INTS1_SKETCH = hll.serializeToByteArray();
+INTS1_ESTIMATE = hll.longResult();
+  }
+
+  private static final List<Integer> INTS2 = Arrays.asList(3, 3, 3, 3);
+  private static final byte[] INTS2_SKETCH;
+  private static final Long INTS2_ESTIMATE;
+
+  static {
+    HyperLogLogPlusPlus<Integer> hll = new HyperLogLogPlusPlus.Builder().buildForIntegers();
+INTS2.forEach(hll::add);
+INTS2_SKETCH = hll.serializeToByteArray();
+INTS2_ESTIMATE = hll.longResult();
+  }
+
+  private static final byte[] INTS1_INTS2_SKETCH;
+
+  static {
+    HyperLogLogPlusPlus<?> hll = HyperLogLogPlusPlus.forProto(INTS1_SKETCH);
+hll.merge(INTS2_SKETCH);
+INTS1_INTS2_SKETCH = hll.serializeToByteArray();
+  }
+
+  // Long
+  private static final List<Long> LONGS = Collections.singletonList(1L);
+  private static final byte[] LONGS_SKETCH;
+
+  static {
+    HyperLogLogPlusPlus<Long> hll = new HyperLogLogPlusPlus.Builder().buildForLongs();
+LONGS.forEach(hll::add);
+LONGS_SKETCH = hll.serializeToByteArray();
+  }
+
+  private static final byte[] LONGS_EMPTY_SKETCH;
+
+  static {
+    HyperLogLogPlusPlus<Long> hll = new HyperLogLogPlusPlus.Builder().buildForLongs();
+LONGS_EMPTY_SKETCH = hll.serializeToByteArray();
+  }
+
+  // String
+  private static final List STRINGS = Arrays.asList("s1", "s2", "s1", 
"s2");
+  private static final byte[] STRINGS_SKETCH;
+
+  static {
+    HyperLogLogPlusPlus<String> hll = new HyperLogLogPlusPlus.Builder().buildForStrings();
+STRINGS.forEach(hll::add);
+STRINGS_SKETCH = hll.serializeToByteArray();
+  }
+
+  private static final int TEST_PRECISION = 20;
+  private static final byte[] STRINGS_SKETCH_TEST_PRECISION;
+
+  static {
+    HyperLogLogPlusPlus<String> hll =
+        new HyperLogLogPlusPlus.Builder().normalPrecision(TEST_PRECISION).buildForStrings();
+STRINGS.forEach(hll::add);
+STRINGS_SKETCH_TEST_PRECISION = hll.serializeToByteArray();
+  }
+
+  // Bytes
+  private static final byte[] BYTES0 = {(byte) 0x1, (byte)

[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=296455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296455
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 16/Aug/19 17:42
Start Date: 16/Aug/19 17:42
Worklog Time Spent: 10m 
  Work Description: zfraa commented on pull request #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#discussion_r314795281
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/main/java/org/apache/beam/sdk/extensions/zetasketch/HllCountInitFn.java
 ##
 @@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.zetasketch;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import com.google.zetasketch.HyperLogLogPlusPlus;
+import com.google.zetasketch.shaded.com.google.protobuf.ByteString;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.CoderRegistry;
+import org.apache.beam.sdk.transforms.Combine;
+
+/**
+ * {@link Combine.CombineFn} for the {@link HllCount.Init} combiner.
+ *
+ * @param <InputT> type of input values to the function (Integer, Long, String, or byte[])
+ * @param <HllT> type of the HLL++ sketch to compute (Integer, Long, String, or ByteString)
+ */
+abstract class HllCountInitFn<InputT, HllT>
+    extends Combine.CombineFn<InputT, HyperLogLogPlusPlus<HllT>, byte[]> {
+
+  // Would make this a final field. However, that not only requires adding an 
extra type enum to
 
 Review comment:
   Nit: I would change the first sentence to "Ideally, this would be a final 
field set at construction time via the builder." just to be super-clear what is 
meant. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 296455)
Time Spent: 20h 20m  (was: 20h 10m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>





