[jira] [Created] (BEAM-7998) MatchFiles or MatchAll seems to return the same element several times
Jerome MASSOT created BEAM-7998:
---
Summary: MatchFiles or MatchAll seems to return the same element several times
Key: BEAM-7998
URL: https://issues.apache.org/jira/browse/BEAM-7998
Project: Beam
Issue Type: Bug
Components: beam-model
Affects Versions: 2.14.0
Environment: GCP for storage; DirectRunner and DataflowRunner both have the problem. PyCharm on Win10 for IDE and dev environment.
Reporter: Jerome MASSOT

Hi team,

when I use MatchFiles with a wildcard on files located in a GCP bucket, the MatchFiles transform returns the same file several times (at least twice). I have tried to follow the stack, and I can see that MatchAll is called twice when I run the pipeline on a debug project where a single element is present in the bucket. But I am not good enough to say more than that. Sorry.

Best regards,
Jerome

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python
[ https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296685 ]

ASF GitHub Bot logged work on BEAM-7995:
Author: ASF GitHub Bot
Created on: 17/Aug/19 01:00
Start Date: 17/Aug/19 01:00
Worklog Time Spent: 10m

Work Description: aaltay commented on pull request #9364: BEAM-7995: python PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364#discussion_r314927714

## File path: sdks/python/apache_beam/runners/worker/operations.py

@@ -886,7 +886,7 @@ def output_key(self, wkey, accumulator):
     if windows is 0:
       self.output(_globally_windowed_value.with_value((key, value)))
     else:
-      self.output(WindowedValue((key, value), windows[0].end, windows))
+      self.output(WindowedValue((key, value), windows[0].max_timestamp(), windows))

Review comment: You are right. I read this as just 'max_timestamp()' and did not realize it is the maximum timestamp for the window. Your change looks correct. Referencing your email (https://lists.apache.org/thread.html/ad42c55ed0212cb18b2b29bfc3dddfb47b8cb9f3358583775d0da37b@%3Cdev.beam.apache.org%3E), this needs to be a shared definition between Python and Java. @lukecwik could you confirm whether an element in a window should have a timestamp < window.end, or whether it could be <= window.end?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 296685)
Time Spent: 50m (was: 40m)

> IllegalStateException: TimestampCombiner moved element from to earlier time in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Hai Lu
> Assignee: Hai Lu
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> I'm looking into a bug I found internally when using the Beam portable API
> (Python) on our own Samza runner.
>
> The pipeline looks something like this:
>
> (p
>  | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>  | 'transform' >> beam.Map(lambda event: process_event(event))
>  | 'window' >> beam.WindowInto(FixedWindows(15))
>  | 'group' >> beam.CombinePerKey(beam.combiners.CountCombineFn())
>  ...
>
> The problem comes from the combiners, which cause the following exception on
> the Java side:
>
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element
> from 2019-08-15T03:34:45.000Z to earlier time 2019-08-15T03:34:44.999Z
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:45.000Z)
>     at org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
>     at org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
>     at org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
>     at org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
>     at org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
>     at org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
>
> The exception happens here
> https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116
> where we check that the shifted timestamp is not before the element timestamp:
>
> if (shifted.isBefore(timestamp)) {
>   throw new IllegalStateException(
>       String.format(
>           "TimestampCombiner moved element from %s to earlier time %s for window %s",
>           BoundedWindow.formatTimestamp(timestamp),
>           BoundedWindow.formatTimestamp(shifted),
>           window));
> }
>
> As you can see from the exception, the "shifted" timestamp is "XXX 44.999" while the
> "timestamp" is "XXX 45.000". The "44.999" comes from TimestampCombiner.END_OF_WINDOW
> (https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116):
>
> @Override
> public Instant merge(BoundedWindow intoWindow, Iterable<? extends Instant> mergingTimestamps) {
>   return intoWindow.maxTimestamp();
> }
>
> where intoWindow.maxTimestamp() is:
>
> /** Returns the largest times
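The 45.000-vs-44.999 discrepancy in the report follows from how Beam defines a window's maximum timestamp: one millisecond before the (exclusive) window end. A minimal plain-Python model of that relationship (this is an illustrative sketch, not the actual Beam `IntervalWindow` class; timestamps are modeled as integer milliseconds):

```python
# Models why TimestampCombiner.END_OF_WINDOW yields ..44.999 for a window
# ending at ..45.000: max_timestamp() is defined as end - 1 ms, the largest
# timestamp that still falls *inside* the half-open window [start, end).
class IntervalWindow:
    """Half-open window [start, end), timestamps in milliseconds."""

    def __init__(self, start_ms, end_ms):
        self.start = start_ms
        self.end = end_ms

    def max_timestamp(self):
        # Largest timestamp contained in the window: one unit before end.
        return self.end - 1


# The window from the exception: [03:34:30.000, 03:34:45.000)
window = IntervalWindow(30_000, 45_000)
assert window.max_timestamp() == 44_999   # inside the window
assert window.max_timestamp() < window.end  # end itself is NOT inside
```

An element stamped at the window end (45.000) therefore sits *outside* the window under this definition, which is exactly the mismatch the Python worker fix (`windows[0].end` → `windows[0].max_timestamp()`) addresses.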
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296683 ]

ASF GitHub Bot logged work on BEAM-7060:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:33
Start Date: 17/Aug/19 00:33
Worklog Time Spent: 10m

Work Description: udim commented on issue #9283: [BEAM-7060] Type hints from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-522188388

run python 3.6 postcommit

Issue Time Tracking
---
Worklog Id: (was: 296683)
Time Spent: 11h 50m (was: 11h 40m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Udi Meiri
> Priority: Major
> Time Spent: 11h 50m
> Remaining Estimate: 0h
>
> The existing typehints implementation in Beam
> (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/)
> relies heavily on internal details of the CPython implementation, and some of
> the assumptions of that implementation broke as of Python 3.6; see for
> example https://issues.apache.org/jira/browse/BEAM-6877, which makes
> typehints support unusable on Python 3.6 as of now. The Python 3 Kanban Board
> (https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail)
> lists several specific typehints-related breakages, prefixed with "TypeHints
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is
>   stale code and has other known issues.
> - Attempt to use off-the-shelf libraries for supporting type annotations,
>   such as Pytype, Mypy, or PyAnnotate.
> With respect to this decision, we also need to plan immediate next steps to
> unblock adoption of Beam for Python 3.6+ users. One potential option may be
> to have the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].
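For readers unfamiliar with the annotation style this issue is about: native Python 3 function annotations carry type information on the function object itself, retrievable via the standard `typing` machinery. The sketch below uses only stdlib `typing` constructs (it deliberately shows no Beam API; the function is made up for illustration):

```python
# Native Python 3 annotations express the same input/output typing that
# Beam's decorator-based typehints convey out-of-band; the standard library
# can read them back with typing.get_type_hints().
from typing import Iterable, Tuple, get_type_hints


def split_words(line: str) -> Iterable[Tuple[str, int]]:
    # Annotated the way a Beam DoFn/Map callable for word count might be.
    return [(word, 1) for word in line.split()]


hints = get_type_hints(split_words)
assert hints["line"] is str              # input type recovered at runtime
assert list(split_words("a b a"))[0] == ("a", 1)
```

Reading hints this way works uniformly on Python 3.6+, which is what makes annotations attractive compared to introspection tricks that depend on CPython internals.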
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=296679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296679 ]

ASF GitHub Bot logged work on BEAM-3713:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:22
Start Date: 17/Aug/19 00:22
Worklog Time Spent: 10m

Work Description: udim commented on issue #7949: [BEAM-3713] Add pytest testing infrastructure
URL: https://github.com/apache/beam/pull/7949#issuecomment-522187314

Still working

Issue Time Tracking
---
Worklog Id: (was: 296679)
Time Spent: 3h 50m (was: 3h 40m)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
> Issue Type: Test
> Components: sdk-py-core, testing
> Reporter: Robert Bradshaw
> Assignee: Udi Meiri
> Priority: Minor
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Per https://nose.readthedocs.io/en/latest/, nose is in maintenance mode.
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=296680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296680 ]

ASF GitHub Bot logged work on BEAM-3713:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:22
Start Date: 17/Aug/19 00:22
Worklog Time Spent: 10m

Work Description: udim commented on issue #7949: [BEAM-3713] Add pytest testing infrastructure
URL: https://github.com/apache/beam/pull/7949#issuecomment-522187314

no disassemble

Issue Time Tracking
---
Worklog Id: (was: 296680)
Time Spent: 4h (was: 3h 50m)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
> Issue Type: Test
> Components: sdk-py-core, testing
> Reporter: Robert Bradshaw
> Assignee: Udi Meiri
> Priority: Minor
> Time Spent: 4h
> Remaining Estimate: 0h
>
> Per https://nose.readthedocs.io/en/latest/, nose is in maintenance mode.
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296678 ]

ASF GitHub Bot logged work on BEAM-7886:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:15
Start Date: 17/Aug/19 00:15
Worklog Time Spent: 10m

Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r314924560

## File path: model/pipeline/src/main/proto/beam_runner_api.proto

@@ -642,6 +642,45 @@ message StandardCoders {
     // Components: Coder for a single element.
     // Experimental.
     STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"];
+
+    // Additional Standard Coders
+    // --
+    // The following coders are not required to be implemented for an SDK or
+    // runner to support the Beam model, but can enable additional
+    // functionality.
+
+    // Encodes a "row", an element with a known schema, defined by an
+    // instance of Schema from schema.proto.
+    //
+    // A row is encoded as the concatenation of:
+    //   - The number of attributes in the schema, encoded with
+    //     beam:coder:varint:v1. This is useful for detecting supported schema
+    //     changes (column additions/deletions).

Review comment: Is the new explanation in the latest commit (https://github.com/apache/beam/pull/9188/commits/25dcc50a8de9c607069a8efc80a6da67a6e8b0ca) better? My intention was to indicate that "column additions/deletions" are the supported schema changes.

Issue Time Tracking
---
Worklog Id: (was: 296678)
Time Spent: 3h 40m (was: 3.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
> Issue Type: Improvement
> Components: beam-model, sdk-java-core, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h
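The proto comment above says the row encoding begins with the attribute count encoded via `beam:coder:varint:v1`. Beam's varint coder is a base-128 variable-length integer (low 7 bits per byte, high bit set on all but the last byte). A small stand-alone sketch of that prefix encoding (this models only the varint step, not the full Beam row coder):

```python
# Base-128 varint encoding as used for the row coder's leading attribute
# count: 7 payload bits per byte, continuation bit (0x80) on every byte
# except the last. Non-negative integers only in this sketch.
def encode_varint(n: int) -> bytes:
    assert n >= 0, "sketch handles non-negative counts only"
    out = bytearray()
    while True:
        byte = n & 0x7F     # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte: high bit clear
            return bytes(out)


assert encode_varint(0) == b"\x00"
assert encode_varint(3) == b"\x03"        # small schemas fit in one byte
assert encode_varint(300) == b"\xac\x02"  # counts >= 128 take two bytes
```

Prefixing the count this way is what lets a decoder detect that a row was written with a different (column-added or column-removed) version of the schema before decoding any field values.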
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296677 ]

ASF GitHub Bot logged work on BEAM-7994:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:12
Start Date: 17/Aug/19 00:12
Worklog Time Spent: 10m

Work Description: youngoli commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314924082

## File path: sdks/go/pkg/beam/core/util/reflectx/functions.go

@@ -40,6 +40,8 @@ func FunctionName(fn interface{}) string {
 // points to a valid function implementation.
 func LoadFunction(ptr uintptr, t reflect.Type) interface{} {
 	v := reflect.New(t).Elem()
-	*(*uintptr)(unsafe.Pointer(v.Addr().Pointer())) = (uintptr)(unsafe.Pointer(&ptr))
+	p := new(uintptr)
+	*p = ptr
+	*(*unsafe.Pointer)(unsafe.Pointer(v.Addr().Pointer())) = unsafe.Pointer(p)

Review comment: Thanks for the nice summary!

Issue Time Tracking
---
Worklog Id: (was: 296677)
Time Spent: 2h (was: 1h 50m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Bill Neubauer
> Assignee: Bill Neubauer
> Priority: Minor
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime changes in Go 1.13, which is upcoming. There is a backwards-compatible fix the team recommended.
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296674 ]

ASF GitHub Bot logged work on BEAM-7994:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:04
Start Date: 17/Aug/19 00:04
Worklog Time Spent: 10m

Work Description: wcn3 commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314923417

## File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go

@@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package reflectx
+
+import (
+	"reflect"
+	"testing"
+)
+
+func testFunction() int {
+	return 42
+}
+
+func TestXxx(t *testing.T) {
+	val := reflect.ValueOf(testFunction)
+	fi := uintptr(val.Pointer())
+	typ := val.Type()
+
+	callable := LoadFunction(fi, typ)
+
+	cv := reflect.ValueOf(callable)
+	out := cv.Call(nil)
+	if len(out) != 1 {
+		t.Errorf("got %d return values, wanted 1.", len(out))
+	}
+	// TODO: check type?
+	if out[0].Int() != 42 {
+		t.Errorf("got %d, wanted 42", out[0].Int())
+	}

Review comment: Good point, will change.

Issue Time Tracking
---
Worklog Id: (was: 296674)
Time Spent: 1h 50m (was: 1h 40m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Bill Neubauer
> Assignee: Bill Neubauer
> Priority: Minor
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime changes in Go 1.13, which is upcoming. There is a backwards-compatible fix the team recommended.
[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
[ https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=296675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296675 ]

ASF GitHub Bot logged work on BEAM-7924:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:04
Start Date: 17/Aug/19 00:04
Worklog Time Spent: 10m

Work Description: ibzib commented on issue #9365: [BEAM-7924] fix pipeline options error with main session
URL: https://github.com/apache/beam/pull/9365#issuecomment-522185355

Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 296675)
Time Spent: 4h 10m (was: 4h)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
> Issue Type: Bug
> Components: test-failures
> Reporter: Udi Meiri
> Assignee: Heejong Lee
> Priority: Major
> Fix For: 2.15.0
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - Discarding unparseable args: [u'--app_name=None', u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', u'--direct_runner_use_stacked_bundle', u'--options_id=1', u'--fail_on_checkpointing_errors', u'--enable_metrics', u'--pipeline_type_check', u'--parallelism=2']
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk harness started with pipeline_options: {'runner': u'None', 'experiments': [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': u'1', 'sdk_location': u'container', 'job_name': u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk harness failed:
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 153, in main
> 11:32:59     sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py", line 334, in __getattr__
> 11:32:59     (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 'ProfilingOptions'
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console
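One reading of the traceback above: the error is raised inside `PipelineOptions.__getattr__`, which means the name `pipeline_options` at the failure site was bound to an options *instance* rather than the `pipeline_options` module (consistent with the PR title mentioning `save_main_session`, which restores pickled main-session names on the worker). A plain-Python model of that shadowing failure mode (the class below is an illustrative stand-in, not Beam's `PipelineOptions`):

```python
# Models the failure: a class whose __getattr__ raises for unknown names,
# with an instance shadowing the name that the caller expects to be a module.
class PipelineOptions:
    _known = frozenset({"runner", "job_name"})

    def __getattr__(self, name):
        # Mirrors the error message shape from the traceback.
        if name not in self._known:
            raise AttributeError(
                "'%s' object has no attribute '%s'" % (type(self).__name__, name))
        return None


# In a healthy worker, `pipeline_options` is a module exposing option
# classes; here an instance shadows it, as a restored main session could.
pipeline_options = PipelineOptions()

try:
    pipeline_options.ProfilingOptions  # attribute lookup hits __getattr__
    shadow_failed = False
except AttributeError:
    shadow_failed = True

assert shadow_failed                       # reproduces the logged error
assert pipeline_options.runner is None     # known names still resolve
```

This is a hypothesis-level sketch of the mechanism, not a confirmed account of the fix in PR #9365.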
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296673 ]

ASF GitHub Bot logged work on BEAM-7994:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:04
Start Date: 17/Aug/19 00:04
Worklog Time Spent: 10m

Work Description: wcn3 commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314923396

## File path: sdks/go/pkg/beam/core/util/reflectx/functions.go

@@ -40,6 +40,8 @@ func FunctionName(fn interface{}) string {
 // points to a valid function implementation.
 func LoadFunction(ptr uintptr, t reflect.Type) interface{} {
 	v := reflect.New(t).Elem()
-	*(*uintptr)(unsafe.Pointer(v.Addr().Pointer())) = (uintptr)(unsafe.Pointer(&ptr))
+	p := new(uintptr)
+	*p = ptr
+	*(*unsafe.Pointer)(unsafe.Pointer(v.Addr().Pointer())) = unsafe.Pointer(p)

Review comment: The problem was that the original code relied on the behavior that when &ptr is evaluated, the resulting anonymous value lands on the heap. The new compiler is able to keep it off the heap, and does so, which causes this code to crash in Go 1.13. This change makes the expected behavior explicit by allocating p on the heap and then assigning ptr to its pointed-to value, so *p == ptr and the remaining code works as before. The unit test exercises this function. I verified that the old code crashed under Go 1.13 (it worked in Go 1.12) and the new code works in both versions.

Issue Time Tracking
---
Worklog Id: (was: 296673)
Time Spent: 1h 40m (was: 1.5h)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Bill Neubauer
> Assignee: Bill Neubauer
> Priority: Minor
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime changes in Go 1.13, which is upcoming. There is a backwards-compatible fix the team recommended.
[jira] [Work logged] (BEAM-7970) Regenerate Go SDK proto files in correct version
[ https://issues.apache.org/jira/browse/BEAM-7970?focusedWorklogId=296672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296672 ]

ASF GitHub Bot logged work on BEAM-7970:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:01
Start Date: 17/Aug/19 00:01
Worklog Time Spent: 10m

Work Description: youngoli commented on pull request #9367: [BEAM-7970] Regenerate Go SDK protos in correct version.
URL: https://github.com/apache/beam/pull/9367

Regenerating protos such that they actually get generated as proto3, which is how they should have been for a while.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] Choose reviewer(s) (https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement (https://www.apache.org/licenses/icla.pdf).
Post-Commit Tests Status (on master branch): [PR-template table of per-language, per-runner build status badges omitted]
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296671 ]

ASF GitHub Bot logged work on BEAM-7994:
Author: ASF GitHub Bot
Created on: 17/Aug/19 00:00
Start Date: 17/Aug/19 00:00
Worklog Time Spent: 10m

Work Description: wcn3 commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13
URL: https://github.com/apache/beam/pull/9362#discussion_r314922876

## File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go

@@ -0,0 +1,43 @@ (new test file; see the full diff quoted in worklog 296674 above)
+func TestXxx(t *testing.T) {

Review comment: I thought the last commit fixed this, but clearly not. Will fix.

Issue Time Tracking
---
Worklog Id: (was: 296671)
Time Spent: 1.5h (was: 1h 20m)

> BEAM SDK has compatibility problems with go1.13
> ---
>
> Key: BEAM-7994
> URL: https://issues.apache.org/jira/browse/BEAM-7994
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Bill Neubauer
> Assignee: Bill Neubauer
> Priority: Minor
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The Go team identified a problem in the Beam SDK that appears due to runtime changes in Go 1.13, which is upcoming. There is a backwards-compatible fix the team recommended.
[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers
[ https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=296668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296668 ]

ASF GitHub Bot logged work on BEAM-7909:
Author: ASF GitHub Bot
Created on: 16/Aug/19 23:45
Start Date: 16/Aug/19 23:45
Worklog Time Spent: 10m

Work Description: aaltay commented on pull request #9351: [BEAM-7909] support customized container for Python
URL: https://github.com/apache/beam/pull/9351#discussion_r314920513

## File path: sdks/python/apache_beam/runners/portability/portable_runner.py

@@ -84,16 +84,10 @@ def __init__(self):
   @staticmethod
   def default_docker_image():
     if 'USER' in os.environ:
-      if sys.version_info[0] == 2:
-        version_suffix = ''
-      elif sys.version_info[0:2] == (3, 5):
-        version_suffix = '3'
-      else:
-        version_suffix = '3'
-      # TODO(BEAM-7474): Use an image which has correct Python minor version.
-      logging.warning('Make sure that locally built Python SDK docker image '
-                      'has Python %d.%d interpreter. See also: BEAM-7474.' % (
-                          sys.version_info[0], sys.version_info[1]))
+      version_suffix = ''.join([str(i) for i in sys.version_info[0:2]])

Review comment: This changes the behavior for Python 2. I agree with the change; let's make sure that we test it.

Issue Time Tracking
---
Worklog Id: (was: 296668)
Time Spent: 2h 10m (was: 2h)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
> Issue Type: Sub-task
> Components: build-system
> Reporter: Hannah Jiang
> Assignee: Hannah Jiang
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h
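The new one-liner in the diff above computes a major+minor version suffix (e.g. "27" or "36") from `sys.version_info`, replacing the hard-coded branches. A quick stand-alone check of what it produces (illustrative only; the surrounding image-name logic is not reproduced here):

```python
# The replacement expression from the diff: joins the major and minor
# version numbers into a suffix such as "27", "35", or "37", which the
# runner uses to select a per-version SDK docker image tag.
import sys

version_suffix = ''.join([str(i) for i in sys.version_info[0:2]])

# Equivalent to formatting major/minor directly:
assert version_suffix == '%d%d' % (sys.version_info[0], sys.version_info[1])
assert version_suffix.isdigit()
```

The reviewer's point about Python 2 is visible here: under Python 2.7 this now yields "27" where the old code produced an empty suffix.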
[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers
[ https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=296667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296667 ] ASF GitHub Bot logged work on BEAM-7909: Author: ASF GitHub Bot Created on: 16/Aug/19 23:45 Start Date: 16/Aug/19 23:45 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9351: [BEAM-7909] support customized container for Python URL: https://github.com/apache/beam/pull/9351#discussion_r314921003
## File path: sdks/python/container/base_image_requirements.txt
## @@ -67,7 +67,7 @@
 pandas==0.23.4
 protorpc==0.11.1
 python-gflags==3.0.6
 setuptools<=39.1.0 # requirement for Tensorflow.
-tensorflow==1.11.0
+tensorflow==1.13.1
Review comment: Could you upgrade these dependencies to match (a) setup.py (latest in the range if possible) and (b) https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies We should probably try to figure out a way to keep versions in sync. This was recently discussed on the mailing list. For ideas you can look at that thread (https://lists.apache.org/thread.html/e174967bab6c224db40f66af74e7d07466812cc6b78189e55da88cb0@%3Cdev.beam.apache.org%3E) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296667) > Write integration tests to test customized containers > - > > Key: BEAM-7909 > URL: https://issues.apache.org/jira/browse/BEAM-7909 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers
[ https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=29&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-29 ] ASF GitHub Bot logged work on BEAM-7909: Author: ASF GitHub Bot Created on: 16/Aug/19 23:45 Start Date: 16/Aug/19 23:45 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9351: [BEAM-7909] support customized container for Python URL: https://github.com/apache/beam/pull/9351#discussion_r314920538
## File path: sdks/python/apache_beam/runners/portability/portable_runner.py
## @@ -84,16 +84,10 @@ def __init__(self):
   @staticmethod
   def default_docker_image():
     if 'USER' in os.environ:
-      if sys.version_info[0] == 2:
-        version_suffix = ''
-      elif sys.version_info[0:2] == (3, 5):
-        version_suffix = '3'
-      else:
-        version_suffix = '3'
-      # TODO(BEAM-7474): Use an image which has correct Python minor version.
Review comment: You can probably close this issue as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 29) Time Spent: 2h (was: 1h 50m) > Write integration tests to test customized containers > - > > Key: BEAM-7909 > URL: https://issues.apache.org/jira/browse/BEAM-7909 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work started] (BEAM-7970) Regenerate Go SDK proto files in correct version
[ https://issues.apache.org/jira/browse/BEAM-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-7970 started by Daniel Oliveira. - > Regenerate Go SDK proto files in correct version > > > Key: BEAM-7970 > URL: https://issues.apache.org/jira/browse/BEAM-7970 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Daniel Oliveira >Assignee: Daniel Oliveira >Priority: Major > > Generated proto files in the Go SDK currently include this bit: > {{// This is a compile-time assertion to ensure that this generated file}} > {{// is compatible with the proto package it is being compiled against.}} > {{// A compilation error at this line likely means your copy of the}} > {{// proto package needs to be updated.}} > {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}} > > This indicates that the protos are being generated as proto v2 for whatever > reason. Most likely, as mentioned in this post by someone with a similar > issue, it is because the proto generation binary needs to be rebuilt before > generating the files again: > [https://github.com/golang/protobuf/issues/449#issuecomment-340884839] > This hasn't caused any errors so far, but might eventually cause errors if we > hit version differences between the v2 and v3 protos. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=296664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296664 ] ASF GitHub Bot logged work on BEAM-7738: Author: ASF GitHub Bot Created on: 16/Aug/19 23:43 Start Date: 16/Aug/19 23:43 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9268: [BEAM-7738] Add external transform support to PubsubIO URL: https://github.com/apache/beam/pull/9268#issuecomment-522181883 Here's some info on where I am with this. I could really use some help to push this over the finish line. The expansion service runs correctly, sends the expanded transforms back to python, but the job fails inside Java on Flink because it's trying to use the incorrect serializer. There's a good chance that I'm overlooking something very obvious. Here's the stack trace: ``` 2019-08-16 16:04:58,297 INFO org.apache.flink.runtime.taskmanager.Task - Source: PubSubInflow/ExternalTransform(beam:external:java:pubsub:read:v1)/PubsubUnboundedSource/Read(PubsubSource) (1/1) (93e578d8ae737c7502685cd978413a98) switched from RUNNING to FAILED. 
java.lang.RuntimeException: org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage cannot be cast to [B
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:305)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:394)
    at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.emitElement(UnboundedSourceWrapper.java:342)
    at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:268)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
    at org.apache.flink.streaming.runtime.tasks.StoppableSourceStreamTask.run(StoppableSourceStreamTask.java:45)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage cannot be cast to [B
    at org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
    at org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
    at org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:105)
    at org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:81)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:578)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:569)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:529)
    at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.serialize(CoderTypeSerializer.java:87)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:175)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:46)
    at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:54)
    at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.serializeRecord(SpanningRecordSerializer.java:78)
    at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:160)
    at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
    ... 15 more
```
Here's the serializer that `CoderTypeSerialize
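In the trace above, `[B` is the JVM's shorthand for `byte[]`, so the `ClassCastException` means a `ByteArrayCoder` was wired into the pipeline where a coder that understands `PubsubMessage` was needed. A toy Python sketch of the same failure mode (the classes here are illustrative stand-ins, not Beam's actual coders):

```python
class BytesCoder:
    """Stand-in for a coder that can only encode raw bytes."""
    def encode(self, value):
        if not isinstance(value, bytes):
            # Analogue of: PubsubMessage cannot be cast to [B
            raise TypeError(
                "%s cannot be encoded as bytes" % type(value).__name__)
        return value

class PubsubMessage:
    """Stand-in for the element type the source actually emits."""
    def __init__(self, data):
        self.data = data

coder = BytesCoder()  # the wrong coder, resolved during cross-language expansion
try:
    coder.encode(PubsubMessage(b"payload"))
except TypeError as e:
    print(e)  # PubsubMessage cannot be encoded as bytes
```

The point of the sketch: the element itself is fine; the failure happens only at encode time, once the mismatched coder touches it, which matches the trace bottoming out in `ByteArrayCoder.encode`.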
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=296657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296657 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 16/Aug/19 23:37 Start Date: 16/Aug/19 23:37 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9261: [BEAM-7389] Add code examples for Partition page URL: https://github.com/apache/beam/pull/9261 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296657) Time Spent: 48h 50m (was: 48h 40m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 48h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296654 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 16/Aug/19 23:34 Start Date: 16/Aug/19 23:34 Worklog Time Spent: 10m Work Description: udim commented on issue #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#issuecomment-522181291 run python 3.5 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296654) Time Spent: 11h 40m (was: 11.5h) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 11h 40m > Remaining Estimate: 0h > > The existing [Typehints implementation in > Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] heavily relies on internal details of the CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6; see for > example https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. The [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate the in-house typehints implementation. > - Continue to support the in-house implementation, which at this point is stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > With respect to this decision we also need to plan immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have the > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
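The work tracked above is about reading standard PEP 484 annotations instead of Beam's decorator-based hints. A minimal illustration of what Python 3 annotation introspection provides to such a design (plain stdlib usage, not Beam code):

```python
import typing

def parse_line(line: str) -> typing.List[str]:
    # An ordinary annotated function; a framework can recover the hints
    # without any custom decorators.
    return line.split(',')

hints = typing.get_type_hints(parse_line)
print(hints['line'])    # <class 'str'>
print(hints['return'])  # typing.List[str]
```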
[jira] [Work logged] (BEAM-5820) Vendor Calcite
[ https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=296650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296650 ] ASF GitHub Bot logged work on BEAM-5820: Author: ASF GitHub Bot Created on: 16/Aug/19 23:25 Start Date: 16/Aug/19 23:25 Worklog Time Spent: 10m Work Description: vectorijk commented on issue #9333: [BEAM-5820] release vendor calcite URL: https://github.com/apache/beam/pull/9333#issuecomment-522180056 Thanks Luke! Could you help with releasing the 0.1 version? Are there some instructions I can help with? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296650) Time Spent: 4h 50m (was: 4h 40m) > Vendor Calcite > -- > > Key: BEAM-5820 > URL: https://issues.apache.org/jira/browse/BEAM-5820 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296649 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 16/Aug/19 23:24 Start Date: 16/Aug/19 23:24 Worklog Time Spent: 10m Work Description: udim commented on issue #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#issuecomment-522179920 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296649) Time Spent: 11.5h (was: 11h 20m) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 11.5h > Remaining Estimate: 0h > > The existing [Typehints implementation in > Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] heavily relies on internal details of the CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6; see for > example https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. The [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate the in-house typehints implementation. > - Continue to support the in-house implementation, which at this point is stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > With respect to this decision we also need to plan immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have the > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296648 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 16/Aug/19 23:18 Start Date: 16/Aug/19 23:18 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#discussion_r314917601
## File path: model/pipeline/src/main/proto/beam_runner_api.proto
## @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296648) Time Spent: 3.5h (was: 3h 20m) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296645 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 16/Aug/19 23:17 Start Date: 16/Aug/19 23:17 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#discussion_r314917511 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -642,6 +642,45 @@ message StandardCoders { // Components: Coder for a single element. // Experimental. STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"]; + +// Additional Standard Coders +// -- +// The following coders are not required to be implemented for an SDK or +// runner to support the Beam model, but can enable additional +// functionality. + +// Encodes a "row", an element with a known schema, defined by an +// instance of Schema from schema.proto. +// +// A row is encoded as the concatenation of: +// - The number of attributes in the schema, encoded with +// beam:coder:varint:v1. This is useful for detecting supported schema +// changes (column additions/deletions). +// - A packed bitset indicating null fields (a 1 indicating a null) Review comment: Good suggestion, done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296645) Time Spent: 3h 10m (was: 3h) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296646 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 16/Aug/19 23:17 Start Date: 16/Aug/19 23:17 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#discussion_r314917564 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -642,6 +642,45 @@ message StandardCoders { // Components: Coder for a single element. // Experimental. STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"]; + +// Additional Standard Coders +// -- +// The following coders are not required to be implemented for an SDK or +// runner to support the Beam model, but can enable additional +// functionality. + +// Encodes a "row", an element with a known schema, defined by an +// instance of Schema from schema.proto. +// +// A row is encoded as the concatenation of: +// - The number of attributes in the schema, encoded with +// beam:coder:varint:v1. This is useful for detecting supported schema +// changes (column additions/deletions). +// - A packed bitset indicating null fields (a 1 indicating a null) +// encoded with beam:coder:bytes:v1. If there are no nulls an empty byte +// array is encoded. +// - An encoding for each non-null field, concatenated together. +// +// Schema types are mapped to coders as follows: +// AtomicType: +// BYTE: not yet a standard coder Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296646) Time Spent: 3h 20m (was: 3h 10m) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
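The row layout quoted in the proto comment above (attribute count, packed null bitset, then the non-null field payloads concatenated) can be sketched in a few lines. This is an illustrative Python sketch of that layout, not the SDK's actual RowCoder; the bit ordering within the bitset and the length-prefixed framing of the bytes coder are assumptions made for the example.

```python
def encode_varint(n):
    # Unsigned LEB128-style variable-length encoding, in the spirit of
    # beam:coder:varint:v1: low 7 bits per byte, high bit set on all but
    # the final byte.
    out = b""
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out += bytes([b | 0x80])
        else:
            return out + bytes([b])

def encode_row(values, field_encoders):
    # Concatenation of: attribute count, null bitset (empty bytes when no
    # field is null), then each non-null field's encoding in schema order.
    assert len(values) == len(field_encoders)
    body = encode_varint(len(values))
    # Bit i set means field i is null (bit order is an assumption here).
    bits = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is None:
            bits[i // 8] |= 1 << (i % 8)
    nulls = bytes(bits) if any(bits) else b""
    body += encode_varint(len(nulls)) + nulls  # length-prefixed, as in a nested bytes coder
    for v, enc in zip(values, field_encoders):
        if v is not None:
            body += enc(v)
    return body
```

For a two-field row `[3, None]` with varint field coders this yields the count byte, a one-byte bitset marking field 1 null, and the single encoded field — making the "empty byte array when there are no nulls" rule easy to see by comparison with `[1, 2]`.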
[jira] [Updated] (BEAM-7888) Test Multi Process Direct Runner With Largish Data
[ https://issues.apache.org/jira/browse/BEAM-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-7888: --- Issue Type: Task (was: Test) > Test Multi Process Direct Runner With Largish Data > -- > > Key: BEAM-7888 > URL: https://issues.apache.org/jira/browse/BEAM-7888 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Major > > Filing this as a tracker. > We can test the multiprocess runner with a largish amount of data to the extent > that we can do this on Jenkins. This will serve 2 purposes: > - Find out issues related to multi processing. It would be easier to find > rare issues when running over non-trivial data. > - Serve as a baseline (if not a benchmark) to understand the limits of the > multiprocess runner. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7888) Test Multi Process Direct Runner With Largish Data
[ https://issues.apache.org/jira/browse/BEAM-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-7888: --- Issue Type: Test (was: Sub-task) Parent: (was: BEAM-3645) > Test Multi Process Direct Runner With Largish Data > -- > > Key: BEAM-7888 > URL: https://issues.apache.org/jira/browse/BEAM-7888 > Project: Beam > Issue Type: Test > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Hannah Jiang >Priority: Major > > Filing this as a tracker. > We can test the multiprocess runner with a largish amount of data to the extent > that we can do this on Jenkins. This will serve 2 purposes: > - Find out issues related to multi processing. It would be easier to find > rare issues when running over non-trivial data. > - Serve as a baseline (if not a benchmark) to understand the limits of the > multiprocess runner. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7997) Create a single docker container for a pipeline instead of creating one for each worker.
Hannah Jiang created BEAM-7997: -- Summary: Create a single docker container for a pipeline instead of creating one for each worker. Key: BEAM-7997 URL: https://issues.apache.org/jira/browse/BEAM-7997 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Hannah Jiang Assignee: Hannah Jiang With the --direct_num_workers option, the current implementation creates a container for each worker. We should change this behavior to creating one container per pipeline and handling multiple workers inside the container. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7462) Add Sampled Byte Count Metric to the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-7462?focusedWorklogId=296641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296641 ] ASF GitHub Bot logged work on BEAM-7462: Author: ASF GitHub Bot Created on: 16/Aug/19 23:07 Start Date: 16/Aug/19 23:07 Worklog Time Spent: 10m Work Description: stale[bot] commented on pull request #8809: [WIP] [BEAM-7462] Update java SDK to report the SampledByteCount counter. URL: https://github.com/apache/beam/pull/8809 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296641) Time Spent: 1h (was: 50m) > Add Sampled Byte Count Metric to the Java SDK > - > > Key: BEAM-7462 > URL: https://issues.apache.org/jira/browse/BEAM-7462 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Luke Cwik >Priority: Major > Labels: portable-metrics-bugs > Attachments: bundle_descriptor_dump.txt > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7462) Add Sampled Byte Count Metric to the Java SDK
[ https://issues.apache.org/jira/browse/BEAM-7462?focusedWorklogId=296640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296640 ] ASF GitHub Bot logged work on BEAM-7462: Author: ASF GitHub Bot Created on: 16/Aug/19 23:07 Start Date: 16/Aug/19 23:07 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #8809: [WIP] [BEAM-7462] Update java SDK to report the SampledByteCount counter. URL: https://github.com/apache/beam/pull/8809#issuecomment-522177266 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296640) Time Spent: 50m (was: 40m) > Add Sampled Byte Count Metric to the Java SDK > - > > Key: BEAM-7462 > URL: https://issues.apache.org/jira/browse/BEAM-7462 > Project: Beam > Issue Type: New Feature > Components: java-fn-execution >Reporter: Alex Amato >Assignee: Luke Cwik >Priority: Major > Labels: portable-metrics-bugs > Attachments: bundle_descriptor_dump.txt > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7997) Create a single docker container for a pipeline instead of creating one for each worker.
[ https://issues.apache.org/jira/browse/BEAM-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-7997: --- Status: Open (was: Triage Needed) > Create a single docker container for a pipeline instead of creating one for > each worker. > > > Key: BEAM-7997 > URL: https://issues.apache.org/jira/browse/BEAM-7997 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Labels: portability > > With the --direct_num_workers option, the current implementation creates a container > for each worker. We should change this behavior to creating one container per > pipeline and handling multiple workers inside the container. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (BEAM-7981) ParDo function wrapper doesn't support Iterable output types
[ https://issues.apache.org/jira/browse/BEAM-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reassigned BEAM-7981: --- Assignee: Udi Meiri > ParDo function wrapper doesn't support Iterable output types > > > Key: BEAM-7981 > URL: https://issues.apache.org/jira/browse/BEAM-7981 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > I believe the bug is in CallableWrapperDoFn.default_type_hints, which > converts Iterable[str] to str. > This test will be included (commented out) in > https://github.com/apache/beam/pull/9283 > {code} > def test_typed_callable_iterable_output(self): > @typehints.with_input_types(int) > @typehints.with_output_types(typehints.Iterable[str]) > def do_fn(element): > return [[str(element)] * 2] > result = [1, 2] | beam.ParDo(do_fn) > self.assertEqual([['1', '1'], ['2', '2']], sorted(result)) > {code} > Result: > {code} > == > ERROR: test_typed_callable_iterable_output > (apache_beam.typehints.typed_pipeline_test.MainInputTest) > -- > Traceback (most recent call last): > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 104, in test_typed_callable_iterable_output > result = [1, 2] | beam.ParDo(do_fn) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 519, in __ror__ > p.run().wait_until_finish() > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pipeline.py", > line 406, in run > self._options).run(False) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pipeline.py", > line 419, in run > return self.runner.run_pipeline(self, self._options) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py", > line 129, in run_pipeline > return runner.run_pipeline(pipeline, options) > File > 
"/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 366, in run_pipeline > default_environment=self._default_environment)) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 373, in run_via_runner_api > return self.run_stages(stage_context, stages) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 455, in run_stages > stage_context.safe_coders) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 733, in _run_stage > result, splits = bundle_manager.process_bundle(data_input, data_output) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1663, in process_bundle > part, expected_outputs), part_inputs): > File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in > result_iterator > yield fs.pop().result() > File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result > return self.__get_result() > File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in > __get_result > raise self._exception > File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run > result = self.fn(*self.args, **self.kwargs) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1663, in > part, expected_outputs), part_inputs): > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1601, in process_bundle > result_future = self._worker_handler.control_conn.push(process_bundle_req) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", > line 1080, in push > response = self.worker.do_instruction(request) > File > 
"/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", > line 343, in do_instruction > request.instruction_id) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", > line 369, in process_bundle > bundle_processor.process_bundle(instruction_id)) > File > "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", > line 593, in process_bundle > data.ptransform_id].process_encoded(data.data) > File > "/usr/local/google/home/ehudm/src/beam/
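The suspected behavior in CallableWrapperDoFn.default_type_hints — collapsing a declared Iterable[str] output hint to str — can be mimicked with a small stand-in. This is a hypothetical sketch using the typing module, not Beam's actual implementation, and strip_iterable_hint is an invented name for illustration:

```python
import collections.abc
import typing

def strip_iterable_hint(hint):
    # A wrapped callable's returned iterable is flattened into the output
    # PCollection, so a declared Iterable[X] output hint gets unwrapped to X.
    # That is correct for DoFns that yield plain elements, but it loses
    # information when the intended element type is itself a container, as
    # in the failing test above (which yields lists of strings).
    if typing.get_origin(hint) is collections.abc.Iterable:
        (elem,) = typing.get_args(hint)
        return elem
    return hint

# The declared output hint from the test collapses to str:
print(strip_iterable_hint(typing.Iterable[str]))
```

Under this reading, the type checker then expects each output element to be a str, while the DoFn actually emits `['1', '1']`-style lists — consistent with the pipeline failure shown in the traceback.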
[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
[ https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=296637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296637 ] ASF GitHub Bot logged work on BEAM-7924: Author: ASF GitHub Bot Created on: 16/Aug/19 22:41 Start Date: 16/Aug/19 22:41 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9365: [BEAM-7924] fix pipeline options error with main session URL: https://github.com/apache/beam/pull/9365#issuecomment-522172714 It turns out the issue was because I had saved `pipeline_options` as a global variable in my main session, and that was shadowing the `pipeline_options` module in the SDK worker. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296637) Time Spent: 4h (was: 3h 50m) > Failure in Python 2 postcommit: crossLanguagePythonJavaFlink > > > Key: BEAM-7924 > URL: https://issues.apache.org/jira/browse/BEAM-7924 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Heejong Lee >Priority: Major > Fix For: 2.15.0 > > Time Spent: 4h > Remaining Estimate: 0h > > This seems to be the root cause: > {code} > 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - > Discarding unparseable args: [u'--app_name=None', > u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', > u'--direct_runner_use_stacked_bundle', u'--options_id=1', > u'--fail_on_checkpointing_errors', u'--enable_metrics', > u'--pipeline_type_check', u'--parallelism=2'] > 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk > harness started with pipeline_options: {'runner': u'None', 'experiments': > [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': > u'1', 'sdk_location': u'container', 'job_name': > 
u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': > u'us-central1', 'sdk_worker_parallelism': u'1'} > 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk > harness failed: > 11:32:59 Traceback (most recent call last): > 11:32:59 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py", > line 153, in main > 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions)) > 11:32:59 File > "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py", > line 334, in __getattr__ > 11:32:59 (type(self).__name__, name)) > 11:32:59 AttributeError: 'PipelineOptions' object has no attribute > 'ProfilingOptions' > {code} > https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console -- This message was sent by Atlassian JIRA (v7.6.14#76016)
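The shadowing failure described above — a user global named pipeline_options replacing the SDK module of the same name once the pickled main session is restored on the worker — can be reproduced in miniature. The sketch below uses the stdlib json module as a stand-in for the SDK module, and restore_main_session is an invented simplification of what --save_main_session restoration does:

```python
import json  # stands in for the SDK's pipeline_options module

def restore_main_session(namespace, saved_globals):
    # Simplified stand-in: restoring a pickled main session writes every
    # saved global back into the worker's namespace, even when a name
    # collides with an already-imported module.
    namespace.update(saved_globals)

worker_namespace = {"json": json}           # worker initially sees the module
saved = {"json": {"runner": None}}          # user stored options in a same-named global
restore_main_session(worker_namespace, saved)

# The name now refers to a plain dict, so attribute access that expects the
# module fails -- analogous to the AttributeError in the log above:
try:
    worker_namespace["json"].loads
except AttributeError as e:
    print("shadowed:", e)
```

The same mechanism explains why the worker reported that 'PipelineOptions' object has no attribute 'ProfilingOptions': the name resolved to the user's object rather than the module.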
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296636 ] ASF GitHub Bot logged work on BEAM-7994: Author: ASF GitHub Bot Created on: 16/Aug/19 22:37 Start Date: 16/Aug/19 22:37 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13 URL: https://github.com/apache/beam/pull/9362#discussion_r314909460 ## File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go ## @@ -0,0 +1,43 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package reflectx + +import ( + "reflect" + "testing" +) + +func testFunction() int { + return 42 +} + +func TestXxx(t *testing.T) { Review comment: Yeah, the name here seems wrong. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296636) Time Spent: 1h 20m (was: 1h 10m) > BEAM SDK has compatibility problems with go1.13 > --- > > Key: BEAM-7994 > URL: https://issues.apache.org/jira/browse/BEAM-7994 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Bill Neubauer >Assignee: Bill Neubauer >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > The Go team identified a problem in the Beam SDK that appears due to runtime > changes in Go 1.13, which is upcoming. There is a backwards compatible fix > the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296632 ] ASF GitHub Bot logged work on BEAM-7994: Author: ASF GitHub Bot Created on: 16/Aug/19 22:36 Start Date: 16/Aug/19 22:36 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13 URL: https://github.com/apache/beam/pull/9362#discussion_r314909234 ## File path: sdks/go/pkg/beam/core/util/reflectx/functions.go ## @@ -40,6 +40,8 @@ func FunctionName(fn interface{}) string { // points to a valid function implementation. func LoadFunction(ptr uintptr, t reflect.Type) interface{} { v := reflect.New(t).Elem() - *(*uintptr)(unsafe.Pointer(v.Addr().Pointer())) = (uintptr)(unsafe.Pointer(&ptr)) + p := new(uintptr) + *p = ptr + *(*unsafe.Pointer)(unsafe.Pointer(v.Addr().Pointer())) = unsafe.Pointer(p) Review comment: I had to read up a bit on how reflect and unsafe work to understand what was happening here, but I still don't understand exactly why the previous version was invalid and this one valid. (I'm using https://golang.org/pkg/unsafe/#Pointer as a guideline). Is it because in the old version we simply put in the address to `ptr`, and in the new one we actually create a new copy? Or is it because in the old version we're reassigning to `v` as a `uintptr` instead of as an `unsafe.Pointer`? Those are the only real differences I could figure out. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296632) Time Spent: 50m (was: 40m) > BEAM SDK has compatibility problems with go1.13 > --- > > Key: BEAM-7994 > URL: https://issues.apache.org/jira/browse/BEAM-7994 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Bill Neubauer >Assignee: Bill Neubauer >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > The Go team identified a problem in the Beam SDK that appears due to runtime > changes in Go 1.13, which is upcoming. There is a backwards compatible fix > the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296635 ] ASF GitHub Bot logged work on BEAM-7994: Author: ASF GitHub Bot Created on: 16/Aug/19 22:36 Start Date: 16/Aug/19 22:36 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13 URL: https://github.com/apache/beam/pull/9362#discussion_r314910880 ## File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go ## @@ -0,0 +1,43 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package reflectx + +import ( + "reflect" + "testing" +) + +func testFunction() int { + return 42 +} + +func TestXxx(t *testing.T) { + val := reflect.ValueOf(testFunction) + fi := uintptr(val.Pointer()) + typ := val.Type() + + callable := LoadFunction(fi, typ) + + cv := reflect.ValueOf(callable) + out := cv.Call(nil) + if len(out) != 1 { + t.Errorf("got %d return values, wanted 1.", len(out)) + } + // TODO: check type? 
+ if out[0].Int() != 42 { + t.Errorf("got %d, wanted 42", out[0].Int()) + } Review comment: Nit: Instead of comparing to 42 directly, could you do something like: ``` expected := testFunction() if out[0].Int() != expected { t.Errorf("got %d, wanted %d", out[0].Int(), expected) } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296635) > BEAM SDK has compatibility problems with go1.13 > --- > > Key: BEAM-7994 > URL: https://issues.apache.org/jira/browse/BEAM-7994 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Bill Neubauer >Assignee: Bill Neubauer >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > The Go team identified a problem in the Beam SDK that appears due to runtime > changes in Go 1.13, which is upcoming. There is a backwards compatible fix > the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296634 ] ASF GitHub Bot logged work on BEAM-7994: Author: ASF GitHub Bot Created on: 16/Aug/19 22:36 Start Date: 16/Aug/19 22:36 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13 URL: https://github.com/apache/beam/pull/9362#discussion_r314909460 ## File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go ## @@ -0,0 +1,43 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package reflectx + +import ( + "reflect" + "testing" +) + +func testFunction() int { + return 42 +} + +func TestXxx(t *testing.T) { Review comment: Yeah, the name here seems wrong. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296634) Time Spent: 1h 10m (was: 1h) > BEAM SDK has compatibility problems with go1.13 > --- > > Key: BEAM-7994 > URL: https://issues.apache.org/jira/browse/BEAM-7994 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Bill Neubauer >Assignee: Bill Neubauer >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > The Go team identified a problem in the Beam SDK that appears due to runtime > changes in Go 1.13, which is upcoming. There is a backwards compatible fix > the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296633 ] ASF GitHub Bot logged work on BEAM-7994: Author: ASF GitHub Bot Created on: 16/Aug/19 22:36 Start Date: 16/Aug/19 22:36 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13 URL: https://github.com/apache/beam/pull/9362#discussion_r314910919 ## File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go ## @@ -0,0 +1,43 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package reflectx + +import ( + "reflect" + "testing" +) + +func testFunction() int { + return 42 +} + +func TestXxx(t *testing.T) { Review comment: +1, name seems off. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296633) Time Spent: 1h (was: 50m) > BEAM SDK has compatibility problems with go1.13 > --- > > Key: BEAM-7994 > URL: https://issues.apache.org/jira/browse/BEAM-7994 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Bill Neubauer >Assignee: Bill Neubauer >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > The Go team identified a problem in the Beam SDK that appears due to runtime > changes in Go 1.13, which is upcoming. There is a backwards compatible fix > the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7996) Add support for remaining data types in python RowCoder
Brian Hulette created BEAM-7996: --- Summary: Add support for remaining data types in python RowCoder Key: BEAM-7996 URL: https://issues.apache.org/jira/browse/BEAM-7996 Project: Beam Issue Type: New Feature Components: sdk-py-core Reporter: Brian Hulette In the initial [python RowCoder implementation|https://github.com/apache/beam/pull/9188] we only added support for the data types that already had coders in the Python SDK. We should add support for the remaining data types that are not currently supported: * INT8 (ByteCoder in Java) * INT16 (BigEndianShortCoder in Java) * FLOAT (FloatCoder in Java) * BOOLEAN (BooleanCoder in Java) * Map (MapCoder in Java) We might consider making those coders standard so they can be tested independently from RowCoder in standard_coders.yaml. Or, if we don't do that, we should probably add a more robust testing framework for RowCoder itself, because it will be challenging to test all of these types as part of the RowCoder tests in standard_coders.yaml. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
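Of the missing types listed above, FLOAT is a representative case: it amounts to a fixed-width big-endian IEEE-754 encoding. A minimal Python sketch, assuming the Java FloatCoder's DataOutputStream-style big-endian byte order (the helper names here are illustrative, not SDK API):

```python
import struct

def encode_float32(value):
    # Big-endian IEEE-754 single precision (always 4 bytes).
    return struct.pack(">f", value)

def decode_float32(data):
    (value,) = struct.unpack(">f", data)
    return value
```

Round-tripping only preserves values exactly representable in 32 bits (1.5 survives, 0.1 does not), which is one reason a dedicated test framework, as the issue suggests, would be easier than exercising every type through standard_coders.yaml alone.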
[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
[ https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=296624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296624 ] ASF GitHub Bot logged work on BEAM-7924: Author: ASF GitHub Bot Created on: 16/Aug/19 22:08 Start Date: 16/Aug/19 22:08 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9365: [BEAM-7924] fix pipeline options error with main session URL: https://github.com/apache/beam/pull/9365#issuecomment-522166406 I made this change to fix a future test, which I have not uploaded yet. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296624) Time Spent: 3h 50m (was: 3h 40m) > Failure in Python 2 postcommit: crossLanguagePythonJavaFlink > > > Key: BEAM-7924 > URL: https://issues.apache.org/jira/browse/BEAM-7924 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Heejong Lee >Priority: Major > Fix For: 2.15.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > This seems to be the root cause: > {code} > 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - > Discarding unparseable args: [u'--app_name=None', > u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', > u'--direct_runner_use_stacked_bundle', u'--options_id=1', > u'--fail_on_checkpointing_errors', u'--enable_metrics', > u'--pipeline_type_check', u'--parallelism=2'] > 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk > harness started with pipeline_options: {'runner': u'None', 'experiments': > [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': > u'1', 'sdk_location': u'container', 'job_name': > u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': > u'us-central1', 
'sdk_worker_parallelism': u'1'} > 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk > harness failed: > 11:32:59 Traceback (most recent call last): > 11:32:59 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py", > line 153, in main > 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions)) > 11:32:59 File > "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py", > line 334, in __getattr__ > 11:32:59 (type(self).__name__, name)) > 11:32:59 AttributeError: 'PipelineOptions' object has no attribute > 'ProfilingOptions' > {code} > https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console -- This message was sent by Atlassian JIRA (v7.6.14#76016)
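The traceback above can be reproduced in miniature: `PipelineOptions.__getattr__` raises for names it does not recognize, so if the name `pipeline_options` ends up bound to an options *instance* instead of the module (one plausible effect of restoring a pickled main session), then `pipeline_options.ProfilingOptions` fails exactly this way. A hypothetical stand-in class, not Beam's real implementation:

```python
# Hypothetical stand-in for apache_beam's PipelineOptions, illustrating the
# failure mode in the traceback above (not the real implementation).
class PipelineOptions(object):
    def __getattr__(self, name):
        # Mimics Beam's guard: unknown attribute names raise
        raise AttributeError(
            "'%s' object has no attribute '%s'" % (type(self).__name__, name))

# Suppose the module name gets shadowed by an instance...
pipeline_options = PipelineOptions()

try:
    # What was meant as a module attribute lookup now hits __getattr__
    pipeline_options.ProfilingOptions
except AttributeError as e:
    message = str(e)
```

This is only a sketch of how the error message can arise; the actual trigger under `save_main_session` is what the linked fix addresses.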
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296619 ] ASF GitHub Bot logged work on BEAM-7994: Author: ASF GitHub Bot Created on: 16/Aug/19 21:55 Start Date: 16/Aug/19 21:55 Worklog Time Spent: 10m Work Description: dsnet commented on issue #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13 URL: https://github.com/apache/beam/pull/9362#issuecomment-522163621 More background can be found here: golang/go#33662 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296619) Time Spent: 40m (was: 0.5h) > BEAM SDK has compatibility problems with go1.13 > --- > > Key: BEAM-7994 > URL: https://issues.apache.org/jira/browse/BEAM-7994 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Bill Neubauer >Assignee: Bill Neubauer >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > The Go team identified a problem in the Beam SDK that appears due to runtime > changes in Go 1.13, which is upcoming. There is a backwards compatible fix > the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python
[ https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296620 ] ASF GitHub Bot logged work on BEAM-7995: Author: ASF GitHub Bot Created on: 16/Aug/19 21:55 Start Date: 16/Aug/19 21:55 Worklog Time Spent: 10m Work Description: lhaiesp commented on pull request #9364: BEAM-7995: python PGBKCVOperation using wrong timestamp URL: https://github.com/apache/beam/pull/9364#discussion_r314902937 ## File path: sdks/python/apache_beam/runners/worker/operations.py ## @@ -886,7 +886,7 @@ def output_key(self, wkey, accumulator): if windows is 0: self.output(_globally_windowed_value.with_value((key, value))) else: - self.output(WindowedValue((key, value), windows[0].end, windows)) + self.output(WindowedValue((key, value), windows[0].max_timestamp(), windows)) Review comment: Why is it any more incorrect than "end"? max_timestamp = end - 0.001 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296620) Time Spent: 40m (was: 0.5h) > IllegalStateException: TimestampCombiner moved element from to earlier time > in Python > - > > Key: BEAM-7995 > URL: https://issues.apache.org/jira/browse/BEAM-7995 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Hai Lu >Assignee: Hai Lu >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > I'm looking into a bug I found internally when using Beam portable API > (Python) on our own Samza runner. 
> > The pipeline looks something like this: > > (p > | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent") > | 'transform' >> beam.Map(lambda event: process_event(event)) > | 'window' >> beam.WindowInto(FixedWindows(15)) > | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())* > ... > > The problem comes from the combiners which cause the following exception on > Java side: > > Caused by: java.lang.IllegalStateException: TimestampCombiner moved element > from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z > for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z) > at > org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117) > at > org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154) > at > org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > > The exception happens here > [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116] > when we check the shifted timestamp to ensure it's before the timestamp. > > if (shifted.isBefore(timestamp)) { > throw new IllegalStateException( > String.format( > "TimestampCombiner moved element from %s to earlier time %s for > window %s", > BoundedWindow.formatTimestamp(timestamp), > BoundedWindow.formatTimestamp(shifted), > window)); > } > > As you can see from the exception, the "shifted" is "XXX 44.999" while the > "timestamp" is "XXX 45.000". 
The "44.999" is coming from > [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]: > > @Override > public Instant merge(BoundedWindow intoWindow, Iterable Instant> mergingTimestamps) { > return intoWindow.maxTimestamp(); > } > > where intoWindow.maxTimestamp() is: > > /** Returns the largest timestamp that can be included in this window. */ > @Override > public Instant maxTimestamp() { > *// end not inclusive* > return *end.minus(1)*; > } > > Hence, the "44.*999*". > > And the "45.000" comes from the Python side when the combiner output results > as pre GBK operation: > [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/
[jira] [Commented] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
[ https://issues.apache.org/jira/browse/BEAM-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909435#comment-16909435 ] Kyle Weaver commented on BEAM-7924: --- Update: I've identified the issue; see [GitHub Pull Request #9365|https://github.com/apache/beam/pull/9365] > Failure in Python 2 postcommit: crossLanguagePythonJavaFlink > > > Key: BEAM-7924 > URL: https://issues.apache.org/jira/browse/BEAM-7924 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Heejong Lee >Priority: Major > Fix For: 2.15.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > This seems to be the root cause: > {code} > 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - > Discarding unparseable args: [u'--app_name=None', > u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', > u'--direct_runner_use_stacked_bundle', u'--options_id=1', > u'--fail_on_checkpointing_errors', u'--enable_metrics', > u'--pipeline_type_check', u'--parallelism=2'] > 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk > harness started with pipeline_options: {'runner': u'None', 'experiments': > [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': > u'1', 'sdk_location': u'container', 'job_name': > u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': > u'us-central1', 'sdk_worker_parallelism': u'1'} > 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk > harness failed: > 11:32:59 Traceback (most recent call last): > 11:32:59 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py", > line 153, in main > 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions)) > 11:32:59 File > "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py", > line 334, in __getattr__ > 11:32:59 (type(self).__name__, name)) > 11:32:59 AttributeError: 
'PipelineOptions' object has no attribute > 'ProfilingOptions' > {code} > https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
[ https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=296617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296617 ] ASF GitHub Bot logged work on BEAM-7924: Author: ASF GitHub Bot Created on: 16/Aug/19 21:52 Start Date: 16/Aug/19 21:52 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9365: [BEAM-7924] fix pipeline options error with main session URL: https://github.com/apache/beam/pull/9365 When the `save_main_session` option is enabled, the Python SDK harness currently fails with the following message: ``` AttributeError: 'PipelineOptions' object has no attribute 'ProfilingOptions' ``` I'm not sure exactly what's going on here, so it'd be great if anyone could help explain why this bandaid fix works, and if we might need a deeper fix. R: @robertwb @aaltay Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build 
[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python
[ https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296614 ] ASF GitHub Bot logged work on BEAM-7995: Author: ASF GitHub Bot Created on: 16/Aug/19 21:49 Start Date: 16/Aug/19 21:49 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9364: BEAM-7995: python PGBKCVOperation using wrong timestamp URL: https://github.com/apache/beam/pull/9364#discussion_r314901491 ## File path: sdks/python/apache_beam/runners/worker/operations.py ## @@ -886,7 +886,7 @@ def output_key(self, wkey, accumulator): if windows is 0: self.output(_globally_windowed_value.with_value((key, value))) else: - self.output(WindowedValue((key, value), windows[0].end, windows)) + self.output(WindowedValue((key, value), windows[0].max_timestamp(), windows)) Review comment: Would not this make the timestamp for the element incorrect? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296614) Time Spent: 0.5h (was: 20m) > IllegalStateException: TimestampCombiner moved element from to earlier time > in Python > - > > Key: BEAM-7995 > URL: https://issues.apache.org/jira/browse/BEAM-7995 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Hai Lu >Assignee: Hai Lu >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > I'm looking into a bug I found internally when using Beam portable API > (Python) on our own Samza runner. 
> > The pipeline looks something like this: > > (p > | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent") > | 'transform' >> beam.Map(lambda event: process_event(event)) > | 'window' >> beam.WindowInto(FixedWindows(15)) > | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())* > ... > > The problem comes from the combiners which cause the following exception on > Java side: > > Caused by: java.lang.IllegalStateException: TimestampCombiner moved element > from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z > for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z) > at > org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117) > at > org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154) > at > org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > > The exception happens here > [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116] > when we check the shifted timestamp to ensure it's before the timestamp. > > if (shifted.isBefore(timestamp)) { > throw new IllegalStateException( > String.format( > "TimestampCombiner moved element from %s to earlier time %s for > window %s", > BoundedWindow.formatTimestamp(timestamp), > BoundedWindow.formatTimestamp(shifted), > window)); > } > > As you can see from the exception, the "shifted" is "XXX 44.999" while the > "timestamp" is "XXX 45.000". 
The "44.999" is coming from > [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]: > > @Override > public Instant merge(BoundedWindow intoWindow, Iterable Instant> mergingTimestamps) { > return intoWindow.maxTimestamp(); > } > > where intoWindow.maxTimestamp() is: > > /** Returns the largest timestamp that can be included in this window. */ > @Override > public Instant maxTimestamp() { > *// end not inclusive* > return *end.minus(1)*; > } > > Hence, the "44.*999*". > > And the "45.000" comes from the Python side when the combiner output results > as pre GBK operation: > [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/w
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296609 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 16/Aug/19 21:46 Start Date: 16/Aug/19 21:46 Worklog Time Spent: 10m Work Description: udim commented on issue #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#issuecomment-522161595 run python 3.7 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296609) Time Spent: 11h 20m (was: 11h 10m) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 11h 20m > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. 
> - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
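For context on what "Py3-compatible annotation support" can build on: Python 3 exposes function annotations at runtime through the standard `typing` module, with no reliance on CPython internals. A minimal illustration (not Beam's implementation, just the underlying mechanism):

```python
import typing

def process(element: str, side: int) -> typing.List[str]:
    # A toy annotated function; an SDK could inspect its hints at
    # pipeline-construction time instead of parsing CPython internals.
    return [element] * side

# get_type_hints resolves the annotations into typing objects.
hints = typing.get_type_hints(process)
```

Here `hints['element']` is `str` and `hints['return']` is `typing.List[str]`, which is the kind of information the in-house typehints code currently recovers by other means.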
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296608 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 16/Aug/19 21:46 Start Date: 16/Aug/19 21:46 Worklog Time Spent: 10m Work Description: udim commented on issue #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#issuecomment-522161571 run python 2 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296608) Time Spent: 11h 10m (was: 11h) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 11h 10m > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. 
> - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296604 ] ASF GitHub Bot logged work on BEAM-7819: Author: ASF GitHub Bot Created on: 16/Aug/19 21:42 Start Date: 16/Aug/19 21:42 Worklog Time Spent: 10m Work Description: udim commented on issue #9232: [BEAM-7819] Python - parse PubSub message_id into attributes property URL: https://github.com/apache/beam/pull/9232#issuecomment-522160484 > Is that external to the Beam project? Yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296604) Time Spent: 6h 50m (was: 6h 40m) > PubsubMessage message parsing is lacking non-attribute fields > - > > Key: BEAM-7819 > URL: https://issues.apache.org/jira/browse/BEAM-7819 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > User reported issue: > https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E > """ > Looking at the source code, with my untrained python eyes, I think if the > intention is to include the message id and the publish time in the attributes > attribute of the PubSubMessage type, then the protobuf mapping is missing > something:- > @staticmethod > def _from_proto_str(proto_msg): > """Construct from serialized form of ``PubsubMessage``. > Args: > proto_msg: String containing a serialized protobuf of type > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > Returns: > A new PubsubMessage object. 
> """ > msg = pubsub.types.pubsub_pb2.PubsubMessage() > msg.ParseFromString(proto_msg) > # Convert ScalarMapContainer to dict. > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > return PubsubMessage(msg.data, attributes) > The protobuf definition is here:- > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > and so it looks as if the message_id and publish_time are not being parsed as > they are seperate from the attributes. Perhaps the PubsubMessage class needs > expanding to include these as attributes, or they would need adding to the > dictionary for attributes. This would only need doing for the _from_proto_str > as obviously they would not need to be populated when transmitting a message > to PubSub. > My python is not great, I'm assuming the latter option would need to look > something like this? > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > attributes.update({'message_id': msg.message_id, 'publish_time': > msg.publish_time}) > return PubsubMessage(msg.data, attributes) > """ -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296597 ] ASF GitHub Bot logged work on BEAM-7994: Author: ASF GitHub Bot Created on: 16/Aug/19 21:32 Start Date: 16/Aug/19 21:32 Worklog Time Spent: 10m Work Description: dsnet commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13 URL: https://github.com/apache/beam/pull/9362#discussion_r314897093 ## File path: sdks/go/pkg/beam/core/util/reflectx/functions_test.go ## @@ -0,0 +1,43 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package reflectx + +import ( + "reflect" + "testing" +) + +func testFunction() int { + return 42 +} + +func TestXxx(t *testing.T) { Review comment: `TestLoadFunction`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296597) Time Spent: 0.5h (was: 20m) > BEAM SDK has compatibility problems with go1.13 > --- > > Key: BEAM-7994 > URL: https://issues.apache.org/jira/browse/BEAM-7994 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Bill Neubauer >Assignee: Bill Neubauer >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > The Go team identified a problem in the Beam SDK that appears due to runtime > changes in Go 1.13, which is upcoming. There is a backwards compatible fix > the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python
[ https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296594 ] ASF GitHub Bot logged work on BEAM-7995: Author: ASF GitHub Bot Created on: 16/Aug/19 21:23 Start Date: 16/Aug/19 21:23 Worklog Time Spent: 10m Work Description: lhaiesp commented on issue #9364: BEAM-7995: python PGBKCVOperation using wrong timestamp URL: https://github.com/apache/beam/pull/9364#issuecomment-522155765 @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296594) Time Spent: 20m (was: 10m) > IllegalStateException: TimestampCombiner moved element from to earlier time > in Python > - > > Key: BEAM-7995 > URL: https://issues.apache.org/jira/browse/BEAM-7995 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Hai Lu >Assignee: Hai Lu >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > I'm looking into a bug I found internally when using Beam portable API > (Python) on our own Samza runner. > > The pipeline looks something like this: > > (p > | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent") > | 'transform' >> beam.Map(lambda event: process_event(event)) > | 'window' >> beam.WindowInto(FixedWindows(15)) > | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())* > ... 
> > The problem comes from the combiners which cause the following exception on > Java side: > > Caused by: java.lang.IllegalStateException: TimestampCombiner moved element > from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z > for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z) > at > org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117) > at > org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154) > at > org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98) > at > org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605) > at > org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) > at > org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) > > The exception happens here > [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116] > when we check the shifted timestamp to ensure it's before the timestamp. > > if (shifted.isBefore(timestamp)) { > throw new IllegalStateException( > String.format( > "TimestampCombiner moved element from %s to earlier time %s for > window %s", > BoundedWindow.formatTimestamp(timestamp), > BoundedWindow.formatTimestamp(shifted), > window)); > } > > As you can see from the exception, the "shifted" is "XXX 44.999" while the > "timestamp" is "XXX 45.000". The "44.999" is coming from > [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]: > > @Override > public Instant merge(BoundedWindow intoWindow, Iterable Instant> mergingTimestamps) { > return intoWindow.maxTimestamp(); > } > > where intoWindow.maxTimestamp() is: > > /** Returns the largest timestamp that can be included in this window. 
*/ > @Override > public Instant maxTimestamp() { > *// end not inclusive* > return *end.minus(1)*; > } > > Hence, the "44.*999*". > > And the "45.000" comes from the Python side when the combiner output results > as pre GBK operation: > [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py#L889] > > if windows is 0: > self.output(_globally_windowed_value.with_value((key, value))) > else: > self.output(WindowedValue((key, value), *windows[0].end*, windows)) > > Here when we generate the window value, the timestamp is assigned to the > closed interval end (45.000) as opposed to open interval end (44.999) > > Clearly the "end of window" definition is a bit inconsistent across Python > and Java. I'm yet to try this on other runner so not sure whether this is only an issue for our Samza runner.
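The half-open window semantics behind this mismatch can be sketched without Beam itself. Below is a minimal, hypothetical stand-in (not the real apache_beam API; real Beam Python timestamps are microsecond-precision, while the Java trace above is in milliseconds) showing why the combiner must emit at `max_timestamp()` rather than `end`:

```python
# Minimal model of Beam's half-open interval window [start, end),
# illustrating why an element timestamp must be max_timestamp()
# (inclusive) rather than end (exclusive).
# Hypothetical stand-in; not the real apache_beam API.

class IntervalWindow:
    """Window covering [start, end) in milliseconds."""

    def __init__(self, start_ms, end_ms):
        self.start = start_ms
        self.end = end_ms  # exclusive upper bound

    def max_timestamp(self):
        # Largest timestamp that can still fall inside the window
        # ("end not inclusive", as in Java's maxTimestamp()).
        return self.end - 1

    def contains(self, ts_ms):
        return self.start <= ts_ms < self.end


# The window [..:30.000, ..:45.000) from the stack trace, in ms.
w = IntervalWindow(30_000, 45_000)

assert not w.contains(w.end)          # 45.000 lies outside the window
assert w.contains(w.max_timestamp())  # 44.999 is the last valid timestamp
```

Assigning `windows[0].end` as the element timestamp therefore produces a value that the window itself cannot contain, which is exactly what the Java-side `WatermarkHold` check rejects.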
[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python
[ https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=296584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296584 ] ASF GitHub Bot logged work on BEAM-7995: Author: ASF GitHub Bot Created on: 16/Aug/19 21:13 Start Date: 16/Aug/19 21:13 Worklog Time Spent: 10m Work Description: lhaiesp commented on pull request #9364: BEAM-7995: python PGBKCVOperation using wrong timestamp URL: https://github.com/apache/beam/pull/9364 python PGBKCVOperation is using closed interval end which is not acceptable for window calculation. Make a change to use window.max_timestamp() which uses open interval end for the window
[jira] [Updated] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python
[ https://issues.apache.org/jira/browse/BEAM-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai Lu updated BEAM-7995: - Description: I'm looking into a bug I found internally when using Beam portable API (Python) on our own Samza runner. The pipeline looks something like this: (p | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent") | 'transform' >> beam.Map(lambda event: process_event(event)) | 'window' >> beam.WindowInto(FixedWindows(15)) | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())* ... The problem comes from the combiners which cause the following exception on Java side: Caused by: java.lang.IllegalStateException: TimestampCombiner moved element from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z) at org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117) at org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154) at org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98) at org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605) at org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) at org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136) The exception happens here [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116] when we check the shifted timestamp to ensure it's before the timestamp. if (shifted.isBefore(timestamp)) { throw new IllegalStateException( String.format( "TimestampCombiner moved element from %s to earlier time %s for window %s", BoundedWindow.formatTimestamp(timestamp), BoundedWindow.formatTimestamp(shifted), window)); } As you can see from the exception, the "shifted" is "XXX 44.999" while the "timestamp" is "XXX 45.000". 
The "44.999" is coming from [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]: @Override public Instant merge(BoundedWindow intoWindow, Iterable mergingTimestamps) { return intoWindow.maxTimestamp(); } where intoWindow.maxTimestamp() is: /** Returns the largest timestamp that can be included in this window. */ @Override public Instant maxTimestamp() { *// end not inclusive* return *end.minus(1)*; } Hence, the "44.*999*". And the "45.000" comes from the Python side when the combiner output results as pre GBK operation: [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py#L889] if windows is 0: self.output(_globally_windowed_value.with_value((key, value))) else: self.output(WindowedValue((key, value), *windows[0].end*, windows)) Here when we generate the window value, the timestamp is assigned to the closed interval end (45.000) as opposed to open interval end (44.999) Clearly the "end of window" definition is a bit inconsistent across Python and Java. I'm yet to try this on other runner so not sure whether this is only an issue for our Samza runner. I tend to think this is a bug but would like to confirm with you. If this has not been an issue for other runners, where did I potentially do wrong. > IllegalStateException: TimestampCombiner moved element from to earlier time > in Python > - > > Key: BEAM-7995 > URL: https://issues.apache.org/jira/browse/BEAM-7995 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Hai Lu >Assignee: Hai Lu >Priority: Major > > I'm looking into a bug I found internally when using Beam portable API > (Python) on our own Samza runner. 
> > The pipeline looks something like this: > > (p > | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent") > | 'transform' >> beam.Map(lambda event: process_event(event)) > | 'window' >> beam.WindowInto(FixedWindows(15)) > | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())* > ... > > The problem comes from the combiners which cause the following exception on > Java side: > > Caused by: java.lang.IllegalStateException: TimestampCombiner moved element > from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z > for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z) > at > org.apache.beam.runners.core.WatermarkHold.
[jira] [Created] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python
Hai Lu created BEAM-7995: Summary: IllegalStateException: TimestampCombiner moved element from to earlier time in Python Key: BEAM-7995 URL: https://issues.apache.org/jira/browse/BEAM-7995 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Hai Lu Assignee: Hai Lu -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7049) Merge multiple input to one BeamUnionRel
[ https://issues.apache.org/jira/browse/BEAM-7049?focusedWorklogId=296566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296566 ] ASF GitHub Bot logged work on BEAM-7049: Author: ASF GitHub Bot Created on: 16/Aug/19 20:51 Start Date: 16/Aug/19 20:51 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9358: (WIP-BEAM-7049)Changes made to make a simple case of threeway union work URL: https://github.com/apache/beam/pull/9358#discussion_r314885625 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java ## @@ -88,15 +92,20 @@ public BeamSetOperatorRelBase(BeamRelNode beamRelNode, OpType opType, boolean al leftRows.apply( "CreateLeftIndex", MapElements.via(new BeamSetOperatorsTransforms.BeamSqlRow2KvFn( -.and( -rightTag, -rightRows.apply( -"CreateRightIndex", -MapElements.via(new BeamSetOperatorsTransforms.BeamSqlRow2KvFn( -.apply(CoGroupByKey.create()); +.and( Review comment: note that to make it general, it would be a for loop to create one tag per input. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296566) Time Spent: 20m (was: 10m) > Merge multiple input to one BeamUnionRel > > > Key: BEAM-7049 > URL: https://issues.apache.org/jira/browse/BEAM-7049 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: sridhar Reddy >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > BeamUnionRel assumes inputs are two and rejects more. So `a UNION b UNION c` > will have to be created as UNION(a, UNION(b, c)) and have two shuffles. 
If > BeamUnionRel can handle multiple inputs, we will have only one shuffle -- This message was sent by Atlassian JIRA (v7.6.14#76016)
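The rewrite described above can be sketched as a plain tree-flattening pass: collapse nested binary unions into a single n-ary union so the runner needs only one shuffle. The `("union", [...])` tuple below is a hypothetical stand-in for BeamUnionRel, not Beam SQL's actual relational node:

```python
# Sketch of the BEAM-7049 idea: splice nested binary unions into one
# n-ary union. ("union", [inputs]) is a hypothetical stand-in rel node.
from typing import List, Tuple, Union as PyUnion

Rel = PyUnion[str, Tuple[str, list]]  # a leaf name, or ("union", [inputs])

def flatten_union(rel: Rel) -> Rel:
    if isinstance(rel, tuple) and rel[0] == "union":
        inputs: List[Rel] = []
        for child in rel[1]:
            child = flatten_union(child)
            if isinstance(child, tuple) and child[0] == "union":
                inputs.extend(child[1])  # splice nested union inputs upward
            else:
                inputs.append(child)
        return ("union", inputs)
    return rel

# `a UNION b UNION c` parsed as UNION(a, UNION(b, c)):
nested = ("union", ["a", ("union", ["b", "c"])])
assert flatten_union(nested) == ("union", ["a", "b", "c"])
```

The reviewer's comment on the PR (one tag per input, created in a loop) is the CoGroupByKey-side counterpart of the same generalization from two inputs to n.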
[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296561 ] ASF GitHub Bot logged work on BEAM-7819: Author: ASF GitHub Bot Created on: 16/Aug/19 20:40 Start Date: 16/Aug/19 20:40 Worklog Time Spent: 10m Work Description: matt-darwin commented on issue #9232: [BEAM-7819] Python - parse PubSub message_id into attributes property URL: https://github.com/apache/beam/pull/9232#issuecomment-522144491 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296561) Time Spent: 6h 40m (was: 6.5h) > PubsubMessage message parsing is lacking non-attribute fields > - > > Key: BEAM-7819 > URL: https://issues.apache.org/jira/browse/BEAM-7819 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > User reported issue: > https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E > """ > Looking at the source code, with my untrained python eyes, I think if the > intention is to include the message id and the publish time in the attributes > attribute of the PubSubMessage type, then the protobuf mapping is missing > something:- > @staticmethod > def _from_proto_str(proto_msg): > """Construct from serialized form of ``PubsubMessage``. > Args: > proto_msg: String containing a serialized protobuf of type > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > Returns: > A new PubsubMessage object. 
> """ > msg = pubsub.types.pubsub_pb2.PubsubMessage() > msg.ParseFromString(proto_msg) > # Convert ScalarMapContainer to dict. > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > return PubsubMessage(msg.data, attributes) > The protobuf definition is here:- > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > and so it looks as if the message_id and publish_time are not being parsed as > they are seperate from the attributes. Perhaps the PubsubMessage class needs > expanding to include these as attributes, or they would need adding to the > dictionary for attributes. This would only need doing for the _from_proto_str > as obviously they would not need to be populated when transmitting a message > to PubSub. > My python is not great, I'm assuming the latter option would need to look > something like this? > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > attributes.update({'message_id': msg.message_id, 'publish_time': > msg.publish_time}) > return PubsubMessage(msg.data, attributes) > """ -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7989) SparkRunner CacheVisitor counts PCollections from SideInputs
[ https://issues.apache.org/jira/browse/BEAM-7989?focusedWorklogId=296560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296560 ] ASF GitHub Bot logged work on BEAM-7989: Author: ASF GitHub Bot Created on: 16/Aug/19 20:40 Start Date: 16/Aug/19 20:40 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #9357: [BEAM-7989] Remove side inputs from CacheVisitor calculation URL: https://github.com/apache/beam/pull/9357 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296560) Time Spent: 1h 10m (was: 1h) > SparkRunner CacheVisitor counts PCollections from SideInputs > > > Key: BEAM-7989 > URL: https://issues.apache.org/jira/browse/BEAM-7989 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 2.14.0 >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Fix For: 2.16.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The SparkRunner's CacheVisitor looks at all inputs for a > TransformHierarchy.Node. Those inputs include the PCollections from the > PCollectionViews that are supplied as sideInputs. > The SparkRunner should not count these instances of sideInputs as the > PCollections are not actually accessed. They are only accessed when the > CreatePCollectionView Transform is processed. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (BEAM-7989) SparkRunner CacheVisitor counts PCollections from SideInputs
[ https://issues.apache.org/jira/browse/BEAM-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-7989. Resolution: Fixed Fix Version/s: 2.16.0 > SparkRunner CacheVisitor counts PCollections from SideInputs > > > Key: BEAM-7989 > URL: https://issues.apache.org/jira/browse/BEAM-7989 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 2.14.0 >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Fix For: 2.16.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The SparkRunner's CacheVisitor looks at all inputs for a > TransformHierarchy.Node. Those inputs include the PCollections from the > PCollectionViews that are supplied as sideInputs. > The SparkRunner should not count these instances of sideInputs as the > PCollections are not actually accessed. They are only accessed when the > CreatePCollectionView Transform is processed. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
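The fix described above amounts to excluding side-input edges when counting how often each PCollection is consumed. This is a minimal sketch under an assumed toy data model (plain dicts standing in for TransformHierarchy.Node), not the SparkRunner's actual visitor:

```python
# Sketch of the BEAM-7989 idea: when deciding which PCollections to
# cache, count only main-input consumers. Side-input PCollections are
# only read when the CreatePCollectionView transform runs, so they
# should not inflate the count. Hypothetical data model.
from collections import Counter

def cache_candidates(transforms):
    """transforms: list of dicts with 'main_inputs' and 'side_inputs'
    (lists of PCollection names). Returns PCollections consumed more
    than once via main inputs only."""
    uses = Counter()
    for t in transforms:
        for pc in t['main_inputs']:
            uses[pc] += 1
        # t['side_inputs'] deliberately ignored
    return {pc for pc, n in uses.items() if n > 1}

graph = [
    {'main_inputs': ['pc1'], 'side_inputs': []},
    {'main_inputs': ['pc1'], 'side_inputs': ['pc2']},
    {'main_inputs': ['pc2'], 'side_inputs': []},
]
# pc1 is read twice as a main input -> cache it;
# pc2's side-input appearance no longer counts as a second read.
assert cache_candidates(graph) == {'pc1'}
```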
[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296559 ] ASF GitHub Bot logged work on BEAM-7819: Author: ASF GitHub Bot Created on: 16/Aug/19 20:39 Start Date: 16/Aug/19 20:39 Worklog Time Spent: 10m Work Description: matt-darwin commented on issue #9232: [BEAM-7819] Python - parse PubSub message_id into attributes property URL: https://github.com/apache/beam/pull/9232#issuecomment-522144229 > FYI, support for message_id and publish_time in Dataflow should be available in a future update of Dataflow. Until then, those fields will appear unset or blank. Is that external to the Beam project? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296559) Time Spent: 6.5h (was: 6h 20m) > PubsubMessage message parsing is lacking non-attribute fields > - > > Key: BEAM-7819 > URL: https://issues.apache.org/jira/browse/BEAM-7819 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Major > Time Spent: 6.5h > Remaining Estimate: 0h > > User reported issue: > https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E > """ > Looking at the source code, with my untrained python eyes, I think if the > intention is to include the message id and the publish time in the attributes > attribute of the PubSubMessage type, then the protobuf mapping is missing > something:- > @staticmethod > def _from_proto_str(proto_msg): > """Construct from serialized form of ``PubsubMessage``. 
> Args: > proto_msg: String containing a serialized protobuf of type > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > Returns: > A new PubsubMessage object. > """ > msg = pubsub.types.pubsub_pb2.PubsubMessage() > msg.ParseFromString(proto_msg) > # Convert ScalarMapContainer to dict. > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > return PubsubMessage(msg.data, attributes) > The protobuf definition is here:- > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > and so it looks as if the message_id and publish_time are not being parsed as > they are separate from the attributes. Perhaps the PubsubMessage class needs > expanding to include these as attributes, or they would need adding to the > dictionary for attributes. This would only need doing for the _from_proto_str > as obviously they would not need to be populated when transmitting a message > to PubSub. > My python is not great, I'm assuming the latter option would need to look > something like this? > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > attributes.update({'message_id': msg.message_id, 'publish_time': > msg.publish_time}) > return PubsubMessage(msg.data, attributes) > """ -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (BEAM-7802) Expose a method to make a Schema coder from an Avro coder
[ https://issues.apache.org/jira/browse/BEAM-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-7802. Resolution: Fixed Fix Version/s: 2.16.0 > Expose a method to make a Schema coder from an Avro coder > -- > > Key: BEAM-7802 > URL: https://issues.apache.org/jira/browse/BEAM-7802 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > Avro can infer the Schema for an Avro based PCollection by using the > `withBeamSchemas` method, however if the user created a PCollection with Avro > objects or IndexedRecord/GenericRecord, he needs to manually set the schema > (or coder). The idea is to expose a method to easily get a Schema Coder from > Avro objects or Schema and the user finally can set the Schema-based coder if > it needs the PCollection to behave like a Schema-based one. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7802) Expose a method to make a Schema coder from an Avro coder
[ https://issues.apache.org/jira/browse/BEAM-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7802: --- Description: Avro can infer the Schema for an Avro based PCollection by using the `withBeamSchemas` method, however if the user created a PCollection with Avro objects or IndexedRecord/GenericRecord, he needs to manually set the schema (or coder). The idea is to expose a method to easily get a Schema Coder from Avro objects or Schema and the user finally can set the Schema-based coder if it needs the PCollection to behave like a Schema-based one. (was: Avro can infer the Schema for an Avro based PCollection by using the `withBeamSchemas` method, however if the user created a PCollection with Avro objects or IndexedRecord/GenericRecord, he needs to manually set the schema (or coder). The idea is to expose a method in schema.AvroUtils to ease this.) > Expose a method to make a Schema coder from an Avro coder > -- > > Key: BEAM-7802 > URL: https://issues.apache.org/jira/browse/BEAM-7802 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > > Avro can infer the Schema for an Avro based PCollection by using the > `withBeamSchemas` method, however if the user created a PCollection with Avro > objects or IndexedRecord/GenericRecord, he needs to manually set the schema > (or coder). The idea is to expose a method to easily get a Schema Coder from > Avro objects or Schema and the user finally can set the Schema-based coder if > it needs the PCollection to behave like a Schema-based one. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7802) Expose a method to make a Schema coder from an Avro coder
[ https://issues.apache.org/jira/browse/BEAM-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7802: --- Summary: Expose a method to make a Schema coder from an Avro coder (was: Expose a method to make an Avro-based PCollection into a Schema-based one) > Expose a method to make a Schema coder from an Avro coder > -- > > Key: BEAM-7802 > URL: https://issues.apache.org/jira/browse/BEAM-7802 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > > Avro can infer the Schema for an Avro based PCollection by using the > `withBeamSchemas` method, however if the user created a PCollection with Avro > objects or IndexedRecord/GenericRecord, he needs to manually set the schema > (or coder). The idea is to expose a method in schema.AvroUtils to ease this. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into a Schema-based one
[ https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296551 ] ASF GitHub Bot logged work on BEAM-7802: Author: ASF GitHub Bot Created on: 16/Aug/19 20:31 Start Date: 16/Aug/19 20:31 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #9130: [BEAM-7802] Expose a method to make an Avro-based PCollection into an Schema-based one URL: https://github.com/apache/beam/pull/9130 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296551) Time Spent: 6.5h (was: 6h 20m) > Expose a method to make an Avro-based PCollection into an Schema-based one > -- > > Key: BEAM-7802 > URL: https://issues.apache.org/jira/browse/BEAM-7802 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 6.5h > Remaining Estimate: 0h > > Avro can infer the Schema for an Avro based PCollection by using the > `withBeamSchemas` method, however if the user created a PCollection with Avro > objects or IndexedRecord/GenericRecord, he needs to manually set the schema > (or coder). The idea is to expose a method in schema.AvroUtils to ease this. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into a Schema-based one
[ https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296550 ] ASF GitHub Bot logged work on BEAM-7802: Author: ASF GitHub Bot Created on: 16/Aug/19 20:30 Start Date: 16/Aug/19 20:30 Worklog Time Spent: 10m Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a method to make an Avro-based PCollection into an Schema-based one URL: https://github.com/apache/beam/pull/9130#issuecomment-522141914 Merging since it is green now. Thanks for the review @kanterov and @RyanSkraba This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296550) Time Spent: 6h 20m (was: 6h 10m) > Expose a method to make an Avro-based PCollection into an Schema-based one > -- > > Key: BEAM-7802 > URL: https://issues.apache.org/jira/browse/BEAM-7802 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 6h 20m > Remaining Estimate: 0h > > Avro can infer the Schema for an Avro based PCollection by using the > `withBeamSchemas` method, however if the user created a PCollection with Avro > objects or IndexedRecord/GenericRecord, he needs to manually set the schema > (or coder). The idea is to expose a method in schema.AvroUtils to ease this. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into a Schema-based one
[ https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296549 ] ASF GitHub Bot logged work on BEAM-7802: Author: ASF GitHub Bot Created on: 16/Aug/19 20:30 Start Date: 16/Aug/19 20:30 Worklog Time Spent: 10m Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a method to make an Avro-based PCollection into an Schema-based one URL: https://github.com/apache/beam/pull/9130#issuecomment-522124180 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296549) Time Spent: 6h 10m (was: 6h) > Expose a method to make an Avro-based PCollection into an Schema-based one > -- > > Key: BEAM-7802 > URL: https://issues.apache.org/jira/browse/BEAM-7802 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 6h 10m > Remaining Estimate: 0h > > Avro can infer the Schema for an Avro based PCollection by using the > `withBeamSchemas` method, however if the user created a PCollection with Avro > objects or IndexedRecord/GenericRecord, he needs to manually set the schema > (or coder). The idea is to expose a method in schema.AvroUtils to ease this. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296546 ] ASF GitHub Bot logged work on BEAM-7994: Author: ASF GitHub Bot Created on: 16/Aug/19 20:25 Start Date: 16/Aug/19 20:25 Worklog Time Spent: 10m Work Description: wcn3 commented on issue #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13 URL: https://github.com/apache/beam/pull/9362#issuecomment-522140404 R: @lostluck @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296546) Time Spent: 20m (was: 10m) > BEAM SDK has compatibility problems with go1.13 > --- > > Key: BEAM-7994 > URL: https://issues.apache.org/jira/browse/BEAM-7994 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Bill Neubauer >Assignee: Bill Neubauer >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > The Go team identified a problem in the Beam SDK that appears due to runtime > changes in Go 1.13, which is upcoming. There is a backwards compatible fix > the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
[ https://issues.apache.org/jira/browse/BEAM-7994?focusedWorklogId=296544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296544 ] ASF GitHub Bot logged work on BEAM-7994: Author: ASF GitHub Bot Created on: 16/Aug/19 20:24 Start Date: 16/Aug/19 20:24 Worklog Time Spent: 10m Work Description: wcn3 commented on pull request #9362: [BEAM-7994] Fixing unsafe pointer usage for Go 1.13 URL: https://github.com/apache/beam/pull/9362 The Go team identified a problem in the Beam SDK that appears due to runtime changes in Go 1.13, which is upcoming. This is the backwards compatible fix the team recommended. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296544) Time Spent: 10m Remaining Estimate: 0h > BEAM SDK has compatibility problems with go1.13 > --- > > Key: BEAM-7994 > URL: https://issues.apache.org/jira/browse/BEAM-7994 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Bill Neubauer >Assignee: Bill Neubauer >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > The Go team identified a problem in the Beam SDK that appears due to runtime > changes in Go 1.13, which is upcoming. There is a backwards compatible fix > the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7994) BEAM SDK has compatibility problems with go1.13
Bill Neubauer created BEAM-7994: --- Summary: BEAM SDK has compatibility problems with go1.13 Key: BEAM-7994 URL: https://issues.apache.org/jira/browse/BEAM-7994 Project: Beam Issue Type: Improvement Components: sdk-go Reporter: Bill Neubauer Assignee: Bill Neubauer The Go team identified a problem in the Beam SDK that appears due to runtime changes in Go 1.13, which is upcoming. There is a backwards compatible fix the team recommended. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296536 ] ASF GitHub Bot logged work on BEAM-7819: Author: ASF GitHub Bot Created on: 16/Aug/19 20:17 Start Date: 16/Aug/19 20:17 Worklog Time Spent: 10m Work Description: udim commented on issue #9232: [BEAM-7819] Python - parse PubSub message_id into attributes property URL: https://github.com/apache/beam/pull/9232#issuecomment-522138174 FYI, support for message_id and publish_time in Dataflow should be available in a future update of Dataflow. Until then, those fields will appear unset or blank. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296536) Time Spent: 6h 20m (was: 6h 10m) > PubsubMessage message parsing is lacking non-attribute fields > - > > Key: BEAM-7819 > URL: https://issues.apache.org/jira/browse/BEAM-7819 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > > User reported issue: > https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E > """ > Looking at the source code, with my untrained python eyes, I think if the > intention is to include the message id and the publish time in the attributes > attribute of the PubSubMessage type, then the protobuf mapping is missing > something:- > @staticmethod > def _from_proto_str(proto_msg): > """Construct from serialized form of ``PubsubMessage``. 
> Args: > proto_msg: String containing a serialized protobuf of type > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > Returns: > A new PubsubMessage object. > """ > msg = pubsub.types.pubsub_pb2.PubsubMessage() > msg.ParseFromString(proto_msg) > # Convert ScalarMapContainer to dict. > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > return PubsubMessage(msg.data, attributes) > The protobuf definition is here:- > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > and so it looks as if the message_id and publish_time are not being parsed as > they are seperate from the attributes. Perhaps the PubsubMessage class needs > expanding to include these as attributes, or they would need adding to the > dictionary for attributes. This would only need doing for the _from_proto_str > as obviously they would not need to be populated when transmitting a message > to PubSub. > My python is not great, I'm assuming the latter option would need to look > something like this? > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > attributes.update({'message_id': msg.message_id, 'publish_time': > msg.publish_time}) > return PubsubMessage(msg.data, attributes) > """ -- This message was sent by Atlassian JIRA (v7.6.14#76016)
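The change the reporter sketches above can be shown end to end. The snippet below is a simplified, runnable illustration of folding the top-level `message_id` and `publish_time` protobuf fields into the attributes dict; `FakePubsubMessage` is a hypothetical stand-in for the real `pubsub.types.pubsub_pb2.PubsubMessage`, so this runs without the google-cloud-pubsub dependency and is not the actual SDK code.

```python
class FakePubsubMessage:
    """Minimal stand-in for pubsub_pb2.PubsubMessage (illustrative only)."""

    def __init__(self, data, attributes, message_id, publish_time):
        self.data = data
        self.attributes = attributes
        self.message_id = message_id
        self.publish_time = publish_time


def parse_with_metadata(msg):
    # Convert the attribute map to a plain dict, as _from_proto_str does today...
    attributes = dict(msg.attributes)
    # ...then add the two top-level fields that are currently dropped,
    # as the reporter proposes.
    attributes.update({
        'message_id': msg.message_id,
        'publish_time': msg.publish_time,
    })
    return msg.data, attributes


msg = FakePubsubMessage(b'payload', {'k': 'v'}, '12345', '2019-08-16T20:17:00Z')
data, attrs = parse_with_metadata(msg)
print(attrs['message_id'])  # prints 12345
```

As noted in the thread, this only matters on the receive path; the fields never need to be populated when publishing.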
[jira] [Commented] (BEAM-7993) portable python precommit is flaky
[ https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909362#comment-16909362 ] Udi Meiri commented on BEAM-7993: - Similar failures: https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5505/consoleFull https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5508/console [~tvalentyn], [~robertwb]: Could either of you forward this to someone for triage? > portable python precommit is flaky > -- > > Key: BEAM-7993 > URL: https://issues.apache.org/jira/browse/BEAM-7993 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures, testing >Reporter: Udi Meiri >Priority: Major > > I'm not sure what the root cause is here. > Example log where > :sdks:python:test-suites:portable:py35:portableWordCountBatch failed: > {code} > 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap > (FlatMap at ExtractOutput[0]) (2/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at > ExtractOutput[0]) (2/2) > 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap > (FlatMap at ExtractOutput[0]) (1/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at > ExtractOutput[0]) (1/2) > 11:51:22 [CHAIN MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2) > 11:51:22 [CHAIN MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR > 
org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2) > 11:51:22 java.lang.Exception: The user defined 'open()' method caused an > exception: java.io.IOException: Received exit code 1 for command 'docker > inspect -f {{.State.Running}} > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: > Error: No such object: > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1 > 11:51:22 at > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498) > 11:51:22 at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > 11:51:22 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712) > 11:51:22 at java.lang.Thread.run(Thread.java:748) > 11:51:22 Caused by: > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > java.io.IOException: Received exit code 1 for command 'docker inspect -f > {{.State.Running}} > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. 
stderr: > Error: No such object: > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1 > 11:51:22 at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966) > 11:51:22 at > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211) > 11:51:22 at > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202) > 11:51:22 at > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185) > 11:51:22 at > org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49) > 11:51:22 at > org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203) > 11:51:22 at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129) > 11:51:22 at > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) > 11:51:22 at > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494) > 11:51:22 ... 3 more > {code} > https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7993) portable python precommit is flaky
Udi Meiri created BEAM-7993: --- Summary: portable python precommit is flaky Key: BEAM-7993 URL: https://issues.apache.org/jira/browse/BEAM-7993 Project: Beam Issue Type: Bug Components: sdk-py-core, test-failures, testing Reporter: Udi Meiri I'm not sure what the root cause is here. Example log where :sdks:python:test-suites:portable:py35:portableWordCountBatch failed: {code} 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2) 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2) 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2) 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2) 11:51:22 java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. 
stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
11:51:22 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
11:51:22 at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
11:51:22 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
11:51:22 at java.lang.Thread.run(Thread.java:748)
11:51:22 Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
11:51:22 at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
11:51:22 at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
11:51:22 at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
11:51:22 at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
11:51:22 at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
11:51:22 at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
11:51:22 at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
11:51:22 at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
11:51:22 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
11:51:22 ... 3 more
{code}
https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=296532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296532 ] ASF GitHub Bot logged work on BEAM-5519: Author: ASF GitHub Bot Created on: 16/Aug/19 19:56 Start Date: 16/Aug/19 19:56 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] Remove call to groupByKey in Spark Streaming. URL: https://github.com/apache/beam/pull/6511#issuecomment-522132527 One other change that we can think about making now that we no longer need to preserve the partitioner. Just to clean things up and eliminate a possible source of confusion:
```
@@ -58,14 +58,9 @@ public class GroupCombineFunctions {
     JavaPairRDD> groupedRDD =
         (partitioner != null) ? pairRDD.groupByKey(partitioner) : pairRDD.groupByKey();
-    // using mapPartitions allows to preserve the partitioner
-    // and avoid unnecessary shuffle downstream.
     return groupedRDD
-        .mapPartitionsToPair(
-            TranslationUtils.pairFunctionToPairFlatMapFunction(
-                CoderHelpers.fromByteFunctionIterable(keyCoder, wvCoder)),
-            true)
-        .mapPartitions(TranslationUtils.fromPairFlatMapFunction(), true);
+        .mapToPair(CoderHelpers.fromByteFunctionIterable(keyCoder, wvCoder))
+        .map(new TranslationUtils.FromPairFunction<>());
 }
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296532) Time Spent: 6h 10m (was: 6h) > Spark Streaming Duplicated Encoding/Decoding Effort > --- > > Key: BEAM-5519 > URL: https://issues.apache.org/jira/browse/BEAM-5519 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Labels: spark, spark-streaming > Fix For: 2.16.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > When using the SparkRunner in streaming mode. There is a call to groupByKey > followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this > used to cause 2 shuffles but it still causes 2 encode/decode cycles. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296530 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 16/Aug/19 19:52 Start Date: 16/Aug/19 19:52 Worklog Time Spent: 10m Work Description: udim commented on issue #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#issuecomment-522131428 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296530) Time Spent: 11h (was: 10h 50m) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 11h > Remaining Estimate: 0h > > Existing [Typehints implementation in > Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] > heavily relies on internal details of the CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support the in-house implementation, which at this point is stale > code and has other known issues. 
> - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=296529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296529 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 16/Aug/19 19:50 Start Date: 16/Aug/19 19:50 Worklog Time Spent: 10m Work Description: udim commented on issue #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#issuecomment-522130866 run python 2 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296529) Time Spent: 10h 50m (was: 10h 40m) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 10h 50m > Remaining Estimate: 0h > > Existing [Typehints implementation in > Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] > heavily relies on internal details of the CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support the in-house implementation, which at this point is stale > code and has other known issues. 
> - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
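The annotation mechanism this thread is about can be illustrated briefly. `typing.get_type_hints` reads a function's `__annotations__` through the standard library rather than CPython internals, which is the general direction the "off-the-shelf" option points at. This is a generic sketch of the mechanism, not Beam's actual typehints implementation.

```python
import typing


def process(element: str, side: int) -> typing.List[str]:
    """A toy function carrying Python 3 type annotations."""
    return [element] * side


# get_type_hints resolves annotations (including string/forward references)
# into real type objects, keyed by parameter name plus 'return'.
hints = typing.get_type_hints(process)
print(hints['element'])  # <class 'str'>
print(hints['return'])   # typing.List[str]
```

A framework can walk such a mapping to infer input and output types of a user function without parsing CPython-specific attributes that changed in 3.6.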
[jira] [Commented] (BEAM-7938) ProtoCoder throws NoSuchMethodException: com.google.protobuf.Message.getDefaultInstance()
[ https://issues.apache.org/jira/browse/BEAM-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909352#comment-16909352 ] Martin Fahy commented on BEAM-7938: --- Hi [~paliendroom] is there a way you could provide an example snippet of code which reproduces the issue. Thank you! > ProtoCoder throws NoSuchMethodException: > com.google.protobuf.Message.getDefaultInstance() > - > > Key: BEAM-7938 > URL: https://issues.apache.org/jira/browse/BEAM-7938 > Project: Beam > Issue Type: Bug > Components: extensions-java-protobuf, io-java-files, io-java-gcp, > io-java-parquet >Affects Versions: 2.14.0 >Reporter: Lien Michiels >Priority: Major > Labels: newbie, starter > > h3. Context > I have a beam pipeline running on DataFlow using the Java SDK that pulls > Proto wrapper messages from a PubSub subscription, I partition these by the > OneOf-value and then apply a MapElements to extract the underlying Proto > message, so that I end up with a PCollectionList. I then > do some more processing and try to write them to different sinks. BigQueryIO > works absolutely fine. 
However when I try to use the PubsubIO or ParquetIO, I > end up with this error when using FileIO (for Parquet): > > {code:java} > java.lang.IllegalArgumentException: java.lang.NoSuchMethodException: > com.google.protobuf.Message.getDefaultInstance() > org.apache.beam.sdk.extensions.protobuf.ProtoCoder.getParser(ProtoCoder.java:288) > > org.apache.beam.sdk.extensions.protobuf.ProtoCoder.decode(ProtoCoder.java:192) > > org.apache.beam.sdk.extensions.protobuf.ProtoCoder.decode(ProtoCoder.java:108) > > org.apache.beam.runners.dataflow.worker.WindmillKeyedWorkItem.lambda$elementsIterable$2(WindmillKeyedWorkItem.java:107) > > org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterators$7.transform(Iterators.java:750) > > org.apache.beam.vendor.guava.v20_0.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47) > java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) > java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > > java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) > > java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) > > java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) > > java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > > java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) > > org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner$LateDataFilter.filter(LateDataDroppingDoFnRunner.java:128) > > org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:76) > > org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:134) > > 
org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44) > > org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49) > > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201) > > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1287) > > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149) > > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024) > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > java.base/java.lang.Thread.run(Thread.java:834) > {code} > > and this for PubsubIO: > > {code:java} > java.lang.IllegalArgumentException: java.lang.NoSuchMethodException: > com.google.protobuf.Message.getDefaultInstance() > org.apache.beam.sdk.extensions.protobuf.ProtoCoder.getParser(ProtoCoder.java:288) > > org.apache.beam.sdk.extensions.protobuf.ProtoCoder.decode(ProtoCoder.java:192) > > org.apache.beam.sdk.extensions.protobuf.ProtoCoder.decode(ProtoCoder.java:108) > > org.apache.beam.runners.dataflow.worker.WindmillKeyedWorkItem.lambda$elementsIterable$2(WindmillKeyed
[jira] [Commented] (BEAM-7164) Python precommit failing on Java PRs. dataflow:setupVirtualenv
[ https://issues.apache.org/jira/browse/BEAM-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909353#comment-16909353 ] Udi Meiri commented on BEAM-7164: - Seems to not reproduce when adding --no-parallel to the setupVirtualenv tasks. I guess the issue is with overloading files.pythonhosted.org with requests. > Python precommit failing on Java PRs. dataflow:setupVirtualenv > --- > > Key: BEAM-7164 > URL: https://issues.apache.org/jira/browse/BEAM-7164 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Alex Amato >Priority: Major > > *https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull* > *18:05:44* > > *Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv* > *18:05:44* New python executable in > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python2.7*18:05:44* > Also creating executable in > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python*18:05:44* > Installing setuptools, pkg_resources, pip, wheel...done.*18:05:44* Running > virtualenv with interpreter /usr/bin/python2.7*18:05:44* DEPRECATION: Python > 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your > Python as Python 2.7 won't be maintained after that date. 
A future version of > pip will drop support for Python 2.7.*18:05:44* Collecting > tox==3.0.0*18:05:44* Using cached > [https://files.pythonhosted.org/packages/e6/41/4dcfd713282bf3213b0384320fa8841e4db032ddcb80bc08a540159d42a8/tox-3.0.0-py2.py3-none-any.whl] > *18:05:44* Collecting grpcio-tools==1.3.5*18:05:44* Using cached > [https://files.pythonhosted.org/packages/05/f6/0296e29b1bac6f85d2a8556d48adf825307f73109a3c2c17fb734292db0a/grpcio_tools-1.3.5-cp27-cp27mu-manylinux1_x86_64.whl] > *18:05:44* Collecting pluggy<1.0,>=0.3.0 (from tox==3.0.0)*18:05:44* Using > cached > [https://files.pythonhosted.org/packages/84/e8/4ddac125b5a0e84ea6ffc93cfccf1e7ee1924e88f53c64e98227f0af2a5f/pluggy-0.9.0-py2.py3-none-any.whl] > *18:05:44* Collecting six (from tox==3.0.0)*18:05:44* Using cached > [https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl] > *18:05:44* Collecting virtualenv>=1.11.2 (from tox==3.0.0)*18:05:44* Using > cached > [https://files.pythonhosted.org/packages/4f/ba/6f9315180501d5ac3e707f19fcb1764c26cc6a9a31af05778f7c2383eadb/virtualenv-16.5.0-py2.py3-none-any.whl] > *18:05:44* Collecting py>=1.4.17 (from tox==3.0.0)*18:05:44* Using cached > [https://files.pythonhosted.org/packages/76/bc/394ad449851729244a97857ee14d7cba61ddb268dce3db538ba2f2ba1f0f/py-1.8.0-py2.py3-none-any.whl] > *18:05:44* Collecting grpcio>=1.3.5 (from grpcio-tools==1.3.5)*18:05:44* > Using cached > [https://files.pythonhosted.org/packages/7c/59/4da8df60a74f4af73ede9d92a75ca85c94bc2a109d5f67061496e8d496b2/grpcio-1.20.0-cp27-cp27mu-manylinux1_x86_64.whl] > *18:05:44* Collecting protobuf>=3.2.0 (from grpcio-tools==1.3.5)*18:05:44* > Using cached > [https://files.pythonhosted.org/packages/ea/72/5eadea03b06ca1320be2433ef2236155da17806b700efc92677ee99ae119/protobuf-3.7.1-cp27-cp27mu-manylinux1_x86_64.whl] > *18:05:44* Collecting futures>=2.2.0; python_version < "3.2" (from > 
grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:44* ERROR: Could not find a > version that satisfies the requirement futures>=2.2.0; python_version < "3.2" > (from grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: none)*18:05:44* > ERROR: No matching distribution found for futures>=2.2.0; python_version < > "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:46* *18:05:46* > > *Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv* > FAILED*18:05:46* > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one
[ https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296516 ] ASF GitHub Bot logged work on BEAM-7802: Author: ASF GitHub Bot Created on: 16/Aug/19 19:26 Start Date: 16/Aug/19 19:26 Worklog Time Spent: 10m Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a method to make an Avro-based PCollection into an Schema-based one URL: https://github.com/apache/beam/pull/9130#issuecomment-522124180 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296516) Time Spent: 5h 50m (was: 5h 40m) > Expose a method to make an Avro-based PCollection into an Schema-based one > -- > > Key: BEAM-7802 > URL: https://issues.apache.org/jira/browse/BEAM-7802 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 5h 50m > Remaining Estimate: 0h > > Avro can infer the Schema for an Avro based PCollection by using the > `withBeamSchemas` method, however if the user created a PCollection with Avro > objects or IndexedRecord/GenericRecord, he needs to manually set the schema > (or coder). The idea is to expose a method in schema.AvroUtils to ease this. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one
[ https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296515 ] ASF GitHub Bot logged work on BEAM-7802: Author: ASF GitHub Bot Created on: 16/Aug/19 19:26 Start Date: 16/Aug/19 19:26 Worklog Time Spent: 10m Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a method to make an Avro-based PCollection into an Schema-based one URL: https://github.com/apache/beam/pull/9130#issuecomment-522124042 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296515) Time Spent: 5h 40m (was: 5.5h) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one
[ https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=296517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296517 ] ASF GitHub Bot logged work on BEAM-7802: Author: ASF GitHub Bot Created on: 16/Aug/19 19:26 Start Date: 16/Aug/19 19:26 Worklog Time Spent: 10m Work Description: iemejia commented on issue #9130: [BEAM-7802] Expose a method to make an Avro-based PCollection into an Schema-based one URL: https://github.com/apache/beam/pull/9130#issuecomment-522124042 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296517) Time Spent: 6h (was: 5h 50m) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-1009) Upgrade from mockito-all 1 to mockito-core 2
[ https://issues.apache.org/jira/browse/BEAM-1009?focusedWorklogId=296505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296505 ] ASF GitHub Bot logged work on BEAM-1009: Author: ASF GitHub Bot Created on: 16/Aug/19 19:02 Start Date: 16/Aug/19 19:02 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #9145: [BEAM-1009] Update multiple modules to use Mockito 2 URL: https://github.com/apache/beam/pull/9145 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296505) Time Spent: 2h 10m (was: 2h) > Upgrade from mockito-all 1 to mockito-core 2 > > > Key: BEAM-1009 > URL: https://issues.apache.org/jira/browse/BEAM-1009 > Project: Beam > Issue Type: Test > Components: sdk-java-core >Reporter: Pei He >Assignee: Ismaël Mejía >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Mockito 2 provides useful features, and the mockito-all module is no longer > generated. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7410) Explore possibilities to lower in-use IP address quota footprint.
[ https://issues.apache.org/jira/browse/BEAM-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909323#comment-16909323 ] Alan Myrvold commented on BEAM-7410: https://builds.apache.org/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/ is passing https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/ is passing Removed the currently-failing label > Explore possibilities to lower in-use IP address quota footprint. > - > > Key: BEAM-7410 > URL: https://issues.apache.org/jira/browse/BEAM-7410 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Alan Myrvold >Priority: Major > > > {noformat} > java.lang.RuntimeException: Workflow failed. Causes: Project > apache-beam-testing has insufficient quota(s) to execute this workflow with 1 > instances in region us-central1. Quota summary (required/available): 1/11743 > instances, 1/170 CPUs, 250/278399 disk GB, 0/4046 SSD disk GB, 1/188 instance > groups, 1/188 managed instance groups, 1/126 instance templates, 1/0 in-use > IP addresses.{noformat} > > Sample jobs: > [https://builds.apache.org/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/440/testReport/junit/org.apache.beam.sdk.testing/PAssertTest/testGlobalWindowContainsInAnyOrder/] > [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_PR/67/testReport/junit/org.apache.beam.sdk.transforms/ViewTest/testEmptyIterableSideInput/] > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296496 ] ASF GitHub Bot logged work on BEAM-7819: Author: ASF GitHub Bot Created on: 16/Aug/19 18:31 Start Date: 16/Aug/19 18:31 Worklog Time Spent: 10m Work Description: matt-darwin commented on pull request #9232: [BEAM-7819] Python - parse PubSub message_id into attributes property URL: https://github.com/apache/beam/pull/9232#discussion_r314840925 ## File path: sdks/python/apache_beam/io/gcp/pubsub.py ## @@ -61,29 +61,43 @@ class PubsubMessage(object): attributes: (dict) Key-value map of str to str, containing both user-defined and service generated attributes (such as id_label and timestamp_attribute). May be None. +message_id: (string) Message_Id as generated by PubSub +publish_time: (timestamp) Published time of message as generated by PubSub + as per google.protobuf.timestamp_pb2.Timestamp + +N.B. message_id and publish_time are not yet populated by Windmill service Review comment: Ah ok. WIll update. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296496) Time Spent: 6h 10m (was: 6h) > PubsubMessage message parsing is lacking non-attribute fields > - > > Key: BEAM-7819 > URL: https://issues.apache.org/jira/browse/BEAM-7819 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > User reported issue: > https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E > """ > Looking at the source code, with my untrained python eyes, I think if the > intention is to include the message id and the publish time in the attributes > attribute of the PubSubMessage type, then the protobuf mapping is missing > something:- > @staticmethod > def _from_proto_str(proto_msg): > """Construct from serialized form of ``PubsubMessage``. > Args: > proto_msg: String containing a serialized protobuf of type > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > Returns: > A new PubsubMessage object. > """ > msg = pubsub.types.pubsub_pb2.PubsubMessage() > msg.ParseFromString(proto_msg) > # Convert ScalarMapContainer to dict. > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > return PubsubMessage(msg.data, attributes) > The protobuf definition is here:- > https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage > and so it looks as if the message_id and publish_time are not being parsed as > they are seperate from the attributes. Perhaps the PubsubMessage class needs > expanding to include these as attributes, or they would need adding to the > dictionary for attributes. This would only need doing for the _from_proto_str > as obviously they would not need to be populated when transmitting a message > to PubSub. 
> My python is not great, I'm assuming the latter option would need to look > something like this? > attributes = dict((key, msg.attributes[key]) for key in msg.attributes) > attributes.update({'message_id': msg.message_id, 'publish_time': > msg.publish_time}) > return PubsubMessage(msg.data, attributes) > """ -- This message was sent by Atlassian JIRA (v7.6.14#76016)
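The merge the reporter proposes can be sketched without the Pub/Sub client library, using a stand-in for the parsed protobuf message (the `FakeProtoMsg` type and its field values here are purely illustrative; the real code would call `pubsub.types.pubsub_pb2.PubsubMessage().ParseFromString(proto_msg)`):

```python
from collections import namedtuple

# Stand-in for a parsed PubsubMessage protobuf, which carries the
# payload, the user attribute map, and the two non-attribute fields
# the reporter says are being dropped.
FakeProtoMsg = namedtuple('FakeProtoMsg',
                          'data attributes message_id publish_time')

msg = FakeProtoMsg(
    data=b'payload',
    attributes={'user_key': 'user_value'},
    message_id='1234567890',
    publish_time='2019-08-16T18:30:00Z',
)

# Convert the attribute map to a plain dict, then fold in message_id
# and publish_time, as the user report suggests.
attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
attributes.update({'message_id': msg.message_id,
                   'publish_time': msg.publish_time})
```

The alternative discussed on the PR (exposing `message_id` and `publish_time` as separate `PubsubMessage` properties rather than folding them into `attributes`) avoids any risk of colliding with user-defined attribute keys.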
[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296495 ] ASF GitHub Bot logged work on BEAM-7819: Author: ASF GitHub Bot Created on: 16/Aug/19 18:30 Start Date: 16/Aug/19 18:30 Worklog Time Spent: 10m Work Description: matt-darwin commented on pull request #9232: [BEAM-7819] Python - parse PubSub message_id into attributes property URL: https://github.com/apache/beam/pull/9232#discussion_r314840820 ## File path: sdks/python/apache_beam/io/gcp/pubsub_test.py ## @@ -82,25 +83,37 @@ def test_proto_conversion(self): self.assertEqual(m_converted.attributes, attributes) def test_eq(self): -a = PubsubMessage(b'abc', {1: 2, 3: 4}) -b = PubsubMessage(b'abc', {1: 2, 3: 4}) -c = PubsubMessage(b'abc', {1: 2}) +publish_time_secs = 1520861821 Review comment: Yes, no problem; I'd copied and pasted this from another test where the seconds and nanos were used separately, but it's not needed here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296495) Time Spent: 6h (was: 5h 50m) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
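The seconds/nanos split in the review discussion mirrors how protobuf's `Timestamp` message stores time; when only a single epoch value is needed, the two fields combine as follows (a plain-Python sketch, not the protobuf API):

```python
# A protobuf Timestamp stores seconds and nanos in separate integer
# fields; a float epoch value is their sum with nanos scaled down.
publish_time_secs = 1520861821
publish_time_nanos = 234567000

publish_time = publish_time_secs + publish_time_nanos * 1e-9
```

This is why the reviewer notes the separate variables are unnecessary in `test_eq`: the test only compares whole `Timestamp` objects and never needs the components individually.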
[jira] [Updated] (BEAM-7410) Explore possibilities to lower in-use IP address quota footprint.
[ https://issues.apache.org/jira/browse/BEAM-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Myrvold updated BEAM-7410: --- Labels: (was: currently-failing) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296494 ] ASF GitHub Bot logged work on BEAM-7819: Author: ASF GitHub Bot Created on: 16/Aug/19 18:30 Start Date: 16/Aug/19 18:30 Worklog Time Spent: 10m Work Description: matt-darwin commented on pull request #9232: [BEAM-7819] Python - parse PubSub message_id into attributes property URL: https://github.com/apache/beam/pull/9232#discussion_r314840661 ## File path: sdks/python/apache_beam/io/gcp/pubsub_test.py ## @@ -82,25 +83,37 @@ def test_proto_conversion(self): self.assertEqual(m_converted.attributes, attributes) def test_eq(self): -a = PubsubMessage(b'abc', {1: 2, 3: 4}) -b = PubsubMessage(b'abc', {1: 2, 3: 4}) -c = PubsubMessage(b'abc', {1: 2}) +publish_time_secs = 1520861821 +publish_time_nanos = 234567000 +publish_time = Timestamp(seconds=publish_time_secs, + nanos=publish_time_nanos) +a = PubsubMessage(b'abc', {1: 2, 3: 4}, '1234567890', publish_time) +b = PubsubMessage(b'abc', {1: 2, 3: 4}, '1234567890', publish_time) +c = PubsubMessage(b'abc', {1: 2}, '1234567890', publish_time) Review comment: Yes, I will add to the hash and repr tests as well; a check to ensure that both a and b do not equal d/e with some differing message_id/publish_time data). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296494) Time Spent: 5h 50m (was: 5h 40m) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7992) Unhandled type_constraint in apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
Udi Meiri created BEAM-7992: --- Summary: Unhandled type_constraint in apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types Key: BEAM-7992 URL: https://issues.apache.org/jira/browse/BEAM-7992 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri {code} root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Union[] root: DEBUG: Unhandled type_constraint: Any root: DEBUG: Unhandled type_constraint: Any {code} https://builds.apache.org/job/beam_PostCommit_Python37_PR/20/testReport/junit/apache_beam.io.gcp.bigquery_write_it_test/BigQueryWriteIntegrationTests/test_big_query_write_new_types/ These log entries are from opcode.py's _unpack_lists. They might be pointing to a bug or missing feature. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296491 ] ASF GitHub Bot logged work on BEAM-7819: Author: ASF GitHub Bot Created on: 16/Aug/19 18:25 Start Date: 16/Aug/19 18:25 Worklog Time Spent: 10m Work Description: matt-darwin commented on pull request #9232: [BEAM-7819] Python - parse PubSub message_id into attributes property URL: https://github.com/apache/beam/pull/9232#discussion_r314838660 ## File path: sdks/python/scripts/run_pylint.sh ## @@ -104,6 +104,7 @@ ISORT_EXCLUDED=( "model.py" "taxi.py" "process_tfma.py" + "pubsub_test.py" Review comment: It was insisting I place the protobuf import prior to the beam imports, but then if I did that, complaining about the grouping with the pubsub import. However, as mentioned above, I didn't realise you could nest imports in a try/catch, so this should resolve this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296491) Time Spent: 5h 40m (was: 5.5h) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=296493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296493 ] ASF GitHub Bot logged work on BEAM-5519: Author: ASF GitHub Bot Created on: 16/Aug/19 18:26 Start Date: 16/Aug/19 18:26 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] Remove call to groupByKey in Spark Streaming. URL: https://github.com/apache/beam/pull/6511#issuecomment-522106601 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296493) Time Spent: 6h (was: 5h 50m) > Spark Streaming Duplicated Encoding/Decoding Effort > --- > > Key: BEAM-5519 > URL: https://issues.apache.org/jira/browse/BEAM-5519 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Labels: spark, spark-streaming > Fix For: 2.16.0 > > Time Spent: 6h > Remaining Estimate: 0h > > When using the SparkRunner in streaming mode, there is a call to groupByKey > followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this > used to cause 2 shuffles, but it still causes 2 encode/decode cycles. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=296492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296492 ] ASF GitHub Bot logged work on BEAM-5519: Author: ASF GitHub Bot Created on: 16/Aug/19 18:26 Start Date: 16/Aug/19 18:26 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] Remove call to groupByKey in Spark Streaming. URL: https://github.com/apache/beam/pull/6511#issuecomment-522106526 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296492) Time Spent: 5h 50m (was: 5h 40m) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields
[ https://issues.apache.org/jira/browse/BEAM-7819?focusedWorklogId=296490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296490 ] ASF GitHub Bot logged work on BEAM-7819: Author: ASF GitHub Bot Created on: 16/Aug/19 18:24 Start Date: 16/Aug/19 18:24 Worklog Time Spent: 10m Work Description: matt-darwin commented on pull request #9232: [BEAM-7819] Python - parse PubSub message_id into attributes property URL: https://github.com/apache/beam/pull/9232#discussion_r314838329 ## File path: sdks/python/apache_beam/io/gcp/tests/pubsub_matcher_test.py ## @@ -37,8 +37,15 @@ except ImportError: pubsub = None +try: + from google.protobuf.timestamp_pb2 import Timestamp Review comment: Ah I didn't realise you could have imports in the same try statement. Fixing this up, which should also sort out the isort issues below (will handle it in the same way on pubsub_tests.py) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296490) Time Spent: 5.5h (was: 5h 20m) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
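The pattern the reviewers settle on above (grouping optional third-party imports inside a single try/except so that isort sees one block and the module still imports when the dependency is absent) can be sketched like this; the module name `some_optional_extra` is hypothetical, standing in for the real `google.cloud.pubsub` / `google.protobuf` imports:

```python
# Group optional imports in one try block; if any of them is missing,
# fall back to None so the rest of the module still imports cleanly.
try:
    from some_optional_extra import Timestamp  # hypothetical optional dependency
except ImportError:
    Timestamp = None

# Tests can then be skipped when the optional dependency is absent,
# e.g. with @unittest.skipIf(not HAS_TIMESTAMP, 'dependency missing').
HAS_TIMESTAMP = Timestamp is not None
```

Keeping all optional imports in the same try block also sidesteps the isort grouping complaint mentioned in the run_pylint.sh discussion, since there is no second import block to reorder.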
[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers
[ https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=296484&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296484 ] ASF GitHub Bot logged work on BEAM-7909: Author: ASF GitHub Bot Created on: 16/Aug/19 18:13 Start Date: 16/Aug/19 18:13 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #9351: [BEAM-7909] support customized container for Python URL: https://github.com/apache/beam/pull/9351#discussion_r314834051 ## File path: sdks/python/container/base_image_requirements.txt ## @@ -67,7 +67,7 @@ pandas==0.23.4 protorpc==0.11.1 python-gflags==3.0.6 setuptools<=39.1.0 # requirement for Tensorflow. -tensorflow==1.11.0 +tensorflow==1.13.1 Review comment: `pyyaml` and `tensorflow` version upgrade is because of py3.7. Previous versions don't work for py3.7. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296484) Time Spent: 1h 50m (was: 1h 40m) > Write integration tests to test customized containers > - > > Key: BEAM-7909 > URL: https://issues.apache.org/jira/browse/BEAM-7909 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7478) Remote cluster submission from Flink Runner broken due to staging issues
[ https://issues.apache.org/jira/browse/BEAM-7478?focusedWorklogId=296482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296482 ] ASF GitHub Bot logged work on BEAM-7478: Author: ASF GitHub Bot Created on: 16/Aug/19 18:07 Start Date: 16/Aug/19 18:07 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #8775: [BEAM-7478] Detect class path from the "java.class.path" property URL: https://github.com/apache/beam/pull/8775#issuecomment-522100855 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296482) Time Spent: 1h 50m (was: 1h 40m) > Remote cluster submission from Flink Runner broken due to staging issues > > > Key: BEAM-7478 > URL: https://issues.apache.org/jira/browse/BEAM-7478 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Affects Versions: 2.13.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Attachments: image-2019-08-07-15-36-35-001.png > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The usual way to submit pipelines with the Flink Runner is to build a fat jar > and use the {{bin/flink}} utility to submit the jar to a Flink cluster. This > works fine. > Alternatively, the Flink Runner can use the {{flinkMaster}} pipeline option > to specify a remote cluster. Upon submitting an example we get the following > at Flink's JobManager. 
> {noformat} > Caused by: java.lang.IllegalAccessError: class > sun.reflect.GeneratedSerializationConstructorAccessor70 cannot access its > superclass sun.reflect.SerializationConstructorAccessorImpl > at sun.misc.Unsafe.defineClass(Native Method) > at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63) > at > sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399) > at > sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393) > at > sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:112) > at > sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:340) > at > java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1420) > at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:497) > at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472) > at java.security.AccessController.doPrivileged(Native Method) > at java.io.ObjectStreamClass.(ObjectStreamClass.java:472) > at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369) > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:598) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:502) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:489) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:477) > at > org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:438) > at > org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:288) > at > org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:63) > ... 32 more > {noformat} > It appears there is an issue with the staging via {{PipelineResources}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7478) Remote cluster submission from Flink Runner broken due to staging issues
[ https://issues.apache.org/jira/browse/BEAM-7478?focusedWorklogId=296483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296483 ] ASF GitHub Bot logged work on BEAM-7478: Author: ASF GitHub Bot Created on: 16/Aug/19 18:07 Start Date: 16/Aug/19 18:07 Worklog Time Spent: 10m Work Description: stale[bot] commented on pull request #8775: [BEAM-7478] Detect class path from the "java.class.path" property URL: https://github.com/apache/beam/pull/8775 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296483) Time Spent: 2h (was: 1h 50m) > Remote cluster submission from Flink Runner broken due to staging issues > > > Key: BEAM-7478 > URL: https://issues.apache.org/jira/browse/BEAM-7478 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Affects Versions: 2.13.0 >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Attachments: image-2019-08-07-15-36-35-001.png > > Time Spent: 2h > Remaining Estimate: 0h > > The usual way to submit pipelines with the Flink Runner is to build a fat jar > and use the {{bin/flink}} utility to submit the jar to a Flink cluster. This > works fine. > Alternatively, the Flink Runner can use the {{flinkMaster}} pipeline option > to specify a remote cluster. Upon submitting an example we get the following > at Flink's JobManager. 
> {noformat} > Caused by: java.lang.IllegalAccessError: class > sun.reflect.GeneratedSerializationConstructorAccessor70 cannot access its > superclass sun.reflect.SerializationConstructorAccessorImpl > [remainder of the stack trace is identical to the one quoted in the previous notification] > at > org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:63) > ... 32 more > {noformat} > It appears there is an issue with the staging via {{PipelineResources}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-5519) Spark Streaming Duplicated Encoding/Decoding Effort
[ https://issues.apache.org/jira/browse/BEAM-5519?focusedWorklogId=296477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296477 ] ASF GitHub Bot logged work on BEAM-5519: Author: ASF GitHub Bot Created on: 16/Aug/19 17:57 Start Date: 16/Aug/19 17:57 Worklog Time Spent: 10m Work Description: kyle-winkelman commented on issue #6511: [BEAM-5519] Remove call to groupByKey in Spark Streaming. URL: https://github.com/apache/beam/pull/6511#issuecomment-522097498 @iemejia I cleaned this up. Mainly just threw out the part where I was getting rid of the partitioner in batch mode, because [BEAM-7413](https://issues.apache.org/jira/browse/BEAM-7413) made me realize that approach would cause other issues. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296477) Time Spent: 5h 40m (was: 5.5h) > Spark Streaming Duplicated Encoding/Decoding Effort > --- > > Key: BEAM-5519 > URL: https://issues.apache.org/jira/browse/BEAM-5519 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Winkelman >Assignee: Kyle Winkelman >Priority: Major > Labels: spark, spark-streaming > Fix For: 2.16.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > When using the SparkRunner in streaming mode. There is a call to groupByKey > followed by a call to updateStateByKey. BEAM-1815 fixed an issue where this > used to cause 2 shuffles but it still causes 2 encode/decode cycles. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
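The "duplicated effort" in this report is concrete: with a groupByKey feeding updateStateByKey, each element crosses two stage boundaries, and each boundary round-trips it through a coder. A toy model that counts coder calls makes the 2x cost visible (purely illustrative Python; the actual fix lives in the Spark runner's streaming translation, and the names below are invented for the sketch):

```python
import ast

class CountingCoder:
    """Toy coder that counts how often values are (de)serialized."""
    def __init__(self):
        self.encodes = 0
        self.decodes = 0
    def encode(self, value):
        self.encodes += 1
        return repr(value).encode()
    def decode(self, data):
        # literal_eval safely reverses repr() for simple values like tuples.
        self.decodes += 1
        return ast.literal_eval(data.decode())

def stage_boundary(values, coder):
    # Crossing a stage boundary round-trips every element through the coder.
    return [coder.decode(coder.encode(v)) for v in values]

coder = CountingCoder()
elements = [("key", 1), ("key", 2)]
# groupByKey boundary, then updateStateByKey boundary:
out = stage_boundary(stage_boundary(elements, coder), coder)
print(coder.encodes, coder.decodes)  # 4 4 -> every element coded twice
```

Per the issue description, BEAM-1815 already removed the second shuffle; removing the redundant groupByKey call drops the second of these two encode/decode cycles.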
[jira] [Commented] (BEAM-7164) Python precommit failing on Java PRs. dataflow:setupVirtualenv
[ https://issues.apache.org/jira/browse/BEAM-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909263#comment-16909263 ] Udi Meiri commented on BEAM-7164: - Reproducible locally via: {code} for i in `seq 100`; do ../../gradlew clean --no-parallel && ../../gradlew setupVirtualenv || break; done {code} > Python precommit failing on Java PRs. dataflow:setupVirtualenv > --- > > Key: BEAM-7164 > URL: https://issues.apache.org/jira/browse/BEAM-7164 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Alex Amato >Priority: Major > > *https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull* > *18:05:44* > > *Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv* > *18:05:44* New python executable in > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python2.7*18:05:44* > Also creating executable in > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python*18:05:44* > Installing setuptools, pkg_resources, pip, wheel...done.*18:05:44* Running > virtualenv with interpreter /usr/bin/python2.7*18:05:44* DEPRECATION: Python > 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your > Python as Python 2.7 won't be maintained after that date. 
A future version of > pip will drop support for Python 2.7.*18:05:44* Collecting > tox==3.0.0*18:05:44* Using cached > [https://files.pythonhosted.org/packages/e6/41/4dcfd713282bf3213b0384320fa8841e4db032ddcb80bc08a540159d42a8/tox-3.0.0-py2.py3-none-any.whl] > *18:05:44* Collecting grpcio-tools==1.3.5*18:05:44* Using cached > [https://files.pythonhosted.org/packages/05/f6/0296e29b1bac6f85d2a8556d48adf825307f73109a3c2c17fb734292db0a/grpcio_tools-1.3.5-cp27-cp27mu-manylinux1_x86_64.whl] > *18:05:44* Collecting pluggy<1.0,>=0.3.0 (from tox==3.0.0)*18:05:44* Using > cached > [https://files.pythonhosted.org/packages/84/e8/4ddac125b5a0e84ea6ffc93cfccf1e7ee1924e88f53c64e98227f0af2a5f/pluggy-0.9.0-py2.py3-none-any.whl] > *18:05:44* Collecting six (from tox==3.0.0)*18:05:44* Using cached > [https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl] > *18:05:44* Collecting virtualenv>=1.11.2 (from tox==3.0.0)*18:05:44* Using > cached > [https://files.pythonhosted.org/packages/4f/ba/6f9315180501d5ac3e707f19fcb1764c26cc6a9a31af05778f7c2383eadb/virtualenv-16.5.0-py2.py3-none-any.whl] > *18:05:44* Collecting py>=1.4.17 (from tox==3.0.0)*18:05:44* Using cached > [https://files.pythonhosted.org/packages/76/bc/394ad449851729244a97857ee14d7cba61ddb268dce3db538ba2f2ba1f0f/py-1.8.0-py2.py3-none-any.whl] > *18:05:44* Collecting grpcio>=1.3.5 (from grpcio-tools==1.3.5)*18:05:44* > Using cached > [https://files.pythonhosted.org/packages/7c/59/4da8df60a74f4af73ede9d92a75ca85c94bc2a109d5f67061496e8d496b2/grpcio-1.20.0-cp27-cp27mu-manylinux1_x86_64.whl] > *18:05:44* Collecting protobuf>=3.2.0 (from grpcio-tools==1.3.5)*18:05:44* > Using cached > [https://files.pythonhosted.org/packages/ea/72/5eadea03b06ca1320be2433ef2236155da17806b700efc92677ee99ae119/protobuf-3.7.1-cp27-cp27mu-manylinux1_x86_64.whl] > *18:05:44* Collecting futures>=2.2.0; python_version < "3.2" (from > 
grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:44* ERROR: Could not find a > version that satisfies the requirement futures>=2.2.0; python_version < "3.2" > (from grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: none)*18:05:44* > ERROR: No matching distribution found for futures>=2.2.0; python_version < > "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:46* *18:05:46* > > *Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv* > FAILED*18:05:46* > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
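For context on the failing requirement above: `futures>=2.2.0; python_version < "3.2"` carries a PEP 508 environment marker, so pip only tries to resolve `futures` (the Python 2 backport of `concurrent.futures`) on interpreters older than 3.2. A minimal sketch of that gating, with a hand-rolled comparison standing in for pip's real marker evaluator:

```python
def version_tuple(v):
    # "3.2" -> (3, 2); good enough for plain numeric version strings.
    return tuple(int(part) for part in v.split("."))

def needs_futures(python_version):
    # Mirrors the marker `python_version < "3.2"`: the `futures` backport
    # is only relevant before concurrent.futures landed in the stdlib.
    return version_tuple(python_version) < version_tuple("3.2")

print(needs_futures("2.7"))  # True: Python 2 needs the backport
print(needs_futures("3.5"))  # False: stdlib already has concurrent.futures
```

Note that the error in the log is "(from versions: none)": pip evaluated the marker correctly on Python 2.7 and then found no installable candidate at all, which points toward an index or resolver hiccup rather than the marker itself.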
[jira] [Closed] (BEAM-7987) WindowingWindmillReader should not start to read from an empty windmill workitem
[ https://issues.apache.org/jira/browse/BEAM-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyuan Zhang closed BEAM-7987. -- Resolution: Fixed Fix Version/s: 2.16.0 > WindowingWindmillReader should not start to read from an empty windmill > workitem > > > Key: BEAM-7987 > URL: https://issues.apache.org/jira/browse/BEAM-7987 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.16.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > WindowingWindmillReader should expect that a windmill workitem has either > timers or elements. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7986) Increase minimum grpcio required version
[ https://issues.apache.org/jira/browse/BEAM-7986?focusedWorklogId=296458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296458 ] ASF GitHub Bot logged work on BEAM-7986: Author: ASF GitHub Bot Created on: 16/Aug/19 17:43 Start Date: 16/Aug/19 17:43 Worklog Time Spent: 10m Work Description: udim commented on issue #9356: [BEAM-7986] Upgrade grpcio URL: https://github.com/apache/beam/pull/9356#issuecomment-522093105 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296458) Time Spent: 0.5h (was: 20m) > Increase minimum grpcio required version > > > Key: BEAM-7986 > URL: https://issues.apache.org/jira/browse/BEAM-7986 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > According to this question, 1.11.0 is not new enough (1.22.0 reportedly > works), and we list the minimum as 1.8. > https://stackoverflow.com/questions/57479498/beam-channel-object-has-no-attribute-close?noredirect=1#comment101446049_57479498 > Affects DirectRunner Pub/Sub client. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
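The eventual fix is raising the `grpcio` floor in Beam's packaging metadata; the ticket only establishes that 1.11.0 is too old and 1.22.0 reportedly works. As an illustration (not Beam's actual constraint logic), a plain tuple comparison is enough to sanity-check an installed version string against a floor:

```python
def parse_version(v):
    # "1.22.0" -> (1, 22, 0); stops at the first non-numeric segment,
    # so pre-release suffixes are ignored.
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def meets_floor(installed, floor):
    return parse_version(installed) >= parse_version(floor)

print(meets_floor("1.11.0", "1.22.0"))  # False: the version the reporter hit
print(meets_floor("1.22.0", "1.8.0"))   # True: well above the old floor
```

In practice the check belongs in setup.py (a `grpcio>=...` requirement) so pip enforces it at install time, but a runtime guard like this can surface a clearer error than the reported `AttributeError` on a `Channel` without `close`.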
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=296457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296457 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 16/Aug/19 17:42 Start Date: 16/Aug/19 17:42 Worklog Time Spent: 10m Work Description: zfraa commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r314820458 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/HllCountTest.java ## @@ -0,0 +1,373 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.zetasketch; + +import com.google.zetasketch.HyperLogLogPlusPlus; +import com.google.zetasketch.shaded.com.google.protobuf.ByteString; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import org.apache.beam.sdk.Pipeline.PipelineExecutionException; +import org.apache.beam.sdk.testing.NeedsRunner; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TypeDescriptor; +import org.junit.Rule; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import org.junit.rules.ExpectedException; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link HllCount}. */ +@RunWith(JUnit4.class) +public class HllCountTest { + + @Rule public final transient TestPipeline p = TestPipeline.create(); + @Rule public transient ExpectedException thrown = ExpectedException.none(); + + // Integer + private static final List INTS1 = Arrays.asList(1, 2, 3, 3, 1, 4); + private static final byte[] INTS1_SKETCH; + private static final Long INTS1_ESTIMATE; + + static { +HyperLogLogPlusPlus hll = new HyperLogLogPlusPlus.Builder().buildForIntegers(); +INTS1.forEach(hll::add); +INTS1_SKETCH = hll.serializeToByteArray(); +INTS1_ESTIMATE = hll.longResult(); + } + + private static final List INTS2 = Arrays.asList(3, 3, 3, 3); + private static final byte[] INTS2_SKETCH; + private static final Long INTS2_ESTIMATE; + + static { +HyperLogLogPlusPlus hll = new HyperLogLogPlusPlus.Builder().buildForIntegers(); +INTS2.forEach(hll::add); +INTS2_SKETCH = hll.serializeToByteArray(); +INTS2_ESTIMATE = hll.longResult(); + } + + private static final byte[] INTS1_INTS2_SKETCH; + + static { +HyperLogLogPlusPlus hll = 
HyperLogLogPlusPlus.forProto(INTS1_SKETCH); +hll.merge(INTS2_SKETCH); +INTS1_INTS2_SKETCH = hll.serializeToByteArray(); + } + + // Long + private static final List LONGS = Collections.singletonList(1L); + private static final byte[] LONGS_SKETCH; + + static { +HyperLogLogPlusPlus hll = new HyperLogLogPlusPlus.Builder().buildForLongs(); +LONGS.forEach(hll::add); +LONGS_SKETCH = hll.serializeToByteArray(); + } + + private static final byte[] LONGS_EMPTY_SKETCH; + + static { +HyperLogLogPlusPlus hll = new HyperLogLogPlusPlus.Builder().buildForLongs(); +LONGS_EMPTY_SKETCH = hll.serializeToByteArray(); + } + + // String + private static final List STRINGS = Arrays.asList("s1", "s2", "s1", "s2"); + private static final byte[] STRINGS_SKETCH; + + static { +HyperLogLogPlusPlus hll = new HyperLogLogPlusPlus.Builder().buildForStrings(); +STRINGS.forEach(hll::add); +STRINGS_SKETCH = hll.serializeToByteArray(); + } + + private static final int TEST_PRECISION = 20; + private static final byte[] STRINGS_SKETCH_TEST_PRECISION; + + static { +HyperLogLogPlusPlus hll = +new HyperLogLogPlusPlus.Builder().normalPrecision(TEST_PRECISION).buildForStrings(); +STRINGS.forEach(hll::add); +STRINGS_SKETCH_TEST_PRECISION = hll.serializeToByteArray(); + } + + // Bytes + private static final byte[] BYTES0 = {(byte) 0x1, (byte)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=296455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296455 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 16/Aug/19 17:42 Start Date: 16/Aug/19 17:42 Worklog Time Spent: 10m Work Description: zfraa commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r314795281 ## File path: sdks/java/extensions/zetasketch/src/main/java/org/apache/beam/sdk/extensions/zetasketch/HllCountInitFn.java ## @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.zetasketch; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.zetasketch.HyperLogLogPlusPlus; +import com.google.zetasketch.shaded.com.google.protobuf.ByteString; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.coders.CoderRegistry; +import org.apache.beam.sdk.transforms.Combine; + +/** + * {@link Combine.CombineFn} for the {@link HllCount.Init} combiner. 
+ * + * @param <InputT> type of input values to the function (Integer, Long, String, or byte[]) + * @param <HllT> type of the HLL++ sketch to compute (Integer, Long, String, or ByteString) + */ +abstract class HllCountInitFn<InputT, HllT> +extends Combine.CombineFn<InputT, HyperLogLogPlusPlus<HllT>, byte[]> { + + // Would make this a final field. However, that not only requires adding an extra type enum to Review comment: Nit: I would change the first sentence to "Ideally, this would be a final field set at construction time via the builder." just to be super-clear what is meant. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 296455) Time Spent: 20h 20m (was: 20h 10m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 20h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
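For intuition about what the sketches in the tests above contain: HyperLogLog keeps one small register per hash-prefix bucket and estimates cardinality from those registers, and merging two sketches is a register-wise max, which is why the tests can build `INTS1_INTS2_SKETCH` by merging. A toy Python version of the idea (not ZetaSketch's HLL++, which adds sparse encoding, bias correction, and BigQuery-compatible serialization):

```python
import hashlib
import math

class ToyHLL:
    """Illustrative HyperLogLog: the top p hash bits pick a register,
    the remaining bits contribute a leading-zero 'rank'."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        # Deterministic 64-bit hash of the value's string form.
        digest = hashlib.sha1(str(value).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        h = self._hash64(value)
        idx = h >> (64 - self.p)               # top p bits: register index
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Union of the underlying sets: register-wise max.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range (linear counting) correction.
            return self.m * math.log(self.m / zeros)
        return raw

hll = ToyHLL()
for i in range(10000):
    hll.add(i)
print(round(hll.estimate()))  # close to 10000 (a few percent error)
```

Duplicates cost nothing (re-adding a value only re-takes a max), which is what makes this a count-*distinct* structure, and the merge-by-max property is what lets Beam combine per-bundle sketches into a global one.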