[GitHub] [beam] robinyqiu commented on pull request #11272: [BEAM-9641] Support ZetaSQL DATE type as a Beam LogicalType
robinyqiu commented on pull request #11272: URL: https://github.com/apache/beam/pull/11272#issuecomment-624452448 Could you help trigger the tests again? For the comment on range: Thanks for pointing it out. I overlooked this problem. I would like to create a separate PR to address it, along with range testing for other types as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robinyqiu commented on a change in pull request #11272: [BEAM-9641] Support ZetaSQL DATE type as a Beam LogicalType
robinyqiu commented on a change in pull request #11272: URL: https://github.com/apache/beam/pull/11272#discussion_r418871819 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java ## @@ -427,17 +430,12 @@ private static Expression value( private static Expression value(Expression value, Schema.FieldType type) { if (type.getTypeName().isLogicalType()) { -Expression millisField = Expressions.call(value, "getMillis"); String logicalId = type.getLogicalType().getIdentifier(); if (logicalId.equals(TimeType.IDENTIFIER)) { - return nullOr(value, Expressions.convert_(millisField, int.class)); -} else if (logicalId.equals(DateType.IDENTIFIER)) { - value = - nullOr( - value, - Expressions.convert_( - Expressions.divide(millisField, Expressions.constant(MILLIS_PER_DAY)), - int.class)); + return nullOr( + value, Expressions.convert_(Expressions.call(value, "getMillis"), int.class)); +} else if (logicalId.equals(SqlTypes.DATE.getIdentifier())) { Review comment: Done. I hope I could use a switch statement here, but unfortunately there is no constant `IDENTIFIER` defined in the `LogicalType` class. (I could add it to each concrete SQL logical type I create, but I don't think that is a good style.) ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/Date.java ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.logicaltypes; + +import java.time.LocalDate; +import org.apache.beam.sdk.schemas.Schema; + +/** + * A date without a time-zone. + * + * It cannot represent an instant on the time-line without additional information such as an Review comment: Done. ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java ## @@ -303,8 +303,8 @@ private static Object fieldToAvatica(Schema.FieldType type, Object beamValue) { String logicalId = type.getLogicalType().getIdentifier(); if (logicalId.equals(TimeType.IDENTIFIER)) { return (int) ((ReadableInstant) beamValue).getMillis(); -} else if (logicalId.equals(DateType.IDENTIFIER)) { - return (int) (((ReadableInstant) beamValue).getMillis() / MILLIS_PER_DAY); +} else if (logicalId.equals(SqlTypes.DATE.getIdentifier())) { Review comment: Done. ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/Date.java ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.schemas.logicaltypes; + +import java.time.LocalDate; +import org.apache.beam.sdk.schemas.Schema; + +/** + * A date without a time-zone. + * + * It cannot represent an instant on the time-line without additional information such as an + * offset or time-zone. + */ +public class Date implements Schema.LogicalType { Review comment: I think Andrew is basically suggesting using a `PassThroughLogicalType` as a logical type for `DATE`. I think we could definitely consider this if performance becomes a problem in the future. (It's not easy to change the in-memory type for `Date` after it
[GitHub] [beam] robinyqiu commented on pull request #11272: [BEAM-9641] Support ZetaSQL DATE type as a Beam LogicalType
robinyqiu commented on pull request #11272: URL: https://github.com/apache/beam/pull/11272#issuecomment-624451644 Ah, just realized that the previous comments were not sent out. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-624431077 Run PythonDocker PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-624431173 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-624430895 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-624431013 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] stale[bot] commented on pull request #10791: [BEAM-9250] Update release guide with more instructions.
stale[bot] commented on pull request #10791: URL: https://github.com/apache/beam/pull/10791#issuecomment-624403721 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] stale[bot] commented on pull request #11034: [BEAM-9424] Allow grouping by LogicalType
stale[bot] commented on pull request #11034: URL: https://github.com/apache/beam/pull/11034#issuecomment-624403737 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rahul8383 commented on a change in pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
rahul8383 commented on a change in pull request #11609: URL: https://github.com/apache/beam/pull/11609#discussion_r420487951 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java ## @@ -97,4 +99,19 @@ public void testNanosDuration() { assertEquals(duration, row.getLogicalTypeValue(0, NanosDuration.class)); assertEquals(durationAsRow, row.getBaseValue(0, Row.class)); } + + @Test(expected = IllegalArgumentException.class) + public void testFixedBytesIllegalArgument() { +Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build(); +byte[] byteArrayWithLengthFive = {1, 2, 3, 4, 5}; +Row row = Row.withSchema(schema).withFieldValue("char", byteArrayWithLengthFive).build(); + } + + @Test + public void testFixedBytes() { +Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build(); +byte[] byteArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; +Row row = Row.withSchema(schema).withFieldValue("char", byteArray).build(); +assertTrue(Arrays.equals(byteArray, row.getLogicalTypeValue("char", byte[].class))); + } Review comment: Moved the tests to RowTest.java case in point! How can I write `FixedBytes` test which tests the behaviour of appending zeros? To test this behaviour, the input value should have length < expectedLength. But, if the input value's length is less than expected length, an `IllegalArgumentException` is thrown while building the Row. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rahul8383 commented on a change in pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
rahul8383 commented on a change in pull request #11609: URL: https://github.com/apache/beam/pull/11609#discussion_r420486636 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java ## @@ -97,4 +99,19 @@ public void testNanosDuration() { assertEquals(duration, row.getLogicalTypeValue(0, NanosDuration.class)); assertEquals(durationAsRow, row.getBaseValue(0, Row.class)); } + + @Test(expected = IllegalArgumentException.class) + public void testFixedBytesIllegalArgument() { +Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build(); +byte[] byteArrayWithLengthFive = {1, 2, 3, 4, 5}; +Row row = Row.withSchema(schema).withFieldValue("char", byteArrayWithLengthFive).build(); + } Review comment: Added `RowTest.testLogicalTypeWithInvalidInputValueByFieldIndex` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-624370689 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on pull request #11581: [BEAM-8307] NPE in Calcite dialect when input PCollection has logical…
amaliujia commented on pull request #11581: URL: https://github.com/apache/beam/pull/11581#issuecomment-624366230 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia commented on pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy
iemejia commented on pull request #11613: URL: https://github.com/apache/beam/pull/11613#issuecomment-624365589 I am ok with squashing if it creates extra commits, but it does not seem to be the case or does it create the additional extra merge commit? I just want to ensure we follow the rules. Now if the goal is to change the rules maybe we should move the discussion to the ML. Pinging @kennknowles who created our default policy (extra Merge commit) to see what he thinks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy
pabloem commented on pull request #11613: URL: https://github.com/apache/beam/pull/11613#issuecomment-624363164 I think squash is used by many, and it facilitates receiving contributions without the extra round trip to the contributors. I think we need to discuss more before removing it. Wdyt @iemejia? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on a change in pull request #11590: [BEAM-8944] Improve UnboundedThreadPoolExecutor performance
pabloem commented on a change in pull request #11590: URL: https://github.com/apache/beam/pull/11590#discussion_r420468171 ## File path: sdks/python/apache_beam/utils/thread_pool_executor.py ## @@ -137,35 +101,33 @@ def submit(self, fn, *args, **kwargs): """ future = _base.Future() work_item = _WorkItem(future, fn, args, kwargs) -try: - # Keep trying to get an idle worker from the queue until we find one - # that accepts the work. - while not self._idle_worker_queue.get( - block=False).accepted_work(work_item): -pass - return future -except queue.Empty: - with self._lock: -if self._shutdown: - raise RuntimeError( - 'Cannot schedule new tasks after thread pool ' - 'has been shutdown.') - -worker = _Worker( -self._idle_worker_queue, -self._permitted_thread_age_in_seconds, -work_item) +with self._lock: + if self._shutdown: +raise RuntimeError( +'Cannot schedule new tasks after thread pool has been shutdown.') + try: +self._idle_worker_queue.get(block=False).assign_work(work_item) + +# If we have more idle threads then the max allowed, shutdown a thread. +if self._idle_worker_queue.qsize() > self._max_idle_threads: + try: +self._idle_worker_queue.get(block=False).shutdown() Review comment: should we remove a total of `self._max_idle_threads - self._idle_worker_queue.qsize()` workers rather than just one? IIUC, this is the only point (besides shutdown) where workers are removed, so maybe yes? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rahul8383 commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
rahul8383 commented on pull request #11609: URL: https://github.com/apache/beam/pull/11609#issuecomment-624360661 > We always convert logical types to their base type when serializing with SchemaCoder, and convert back to the input type when deserializing. Other than that I think the only time it should get called is when constructing a Row instance (unless you use attachValues). In that case, there is no need to handle this `else` case right? as we are making sure that the input has expected length while building the Row. https://github.com/apache/beam/blob/5e1571760b61b8ce247d5375b71c8df4d69d6409/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/FixedBytes.java#L77 Even if `attachValues` is used while building the Row and the provided input value is invalid(invalid length), during serialization in `SchemaCoder`, the input value cannot be converted to base type as it doesn't have expected length and an `IllegalArgumentException` will be thrown. > Would this just be so that we're guaranteed to call `toInputType` whenever setting a value on Row? This PR accomplishes the same thing right? Can we support this feature: depending on the type of the input value provided while building the Row, we can call `toInputType(toBaseType(inputValue))` or `toInputType(inputValue)` i.e. support for providing base value while building the Row. If both the InputType and BaseType are one and the same, we can directly call `toInputType(inputValue)`. I am thinking that this might be helpful for logical types like `FixedBytes` or `FixedLengthString`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11616: Use csv reader instead of split to read csv data.
aaltay commented on pull request #11616: URL: https://github.com/apache/beam/pull/11616#issuecomment-624360261 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] KevinGG commented on pull request #11616: Use csv reader instead of split to read csv data.
KevinGG commented on pull request #11616: URL: https://github.com/apache/beam/pull/11616#issuecomment-624359913 R: @aaltay PTAL, thx! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
TheNeuralBit commented on pull request #11554: URL: https://github.com/apache/beam/pull/11554#issuecomment-624359510 Ok I removed `beam_PreCommit_Website_Commit/src/website/www` and `beam_PreCommit_Website_Stage_GCS_Commit/src/website/www` on `apache-beam-jenkins-{1..15} `. Hopefully that will unbreak those jobs. We need to figure out why this change is causing jenkins to create files owned by root though, any ideas? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-624356609 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia commented on a change in pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)
iemejia commented on a change in pull request #11614: URL: https://github.com/apache/beam/pull/11614#discussion_r420461169 ## File path: runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java ## @@ -242,6 +243,7 @@ public void close() {} * This test verifies that watermarks are correctly forwarded. */ @Test(timeout = 30_000) +@Ignore("https://issues.apache.org/jira/browse/BEAM-9164;) Review comment: Oh my bad sorry. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia commented on pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy
iemejia commented on pull request #11613: URL: https://github.com/apache/beam/pull/11613#issuecomment-624353261 @robertwb Agree, a consensus that nobody respects :). In the case of this PR I set into the merge approach because we have only [three options](https://help.github.com/en/github/administering-a-repository/about-merge-methods-on-github) and the other two (squash and rebase) do not create the additional merge commit required by the [Beam committer guide](https://beam.apache.org/contribute/committer-guide/#merging-it). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] KevinGG opened a new pull request #11616: Use csv reader instead of split to read csv data.
KevinGG opened a new pull request #11616: URL: https://github.com/apache/beam/pull/11616 There might be comma in the csv formatted data itself. A naive split will generate malformed data and cause errors. Using `csv.reader` solves the issue. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build
[GitHub] [beam] pabloem commented on pull request #11296: [BEAM-9640] Sketching watermark tracking on FnApiRunner
pabloem commented on pull request #11296: URL: https://github.com/apache/beam/pull/11296#issuecomment-624351936 > Sorry it took so long to get to this. Most of my questions are around watermark advancement. no worries. This is a critical component, and I have other work to do, so I'm glad to get a thoughtful review. I'll address your comments soon. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform
pabloem commented on a change in pull request #11582: URL: https://github.com/apache/beam/pull/11582#discussion_r420456908 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -1641,3 +1644,314 @@ def process(self, unused_element, signal): *self._args, **self._kwargs)) | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location))) + + +class _ExtractBQData(DoFn): + ''' + PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into + a temporary storage and returns metadata for created files. + ''' + def __init__( + self, + gcs_location_pattern=None, + project=None, + coder=None, + schema=None, + kms_key=None): + +self.gcs_location_pattern = gcs_location_pattern +self.project = project +self.coder = coder or _JsonToDictCoder +self.kms_key = kms_key +self.split_result = None +self.schema = schema +self.target_schema = None + + def process(self, element): +''' +:param element(ReadAllFromBigQueryRequest): +:return: +''' +element.validate() +if element.table is not None: + table_reference = bigquery_tools.parse_table_reference(element.table) + query = None + use_legacy_sql = True +else: + query = element.query + use_legacy_sql = element.use_legacy_sql + +flatten_results = element.flatten_results + +bq = bigquery_tools.BigQueryWrapper() + +try: + if element.query is not None: +self._setup_temporary_dataset(bq, query, use_legacy_sql) +table_reference = self._execute_query( +bq, query, use_legacy_sql, flatten_results) + + gcs_location = self.gcs_location_pattern.format(uuid.uuid4().hex) + + table_schema = bq.get_table( + table_reference.projectId, + table_reference.datasetId, + table_reference.tableId).schema + + if self.target_schema is None: +self.target_schema = bigquery_tools.parse_table_schema_from_json( +json.dumps(self.schema)) + + if not self.target_schema == table_schema: +raise ValueError(( +"Schema generated by reading from BQ doesn't match expected" +"schema.\nExpected: {}\nActual: {}").format( +self.target_schema, table_schema)) + + metadata_list = self._export_files(bq, table_reference, gcs_location) + + yield pvalue.TaggedOutput('location_to_cleanup', gcs_location) + for metadata in metadata_list: +yield metadata.path + +finally: + if query is not None: +bq.clean_up_temporary_dataset(self.project) + + def _setup_temporary_dataset(self, bq, query, use_legacy_sql): +location = bq.get_query_location(self.project, query, use_legacy_sql) +bq.create_temporary_dataset(self.project, location) + + def _execute_query(self, bq, query, use_legacy_sql, flatten_results): +job = bq._start_query_job( +self.project, +query, +use_legacy_sql, +flatten_results, +job_id=uuid.uuid4().hex, +kms_key=self.kms_key) +job_ref = job.jobReference +bq.wait_for_bq_job(job_ref) +return bq._get_temp_table(self.project) + + def _export_files(self, bq, table_reference, gcs_location): +"""Runs a BigQuery export job. + +Returns: + a list of FileMetadata instances +""" +job_id = uuid.uuid4().hex +job_ref = bq.perform_extract_job([gcs_location], + job_id, + table_reference, + bigquery_tools.FileFormat.JSON, + include_header=False) +bq.wait_for_bq_job(job_ref) +metadata_list = FileSystems.match([gcs_location])[0].metadata_list + +return metadata_list + + +class _PassThroughThenCleanupWithSI(PTransform): + """A PTransform that invokes a DoFn after the input PCollection has been +processed. + +DoFn should have arguments (element, side_input, cleanup_signal). + +Utilizes readiness of PCollection to trigger DoFn. + """ + def __init__(self, cleanup_dofn, side_input): +self.cleanup_dofn = cleanup_dofn +self.side_input = side_input + + def expand(self, input): +class PassThrough(beam.DoFn): + def process(self, element): +yield element + +main_output, cleanup_signal = input | beam.ParDo( + PassThrough()).with_outputs( + 'cleanup_signal', main='main') + +_ = ( +input.pipeline +| beam.Create([None]) +| beam.ParDo( +self.cleanup_dofn, +self.side_input, +beam.pvalue.AsSingleton(cleanup_signal))) + +return main_output + + +class ReadAllFromBigQueryRequest: Review comment: I worry that this is a little clunky - but I appreciate that it provides validation, and even type checking if necessary. Perhaps give it a shorter name so it's 'easy' to create. cc: @robertwb
[GitHub] [beam] TheNeuralBit commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
TheNeuralBit commented on pull request #11554: URL: https://github.com/apache/beam/pull/11554#issuecomment-624346035 We need to stop re-testing this. Its creating a bunch of files owned by root on the workers, so then subsequent runs fail because they don't have permissions to clean it up: ``` bhulette@apache-beam-jenkins-1:~$ ll /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Website_Stage_GCS_Commit/src/website/www/node_modules | head total 528 drwxr-xr-x 122 rootroot 4096 Apr 28 20:02 ./ drwxrwxr-x 4 jenkins jenkins 4096 Apr 29 09:51 ../ drwxr-xr-x 2 rootroot 4096 Apr 28 20:02 ansi-regex/ drwxr-xr-x 2 rootroot 4096 Apr 28 20:02 ansi-styles/ drwxr-xr-x 2 rootroot 4096 Apr 28 20:02 anymatch/ drwxr-xr-x 3 rootroot 4096 Apr 28 20:02 argparse/ drwxr-xr-x 2 rootroot 4096 Apr 28 20:02 array-union/ drwxr-xr-x 6 rootroot 4096 Apr 28 20:02 autoprefixer/ drwxr-xr-x 2 rootroot 4096 Apr 28 20:02 balanced-match/ ``` It's causing the precommit to fail for other PRs too This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damondouglas commented on a change in pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas
damondouglas commented on a change in pull request #11564: URL: https://github.com/apache/beam/pull/11564#discussion_r420451192 ## File path: learning/katas/go/Core Transforms/Map/ParDo OneToMany/task.md ## @@ -0,0 +1,32 @@ + + +# ParDo - One to Many + +In the previous kata we learned that ParDo maps a single element into another element. +In this kata we will map a single element into many by splitting a sentence into words. + +**Kata:** Please write a ParDo that maps each input sentence into words tokenized by whitespace (" "). + + + Use https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo;> + ParDo Review comment: @lostluck Do you mean this? If so, I could check across the various tasks of the existing katas to make sure its consistent. ``` Use https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo;> beam.ParDo ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
pabloem commented on pull request #11554: URL: https://github.com/apache/beam/pull/11554#issuecomment-624343145 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform
pabloem commented on a change in pull request #11582: URL: https://github.com/apache/beam/pull/11582#discussion_r420386613 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -1641,3 +1644,314 @@ def process(self, unused_element, signal): *self._args, **self._kwargs)) | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location))) + + +class _ExtractBQData(DoFn): + ''' + PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into + a temporary storage and returns metadata for created files. + ''' + def __init__( + self, + gcs_location_pattern=None, + project=None, + coder=None, + schema=None, + kms_key=None): + +self.gcs_location_pattern = gcs_location_pattern +self.project = project +self.coder = coder or _JsonToDictCoder +self.kms_key = kms_key +self.split_result = None +self.schema = schema +self.target_schema = None + + def process(self, element): +''' +:param element(ReadAllFromBigQueryRequest): +:return: +''' +element.validate() +if element.table is not None: + table_reference = bigquery_tools.parse_table_reference(element.table) + query = None + use_legacy_sql = True +else: + query = element.query + use_legacy_sql = element.use_legacy_sql + +flatten_results = element.flatten_results + +bq = bigquery_tools.BigQueryWrapper() + +try: + if element.query is not None: +self._setup_temporary_dataset(bq, query, use_legacy_sql) +table_reference = self._execute_query( +bq, query, use_legacy_sql, flatten_results) + + gcs_location = self.gcs_location_pattern.format(uuid.uuid4().hex) + + table_schema = bq.get_table( + table_reference.projectId, + table_reference.datasetId, + table_reference.tableId).schema + + if self.target_schema is None: +self.target_schema = bigquery_tools.parse_table_schema_from_json( +json.dumps(self.schema)) + + if not self.target_schema == table_schema: Review comment: why do you need a target_schema? ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -1641,3 +1644,314 @@ def process(self, unused_element, signal): *self._args, **self._kwargs)) | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location))) + + +class _ExtractBQData(DoFn): + ''' + PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into + a temporary storage and returns metadata for created files. + ''' + def __init__( + self, + gcs_location_pattern=None, + project=None, + coder=None, + schema=None, + kms_key=None): + +self.gcs_location_pattern = gcs_location_pattern +self.project = project +self.coder = coder or _JsonToDictCoder +self.kms_key = kms_key +self.split_result = None +self.schema = schema +self.target_schema = None + + def process(self, element): +''' +:param element(ReadAllFromBigQueryRequest): +:return: +''' +element.validate() +if element.table is not None: + table_reference = bigquery_tools.parse_table_reference(element.table) + query = None + use_legacy_sql = True +else: + query = element.query + use_legacy_sql = element.use_legacy_sql + +flatten_results = element.flatten_results + +bq = bigquery_tools.BigQueryWrapper() Review comment: It would be great if the BQ wrapper could be passed a client as an argument, so that a mocked-out BQ client could be used. See https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L993 and https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1047-L1048 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -1641,3 +1644,314 @@ def process(self, unused_element, signal): *self._args, **self._kwargs)) | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location))) + + +class _ExtractBQData(DoFn): + ''' + PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into + a temporary storage and returns metadata for created files. + ''' + def __init__( + self, + gcs_location_pattern=None, + project=None, + coder=None, + schema=None, + kms_key=None): + +self.gcs_location_pattern = gcs_location_pattern +self.project = project +self.coder = coder or _JsonToDictCoder +self.kms_key = kms_key +self.split_result = None +self.schema = schema +self.target_schema = None + + def process(self, element): +''' +:param element(ReadAllFromBigQueryRequest): +:return: +''' +element.validate() +if element.table is not None: + table_reference = bigquery_tools.parse_table_reference(element.table) + query = None +
[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
pabloem commented on pull request #11554: URL: https://github.com/apache/beam/pull/11554#issuecomment-624337831 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
pabloem commented on pull request #11554: URL: https://github.com/apache/beam/pull/11554#issuecomment-624337083 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #11590: [BEAM-8944] Improve UnboundedThreadPoolExecutor performance
lukecwik commented on pull request #11590: URL: https://github.com/apache/beam/pull/11590#issuecomment-624329537 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] reuvenlax commented on a change in pull request #11350: [BEAM-1589] Added @OnWindowExpiration annotation.
reuvenlax commented on a change in pull request #11350: URL: https://github.com/apache/beam/pull/11350#discussion_r420423316 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java ## @@ -857,6 +863,223 @@ public BundleFinalizer bundleFinalizer() { } } + /** + * A concrete implementation of {@link DoFnInvoker.ArgumentProvider} used for running a {@link + * DoFn} on window expiration. + */ + private class OnWindowExpirationArgumentProvider + extends DoFn.OnWindowExpirationContext + implements DoFnInvoker.ArgumentProvider { +private final BoundedWindow window; +private final Instant timestamp; +private final KeyT key; +/** Lazily initialized; should only be accessed via {@link #getNamespace()}. */ +private @Nullable StateNamespace namespace; + +/** + * The state namespace for this context. + * + * Any call to this method when more than one window is present will crash; this represents a + * bug in the runner or the {@link DoFnSignature}, since values must be in exactly one window + * when state or timers are relevant. + */ +private StateNamespace getNamespace() { + if (namespace == null) { +namespace = StateNamespaces.window(windowCoder, window); + } + return namespace; +} + +private OnWindowExpirationArgumentProvider(BoundedWindow window, Instant timestamp, KeyT key) { + fn.super(); + this.window = window; + this.timestamp = timestamp; + this.key = key; +} + +@Override +public BoundedWindow window() { + return window; +} + +@Override +public PaneInfo paneInfo(DoFn doFn) { + throw new UnsupportedOperationException( + "Cannot access paneInfo outside of @ProcessElement methods."); +} + +@Override +public PipelineOptions pipelineOptions() { + return getPipelineOptions(); +} + +@Override +public DoFn.StartBundleContext startBundleContext(DoFn doFn) { + throw new UnsupportedOperationException("StartBundleContext parameters are not supported."); +} + +@Override +public DoFn.FinishBundleContext finishBundleContext( +DoFn doFn) { + throw new UnsupportedOperationException("FinishBundleContext parameters are not supported."); +} + +@Override +public DoFn.ProcessContext processContext(DoFn doFn) { + throw new UnsupportedOperationException("ProcessContext parameters are not supported."); +} + +@Override +public InputT element(DoFn doFn) { + throw new UnsupportedOperationException("Element parameters are not supported."); +} + +@Override +public Object sideInput(String tagId) { + throw new UnsupportedOperationException("SideInput parameters are not supported."); +} + +@Override +public Object schemaElement(int index) { + throw new UnsupportedOperationException("Element parameters are not supported."); +} + +@Override +public Instant timestamp(DoFn doFn) { + return timestamp; +} + +@Override +public String timerId(DoFn doFn) { + throw new UnsupportedOperationException("Timer parameters are not supported."); +} + +@Override +public TimeDomain timeDomain(DoFn doFn) { + throw new UnsupportedOperationException( + "Cannot access time domain outside of @ProcessTimer method."); +} + +@Override +public KeyT key() { + return key; +} + +@Override +public OutputReceiver outputReceiver(DoFn doFn) { + return DoFnOutputReceivers.windowedReceiver(this, mainOutputTag); +} + +@Override +public OutputReceiver outputRowReceiver(DoFn doFn) { + return DoFnOutputReceivers.rowReceiver(this, mainOutputTag, mainOutputSchemaCoder); +} + +@Override +public MultiOutputReceiver taggedOutputReceiver(DoFn doFn) { + return DoFnOutputReceivers.windowedMultiReceiver(this, outputCoders); +} + +@Override +public Object restriction() { + throw new UnsupportedOperationException("@Restriction parameters are not supported."); +} + +@Override +public DoFn.OnTimerContext onTimerContext(DoFn doFn) { + throw new UnsupportedOperationException("OnTimerContext parameters are not supported."); +} + +@Override +public RestrictionTracker restrictionTracker() { + throw new UnsupportedOperationException("RestrictionTracker parameters are not supported."); +} + +@Override +public Object watermarkEstimatorState() { + throw new UnsupportedOperationException( + "@WatermarkEstimatorState parameters are not supported."); +} + +@Override +public WatermarkEstimator watermarkEstimator() { + throw new UnsupportedOperationException("WatermarkEstimator parameters are not supported."); +} + +@Override +public State state(String stateId, boolean alwaysFetched) { + try { +
[GitHub] [beam] thetorpedodog opened a new pull request #11615: passert.Equals: sort output strings for easier reading
thetorpedodog opened a new pull request #11615: URL: https://github.com/apache/beam/pull/11615 R: @lostluck --- Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)[![Build
[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
pabloem commented on pull request #11554: URL: https://github.com/apache/beam/pull/11554#issuecomment-624321585 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
pabloem commented on pull request #11554: URL: https://github.com/apache/beam/pull/11554#issuecomment-624319547 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on a change in pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms
robertwb commented on a change in pull request #11610: URL: https://github.com/apache/beam/pull/11610#discussion_r420415887 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.transforms; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import org.apache.beam.sdk.transforms.join.CoGbkResult; +import org.apache.beam.sdk.transforms.join.CoGroupByKey; +import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple; +import org.apache.beam.sdk.values.KV; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.TupleTag; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables; + +public class SetFns { + + /** + * Returns a new {@code SetFns.SetImpl} transform that compute the intersection with provided + * {@code PCollection}. + * + * The argument should not be modified after this is called. + * + * The elements of the output {@link PCollection} will all distinct elements that present in + * both pipeline is constructed and provided {@link PCollection}. + * + * {@code + * Pipeline p = ...; + * + * PCollection left = p.apply(Create.of("1", "2", "3", "4", "5")); + * PCollection right = p.apply(Create.of("1", "3", "4", "6")); + * + * PCollection results = + * left.apply(SetFns.intersect(right)); + * } + */ + public static SetImpl intersect(PCollection rightCollection) { +checkNotNull(rightCollection, "rightCollection argument is null"); +SerializableBiFunction intersectFn = +(numberOfElementsinLeft, numberOfElementsinRight) -> (numberOfElementsinLeft > 0 && numberOfElementsinRight > 0) ? 1L : 0L; +return new SetImpl<>(rightCollection, intersectFn); + } + + /** + * Returns a new {@code SetFns.SetImpl} transform that compute the intersection all with + * provided {@code PCollection}. + * + * The argument should not be modified after this is called. + * + * The elements of the output {@link PCollection} which will follow EXCEPT_ALL Semantics as + * follows: Given there are m elements on pipeline which is constructed {@link PCollection} + * (left) and n elements on in provided {@link PCollection} (right): - it will output MIN(m - + * n, 0) elements of left for all elements which are present in both left and right. + * + * {@code + * Pipeline p = ...; + * + * PCollection left = p.apply(Create.of("1", "2", "3", "4", "5")); + * PCollection right = p.apply(Create.of("1", "3", "4", "6")); + * + * PCollection results = + * left.apply(SetFns.intersectAll(right)); + * } + */ + public static SetImpl intersectAll(PCollection rightCollection) { +checkNotNull(rightCollection, "rightCollection argument is null"); +SerializableBiFunction intersectFn = +(numberOfElementsinLeft, numberOfElementsinRight) -> (numberOfElementsinLeft > 0 && numberOfElementsinRight > 0) ? Math.min(numberOfElementsinLeft, numberOfElementsinRight) : 0L; +return new SetImpl<>(rightCollection, intersectFn); + } + + /** + * Returns a new {@code SetFns.SetImpl} transform that compute the difference (except) with + * provided {@code PCollection}. + * + * The argument should not be modified after this is called. + * + * The elements of the output {@link PCollection} will all distinct elements that present in + * pipeline is constructed {@link PCollection} but not present in provided {@link + * PCollection}. + * + * {@code + * Pipeline p = ...; + * + * PCollection left = p.apply(Create.of("1", "2", "3", "4", "5")); + * PCollection right = p.apply(Create.of("1", "3", "4", "6")); + * + * PCollection results = + * left.apply(SetFns.except(right)); + * } + */ + public static SetImpl except(PCollection rightCollection) { +checkNotNull(rightCollection, "rightCollection argument is null"); +SerializableBiFunction exceptFn = +(numberOfElementsinLeft, numberOfElementsinRight) -> numberOfElementsinLeft > 0 && numberOfElementsinRight == 0 ? 1L
[GitHub] [beam] je-ik commented on pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)
je-ik commented on pull request #11614: URL: https://github.com/apache/beam/pull/11614#issuecomment-624317271 @robertwb Looks like we should introduce some measures to solve this (not sure which measures these should be), because like we are accumulating these ignored tests: ``` ~/git/apache/beam$ git grep "@Ignore" | wc -l 102 ``` Most of associated issues remain open, which is what I would expect without better visibility. I'll start a discussion thread on ML. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tweise commented on a change in pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
tweise commented on a change in pull request #11558: URL: https://github.com/apache/beam/pull/11558#discussion_r420389628 ## File path: .test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy ## @@ -142,9 +249,22 @@ PhraseTriggeringPostCommitBuilder.postCommitJob( 'Load Tests Python ParDo Flink Batch suite', this ) { - loadTest(delegate, CommonTestProperties.TriggeringContext.PR) + loadBatchTests(delegate, CommonTestProperties.TriggeringContext.PR) +} + +PhraseTriggeringPostCommitBuilder.postCommitJob( +'beam_LoadTests_Python_ParDo_Flink_Streaming', Review comment: Why multiple trigger phrases? ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkDebugPipelineOptions.java ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.flink; + +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; + +/** Debug options which shouldn't normally be used. */ +public interface FlinkDebugPipelineOptions extends PipelineOptions { + + @Description( + "If not null, reports the checkpoint duration of each ParDo stage in the provided metric namespace.") + String getReportCheckpointDuration(); + + void setReportCheckpointDuration(String metricNamespace); + + @Description( + "Shuts down sources which have been idle for the configured time of milliseconds. Once a source has been " + + "shut down, chekpointing is not possible anymore. Shutting down the sources eventually leads to pipeline " + + "shutdown once all input has been processed.") + @Default.Long(0) + Long getShutdownSourcesAfterIdleMs(); Review comment: Unless I misread, this parameter is directly tied to `!shutdownSourcesOnFinalWatermark`? How about consolidating the two? Just a single parameter shutdownSourcesAfterIdleMs should suffice: 0 - immediate shutdown, which should be default, unless checkpointing is enabled value > 0 - wait, potentially forever There was a question on the ML recently about shutdownSourcesOnFinalWatermark and if that should not be default. I think it should be (unless checkpointing was enabled), in which case we can never shutdown. So there should be almost no situation where this parameter needs to be set, except in a special case like this. ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkDebugPipelineOptions.java ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.flink; + +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; + +/** Debug options which shouldn't normally be used. */ +public interface FlinkDebugPipelineOptions extends PipelineOptions { + + @Description( + "If not null, reports the checkpoint duration of each ParDo stage in the provided metric namespace.") + String getReportCheckpointDuration(); + + void setReportCheckpointDuration(String metricNamespace); + + @Description( + "Shuts down sources which have been idle for the configured time of milliseconds. Once a source has been " + + "shut down, chekpointing is not possible anymore. Shutting down the sources eventually leads to pipeline " Review
[GitHub] [beam] robertwb commented on pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy
robertwb commented on pull request #11613: URL: https://github.com/apache/beam/pull/11613#issuecomment-624311669 I thought consensus was that we did want to encourage squash for those PRs with a huge pile of fixup commits (and otherwise no semantically meaningful commits). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)
robertwb commented on pull request #11614: URL: https://github.com/apache/beam/pull/11614#issuecomment-624310548 I'm not sure we have more visibility into disabled tests than the jira entry (which shouldn't get closed until the tests are fixed and/or deemed unneeded and deleted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on a change in pull request #11296: [BEAM-9640] Sketching watermark tracking on FnApiRunner
robertwb commented on a change in pull request #11296: URL: https://github.com/apache/beam/pull/11296#discussion_r420388089 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -363,16 +387,17 @@ def _run_bundle_multiple_times_for_testing( finally: runner_execution_context.state_servicer.restore() + @staticmethod def _collect_written_timers_and_add_to_fired_timers( - self, bundle_context_manager, # type: execution.BundleContextManager fired_timers # type: Dict[Tuple[str, str], ListBuffer] Review comment: Are these fired_timers, or timers_to_fire? ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py ## @@ -296,6 +297,12 @@ def __init__(self, self.safe_coders = safe_coders self.data_channel_coders = data_channel_coders +self.transform_id_to_buffer_id = { Review comment: Different transforms may have different input/output buffers associated with them. Perhaps name this `input_transform_to_buffer_id` or `buffer_id_by_consumer` or similar. ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -384,13 +409,25 @@ def _collect_written_timers_and_add_to_fired_timers( out = create_OutputStream() for decoded_timer in timers_by_key_and_window.values(): timer_coder_impl.encode_to_stream(decoded_timer, out, True) + if (transform_id, timer_family_id) not in timer_watermark_data: +timer_watermark_data[(transform_id, + timer_family_id)] = timestamp.MAX_TIMESTAMP + timer_watermark_data[(transform_id, timer_family_id)] = min( + timer_watermark_data[(transform_id, timer_family_id)], + decoded_timer.fire_timestamp) fired_timers[(transform_id, timer_family_id)] = ListBuffer( coder_impl=timer_coder_impl) fired_timers[(transform_id, timer_family_id)].append(out.get()) written_timers.clear() +return timer_watermark_data + def _add_sdk_delayed_applications_to_deferred_inputs( self, bundle_context_manager, bundle_result, deferred_inputs): +# type: (...) -> Set[str] + +"""Returns a set of PCollections with delayed applications.""" Review comment: Set of PCollection ids? Buffer ids? ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/watermark_manager.py ## @@ -0,0 +1,206 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Utilities for managing watermarks for a pipeline execution by FnApiRunner.""" + +from __future__ import absolute_import + +from apache_beam.portability.api import beam_runner_api_pb2 +from apache_beam.runners.portability.fn_api_runner import translations +from apache_beam.runners.portability.fn_api_runner.translations import split_buffer_id +from apache_beam.runners.worker import bundle_processor +from apache_beam.utils import proto_utils +from apache_beam.utils import timestamp + + +class WatermarkManager(object): + """Manages the watermarks of a pipeline's stages. +It works by constructing an internal graph representation of the pipeline, +and keeping track of dependencies.""" + class WatermarkNode(object): +def __init__(self, name): + self.name = name + + class PCollectionNode(WatermarkNode): +def __init__(self, name): + super(WatermarkManager.PCollectionNode, self).__init__(name) + self._watermark = timestamp.MIN_TIMESTAMP + self.producers = set() + +def __str__(self): + return 'PCollectionNodehttp://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Utilities for managing watermarks for a pipeline execution by FnApiRunner.""" + +from __future__ import absolute_import + +from apache_beam.portability.api import beam_runner_api_pb2 +from
[GitHub] [beam] udim commented on pull request #11038: [BEAM-7746] More typing fixes
udim commented on pull request #11038: URL: https://github.com/apache/beam/pull/11038#issuecomment-624296105 > Hi everyone, I have some availability to finish this PR off now. I'm going to rebase it soon. @udim do you have the time to help me get this through review? Yeah, I'll make another pass later today This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] je-ik commented on a change in pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)
je-ik commented on a change in pull request #11614: URL: https://github.com/apache/beam/pull/11614#discussion_r420390006 ## File path: runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java ## @@ -242,6 +243,7 @@ public void close() {} * This test verifies that watermarks are correctly forwarded. */ @Test(timeout = 30_000) +@Ignore("https://issues.apache.org/jira/browse/BEAM-9164;) Review comment: This is not `@Category(ValidatesRunner)` test, looks like it is Flink-specific already? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia commented on a change in pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)
iemejia commented on a change in pull request #11614: URL: https://github.com/apache/beam/pull/11614#discussion_r420387278 ## File path: runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java ## @@ -242,6 +243,7 @@ public void close() {} * This test verifies that watermarks are correctly forwarded. */ @Test(timeout = 30_000) +@Ignore("https://issues.apache.org/jira/browse/BEAM-9164;) Review comment: This one is apparently only flaky on Flink can we better exclude it manually only for Flink? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
ibzib commented on pull request #11403: URL: https://github.com/apache/beam/pull/11403#issuecomment-624286892 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] je-ik commented on pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)
je-ik commented on pull request #11614: URL: https://github.com/apache/beam/pull/11614#issuecomment-624285935 @robertwb Thanks for approval, do we have a way of visualizing ignored tests? I'm a little afraid these test might get ignored for ever, which might be unfortunate. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] je-ik opened a new pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)
je-ik opened a new pull request #11614: URL: https://github.com/apache/beam/pull/11614 Disables two flaky tests until resolution is found. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[GitHub] [beam] Ardagan commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests
Ardagan commented on pull request #11555: URL: https://github.com/apache/beam/pull/11555#issuecomment-624265259 Hey Kamil, can we also add a proper landing page for metrics site? People regularly can't navigate to dashboards they need. Adding landing page with intuitive navigation would help a lot. That should be a separate PR though. [BEAM-6710](https://issues.apache.org/jira/browse/BEAM-6710) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-624264625 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm removed a comment on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm removed a comment on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-624264392 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-624264392 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm removed a comment on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm removed a comment on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-624229343 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on pull request #11574: [BEAM-9449] Pass PipelineOptions through expansion service
ihji commented on pull request #11574: URL: https://github.com/apache/beam/pull/11574#issuecomment-624260369 > > I think we should also consider adding optional pipeline_options argument to ExternalTransform given that each different expansion service needs different pipeline options. > > I'm not sure I understand that. Shouldn't pipeline options be scoped per-pipeline? Make sense. I find myself that I have a mental model that external transforms are basically imported from different pipelines. If we assume that they are heterogeneous but still parts of a same pipeline, then it's okay to share a single pipeline option. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests
aaltay commented on pull request #11555: URL: https://github.com/apache/beam/pull/11555#issuecomment-624248353 /cc @chamikaramj @tysonjh @kennknowles -- optional review request, if you would like to take a quick look at new benchmarks at http://metrics.beam.apache.org. (Instructions from @Ardagan : To find dashboards: click at top-left on "Home" or "current dashboard name", this will open drop-down list with full set of dashboards.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests
aaltay commented on pull request #11555: URL: https://github.com/apache/beam/pull/11555#issuecomment-624247136 Some comments: - I do see missing data. (Example: http://metrics.beam.apache.org/d/fK0U4JqWz/cogbk-load-tests?orgId=1 -- all graphs missing recent data, java | coGBK | 100B records with a single key missing spark data for longer.) - go benchmarks are completely empty. - Some different colors (Example: http://metrics.beam.apache.org/d/bnlHKP3Wz/java-io-it-tests-dataflow?orgId=1 -- TextIOIT | 1 GB | GCS | "Many files" | GCS Copies is in blue color) - Since all dashboards have python/java selectors, why Python IO IT Tests and Java IO IT Tests are different dashboards? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay edited a comment on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests
aaltay edited a comment on pull request #11555: URL: https://github.com/apache/beam/pull/11555#issuecomment-624247136 Some comments: - I do see missing data. (Example: http://metrics.beam.apache.org/d/fK0U4JqWz/cogbk-load-tests?orgId=1 -- all graphs missing recent data, java | coGBK | 100B records with a single key missing spark data for longer.) - go benchmarks are completely empty. - Some different colors (Example: http://metrics.beam.apache.org/d/bnlHKP3Wz/java-io-it-tests-dataflow?orgId=1 -- TextIOIT | 1 GB | GCS | "Many files" | GCS Copies is in blue color) - Since all dashboards have python/java selectors, why Python IO IT Tests and Java IO IT Tests are different dashboards? I might be missing other issues as well. If they are easy to fix later, we can fix what is identified, merge and ask for feedback on dev@ list. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
TheNeuralBit commented on a change in pull request #11609: URL: https://github.com/apache/beam/pull/11609#discussion_r420336807 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java ## @@ -97,4 +99,19 @@ public void testNanosDuration() { assertEquals(duration, row.getLogicalTypeValue(0, NanosDuration.class)); assertEquals(durationAsRow, row.getBaseValue(0, Row.class)); } + + @Test(expected = IllegalArgumentException.class) + public void testFixedBytesIllegalArgument() { +Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build(); +byte[] byteArrayWithLengthFive = {1, 2, 3, 4, 5}; +Row row = Row.withSchema(schema).withFieldValue("char", byteArrayWithLengthFive).build(); + } Review comment: Could you add a test like this but with `addValues`? ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java ## @@ -97,4 +99,19 @@ public void testNanosDuration() { assertEquals(duration, row.getLogicalTypeValue(0, NanosDuration.class)); assertEquals(durationAsRow, row.getBaseValue(0, Row.class)); } + + @Test(expected = IllegalArgumentException.class) + public void testFixedBytesIllegalArgument() { +Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build(); +byte[] byteArrayWithLengthFive = {1, 2, 3, 4, 5}; +Row row = Row.withSchema(schema).withFieldValue("char", byteArrayWithLengthFive).build(); + } + + @Test + public void testFixedBytes() { +Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build(); +byte[] byteArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; +Row row = Row.withSchema(schema).withFieldValue("char", byteArray).build(); +assertTrue(Arrays.equals(byteArray, row.getLogicalTypeValue("char", byte[].class))); + } Review comment: Since these tests are really checking `Row`'s verification, I think they would be better in `RowTest`. Could you move them there? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
TheNeuralBit commented on pull request #11609: URL: https://github.com/apache/beam/pull/11609#issuecomment-624243528 > Will this line ever get hit? We always convert logical types to their base type when serializing with SchemaCoder, and convert back to the input type when deserializing. Other than that I think the only time it should get called is when constructing a Row instance (unless you use attachValues). > Can we consider that the input value provided is of BaseType, which we can convert to InputType and store in memory? Would this just be so that we're guaranteed to call `toInputType` whenever setting a value on Row? This PR accomplishes the same thing right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests
aaltay commented on pull request #11555: URL: https://github.com/apache/beam/pull/11555#issuecomment-624242630 > Done. I pushed modified version to the website (http://metrics.beam.apache.org) I do not see the new dashboard here. How can I find it? I see these three: Code Velocity Post-commit Test Reliability Stability critical jobs status This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-624229343 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-624227730 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] je-ik commented on pull request #11612: [BEAM-9888] Drop data based on input watermark in @RequiresTimeSortedInput
je-ik commented on pull request #11612: URL: https://github.com/apache/beam/pull/11612#issuecomment-624225596 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] iemejia opened a new pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy
iemejia opened a new pull request #11613: URL: https://github.com/apache/beam/pull/11613 Issues reported by yamllint and some minor fixes. Also set merge button as the only strategy because we don't want to encourage (o even make possible the other two). R: @pabloem CC: @ibzib This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests
Ardagan commented on pull request #11555: URL: https://github.com/apache/beam/pull/11555#issuecomment-624216529 Some dashboards seem to miss data, but that's due to not all data migrated IIUC. LGTM otherwise. @aaltay can you take a look as well please? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan edited a comment on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests
Ardagan edited a comment on pull request #11555: URL: https://github.com/apache/beam/pull/11555#issuecomment-624213555 > @aaltay > > > It would be great if we can make data-points clickable with links to relevant job > > Grafana has a feature called Data links [1] that could be use here. But the biggest challenge is to get Jenkins job id for specific data point. When Python or Java test sends their metrics to InfluxDB/BigQuery, they have no knowledge of Jenkins job that executes them. > > Without a rework of sending metrics, this functionality will be difficult to implement. > > @Ardagan Any thoughs? > > [1] https://grafana.com/docs/grafana/latest/reference/datalinks/ I believe we can get jenkins job ID via [env.JOB_NAME](https://stackoverflow.com/questions/8309383/how-to-get-the-jobname-from-jenkins), but this will required update test metric report logic and DB update IIUC. We can add jira to do this improvement in separate PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests
Ardagan commented on pull request #11555: URL: https://github.com/apache/beam/pull/11555#issuecomment-624213555 > @aaltay > > > It would be great if we can make data-points clickable with links to relevant job > > Grafana has a feature called Data links [1] that could be use here. But the biggest challenge is to get Jenkins job id for specific data point. When Python or Java test sends their metrics to InfluxDB/BigQuery, they have no knowledge of Jenkins job that executes them. > > Without a rework of sending metrics, this functionality will be difficult to implement. > > @Ardagan Any thoughs? > > [1] https://grafana.com/docs/grafana/latest/reference/datalinks/ I believe we can get jenkins job ID via [env.JOB_NAME](https://stackoverflow.com/questions/8309383/how-to-get-the-jobname-from-jenkins). We can add jira to do this improvement in separate PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay edited a comment on pull request #11210: [BEAM-8949] SpannerIO integration tests
aaltay edited a comment on pull request #11210: URL: https://github.com/apache/beam/pull/11210#issuecomment-624213199 > @aaltay - Wordcount stream tests are failing. They are running successfully on my local machine. Jira ticket is already created for that - https://issues.apache.org/jira/browse/BEAM-9767 > Maybe retriggering the pre-commit test works but can't say that would pass 100% bcoz of the flaky wordcount test. > > Apart from that, there were 3 pre-commit jobs are triggered almost the same time on Jenkins (with ~30seconds difference) and one of them is successfully completed but sadly the failed one is linked with the github. > > successful pre-commit: https://builds.apache.org/job/beam_PreCommit_Python_Commit/12541/ > > failed jobs: > https://builds.apache.org/job/beam_PreCommit_Python_Commit/12542/ > https://builds.apache.org/job/beam_PreCommit_Python_Commit/12543/ OK, let's re-run to get a clear signal. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11210: [BEAM-8949] SpannerIO integration tests
aaltay commented on pull request #11210: URL: https://github.com/apache/beam/pull/11210#issuecomment-624213199 > @aaltay - Wordcount stream tests are failing. They are running successfully on my local machine. Jira ticket is already created for that - https://issues.apache.org/jira/browse/BEAM-9767 > Maybe retriggering the pre-commit test works but can't say that would pass 100% bcoz of the flaky wordcount test. > > Apart from that, there were 3 pre-commit jobs are triggered almost the same time on Jenkins (with ~30seconds difference) and one of them is successfully completed but sadly the failed one is linked with the github. > > successful pre-commit: https://builds.apache.org/job/beam_PreCommit_Python_Commit/12541/ > > failed jobs: > https://builds.apache.org/job/beam_PreCommit_Python_Commit/12542/ > https://builds.apache.org/job/beam_PreCommit_Python_Commit/12543/ OK, let's re-run to de-flake. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11210: [BEAM-8949] SpannerIO integration tests
aaltay commented on pull request #11210: URL: https://github.com/apache/beam/pull/11210#issuecomment-624213031 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan commented on pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform
Ardagan commented on pull request #11582: URL: https://github.com/apache/beam/pull/11582#issuecomment-624210475 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem commented on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-624209882 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform
Ardagan commented on a change in pull request #11582: URL: https://github.com/apache/beam/pull/11582#discussion_r420286996 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -283,6 +284,8 @@ def compute_table_name(row): 'BigQuerySink', 'WriteToBigQuery', 'ReadFromBigQuery', +'ReadAllFromBigQueryRequest', Review comment: I'm working on adding similar ReadAll to Java API. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms
amaliujia commented on pull request #11610: URL: https://github.com/apache/beam/pull/11610#issuecomment-624199945 @darshanj You can run `./gradlew spotlessApply` to fix checkstyle issues. You need to run `./gradlew ${module}:check` command to not only run tests but also run style check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms
amaliujia commented on pull request #11610: URL: https://github.com/apache/beam/pull/11610#issuecomment-624199252 cc @Mark-Zeng to make sure I tagged the right person. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] Ardagan commented on pull request #11477: [BEAM-9650] Add PeriodicSequence generator.
Ardagan commented on pull request #11477: URL: https://github.com/apache/beam/pull/11477#issuecomment-624198780 Run Website_Stage_GCS PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia edited a comment on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms
amaliujia edited a comment on pull request #11610: URL: https://github.com/apache/beam/pull/11610#issuecomment-624196073 R: @amaliujia cc: @jhnmora000 @Mark-Zeng (expose more PRs to GSoC students) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] je-ik commented on pull request #11612: [BEAM-9888] Drop data based on input watermark in @RequiresTimeSortedInput
je-ik commented on pull request #11612: URL: https://github.com/apache/beam/pull/11612#issuecomment-624197391 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms
amaliujia commented on pull request #11610: URL: https://github.com/apache/beam/pull/11610#issuecomment-624196073 R: @amaliujia cc: @jhnmora000 @MarkZeng1998 (expose more PRs to GSoC students) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on pull request #11557: [BEAM-9845] Stage artifacts over expansion service.
robertwb commented on pull request #11557: URL: https://github.com/apache/beam/pull/11557#issuecomment-624188539 Thanks. Those suites passed locally, I'll look into what's going on here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-624184021 Run Python Load Tests ParDo Flink Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks
mxm commented on pull request #11558: URL: https://github.com/apache/beam/pull/11558#issuecomment-624180353 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kamilwu removed a comment on pull request #11274: [BEAM-9633] Add PubsubIO performance test
kamilwu removed a comment on pull request #11274: URL: https://github.com/apache/beam/pull/11274#issuecomment-624175575 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kamilwu commented on pull request #11274: [BEAM-9633] Add PubsubIO performance test
kamilwu commented on pull request #11274: URL: https://github.com/apache/beam/pull/11274#issuecomment-624176044 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kamilwu commented on pull request #11274: [BEAM-9633] Add PubsubIO performance test
kamilwu commented on pull request #11274: URL: https://github.com/apache/beam/pull/11274#issuecomment-624175575 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kamilwu commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests
kamilwu commented on pull request #11555: URL: https://github.com/apache/beam/pull/11555#issuecomment-624173530 @aaltay > It would be great if we can make data-points clickable with links to relevant job Grafana has a feature called Data links [1] that could be use here. But the biggest challenge is to get Jenkins job id for specific data point. When Python or Java test sends their metrics to InfluxDB/BigQuery, they have no knowledge of Jenkins job that executes them. Without a rework of sending metrics, this functionality will be difficult to implement. @Ardagan Any thoughs? [1] https://grafana.com/docs/grafana/latest/reference/datalinks/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on pull request #11038: [BEAM-7746] More typing fixes
robertwb commented on pull request #11038: URL: https://github.com/apache/beam/pull/11038#issuecomment-624170238 Yes, let's make it happen! I'll help out as well. On Mon, May 4, 2020 at 11:47 AM Chad Dombrova wrote: > > I've rebased onto master. We jumped from 32 errors to 260+. We're going to need to make a concerted effort to get these typing MRs through, and beat the merge conflict fatigue. Can we make it happen? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub, or unsubscribe. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11574: [BEAM-9449] Pass PipelineOptions through expansion service
TheNeuralBit commented on pull request #11574: URL: https://github.com/apache/beam/pull/11574#issuecomment-624169642 > I think we should also consider adding optional pipeline_options argument to ExternalTransform given that each different expansion service needs different pipeline options. I'm not sure I understand that. Shouldn't pipeline options be scoped per-pipeline? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] je-ik commented on pull request #11612: [BEAM-9888] Drop data based on input watermark in @RequiresTimeSortedInput
je-ik commented on pull request #11612: URL: https://github.com/apache/beam/pull/11612#issuecomment-624162961 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] reuvenlax edited a comment on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
reuvenlax edited a comment on pull request #11609: URL: https://github.com/apache/beam/pull/11609#issuecomment-624156907 @TheNeuralBit withFieldValue should replace addValues for most users. addValues is difficult and error prone and withFieldValues allows building a row based on named fields instead of positional fields. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11607: [BEAM-9430] Makes sure the watermarks returned by estimators are within bounds
chamikaramj commented on pull request #11607: URL: https://github.com/apache/beam/pull/11607#issuecomment-624161195 Closing this temporarily while I look into this further. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rahul8383 commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
rahul8383 commented on pull request #11609: URL: https://github.com/apache/beam/pull/11609#issuecomment-624160514 @reuvenlax @TheNeuralBit Thanks for the review. Could you please clarify the questions that I have asked above? I wanted to understand more about logical types for my PR #11581 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] reuvenlax commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
reuvenlax commented on pull request #11609: URL: https://github.com/apache/beam/pull/11609#issuecomment-624156907 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rahul8383 edited a comment on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
rahul8383 edited a comment on pull request #11609: URL: https://github.com/apache/beam/pull/11609#issuecomment-624153984 @TheNeuralBit `schemas.logicaltypes.LogicalTypesTest.testFixedBytesIllegalArgument` Test will fail even if `addValue()` method is used instead of `withFieldValue()` method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rahul8383 commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
rahul8383 commented on pull request #11609: URL: https://github.com/apache/beam/pull/11609#issuecomment-624153984 @TheNeuralBit schemas.logicaltypes.LogicalTypesTest.testFixedBytesIllegalArgument Test will fail even if addValue() method is used instead of withFieldValue() method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input
TheNeuralBit commented on pull request #11609: URL: https://github.com/apache/beam/pull/11609#issuecomment-624149550 Hm so there are several ways of manually building a Row instance that provide different levels of runtime type-checking. `Row#addValues` explicitly validates everything, and `Row#attachValues` explicitly does not, for performance reasons. In SQL we have an option to switch between the two: https://github.com/apache/beam/blob/34c58c42f14d2534a2f72f9194ddf9cc69138eea/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java#L354-L357 So we can have runtime type-checking for debugging, but then turn it off for performance. I'm not sure how `withFieldValue` is intended to work. I'm not sure if the missing `toInputType(toBaseType(value))` for that code path is intentional or an oversight. Can you clarify @reuvenlax? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #11557: [BEAM-9845] Stage artifacts over expansion service.
chamikaramj commented on pull request #11557: URL: https://github.com/apache/beam/pull/11557#issuecomment-624130608 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] bntnam commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
bntnam commented on a change in pull request #11554: URL: https://github.com/apache/beam/pull/11554#discussion_r420199948 ## File path: website/www/site/content/en/community/contact-us.md ## @@ -0,0 +1,47 @@ +--- +title: "Contact Us" +aliases: + - /community/ + - /use/issue-tracking/ + - /use/mailing-lists/ + - /get-started/support/ +--- + + +# Contact Us + +There are many ways to reach the Beam user and developer communities - use +whichever one seems best. + + Review comment: If you look at the table shortcode as its mentioned in the comment (its located at website/www/site/layouts/shortcodes/table), you will see we can pass the markdown into a `div` because of `{{ .Inner | markdownify }}`. And we use this table shortcode to replace `{:.table}` syntax in Jekyll. We've done it with several tables. However, only this one, it uses `[^1]` to refer a link and unfortunately `{{ .Inner | markdownify }}` doesn't support this passing, so we need to use the `div` directly on this without passing through `markdownify`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] je-ik commented on pull request #11612: [BEAM-9888] Drop data based on input watermark in @RequiresTimeSortedInput
je-ik commented on pull request #11612: URL: https://github.com/apache/beam/pull/11612#issuecomment-624121709 Run Direct ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org