[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759 @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam. Those tests run on Flink and Spark in Python Postcommit suite. I'm able to run the tests with `--test-pipeline-options="--runner=(Flink/Spark)Runner"` locally as well. It's strange that on your setup you encountered a timeout. I added the issue: https://issues.apache.org/jira/browse/BEAM-10429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759 @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam. Those tests run on Flink and Spark in Python Postcommit suite. I'm able to do so (with `--test-pipeline-options="--runner=(Flink/Spark)Runner"` locally as well. It's strange that on your setup you encountered a timeout as on Jenkins everything goes as expected. I added the issue: https://issues.apache.org/jira/browse/BEAM-10429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759 @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam. Those tests run on Flink and Spark in Python Postcommit suite. I'm able to do so locally as well. It's strange that on your setup you encountered a timeout as on Jenkins everything goes as expected. I added the issue: https://issues.apache.org/jira/browse/BEAM-10429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759 @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam. Those tests run on Flink and Spark in Python Postcommit suite. I'm able to do so locally as well. It's strange that on your setup you encountered a timeout as on Jenkins everything goes as expected. The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner) and maybe something in there caused the problem. But even though it passed locally on my setup (even with fresh Beam repo) so I wasn't able to reconstruct this issue. I added the issue: https://issues.apache.org/jira/browse/BEAM-10429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759 @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam. Those tests run on Flink and Spark in Python Postcommit suite. The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner) and maybe something in there caused the problem. But even though it passed locally on my setup (even with fresh Beam repo) so I wasn't able to reconstruct this issue. I added the issue: https://issues.apache.org/jira/browse/BEAM-10429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759 @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam. Those tests run on Flink and Spark in Python Postcommit suite and pass. The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner) and maybe something in there caused the problem. But even though it passed locally on my setup so I wasn't able to reconstruct this issue. I added the issue: https://issues.apache.org/jira/browse/BEAM-10429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759 @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam. Those tests run on Flink and Spark in Python Postcommit suite and pass. The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner) and maybe something in there caused the problem. But even though it passed locally on my setup (even with fresh Beam repo) so I wasn't able to reconstruct this issue. I added the issue: https://issues.apache.org/jira/browse/BEAM-10429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on a change in pull request #12115: [BEAM-7672] dynamically setup acceptable wheel specs according to installed python version
tvalentyn commented on a change in pull request #12115: URL: https://github.com/apache/beam/pull/12115#discussion_r451988176 ## File path: sdks/python/container/boot.go ## @@ -170,6 +176,28 @@ func main() { log.Fatalf("Python exited: %v", execx.Execute("python", args...)) } +// setup wheel specs according to installed python version +func setupAcceptableWheelSpecs() error { + cmd := exec.Command("python", "-V") + stdoutStderr, err := cmd.CombinedOutput() + if err != nil { + return err + } + re := regexp.MustCompile(`Python (\d)\.(\d).*`) + pyVersions := re.FindStringSubmatch(string(stdoutStderr[:])) + if len(pyVersions) != 3 { + return fmt.Errorf("cannot get parse Python version from %s", stdoutStderr) + } + pyVersion := fmt.Sprintf("%s%s", pyVersions[1], pyVersions[2]) + if pyVersion == "27" { + acceptableWhlSpecs = append(acceptableWhlSpecs, "cp27-cp27mu-manylinux1_x86_64.whl") + } else { + wheelName := fmt.Sprintf("cp%s-cp%sm-manylinux1_x86_64.whl", pyVersion, pyVersion) Review comment: Sorry for a slow response. It seems that 3.8 wheels don't follow this pattern, just noticing this while prepring 2.23.0 release. 3.8 wheels look as follows: apache_beam-2.23.0-cp38-cp38-manylinux1_x86_64.whl Not sure why the pattern has changed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski commented on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759 @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little experience in submitting PRs to Beam. Those tests run on Flink and Spark in Python Postcommit suite and pass, so I think there is no need for the ticket. The problem was in validatesCrossLanguageRunner task. It has more complicated setup (run flink/spark job-service, then run test expansion service, use PortableRunner). But even though it passed locally on my setup so I wasn't able to reconstruct this issue. I added the issue: https://issues.apache.org/jira/browse/BEAM-10429 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on pull request #12060: [BEAM-10218] Add FnApiRunner to cross-language validate runner test
ihji commented on pull request #12060: URL: https://github.com/apache/beam/pull/12060#issuecomment-655913859 Run XVR_Direct PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji removed a comment on pull request #12060: [BEAM-10218] Add FnApiRunner to cross-language validate runner test
ihji removed a comment on pull request #12060: URL: https://github.com/apache/beam/pull/12060#issuecomment-655913859 Run XVR_Direct PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji commented on pull request #12060: [BEAM-10218] Add FnApiRunner to cross-language validate runner test
ihji commented on pull request #12060: URL: https://github.com/apache/beam/pull/12060#issuecomment-655912009 run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ihji removed a comment on pull request #12060: [BEAM-10218] Add FnApiRunner to cross-language validate runner test
ihji removed a comment on pull request #12060: URL: https://github.com/apache/beam/pull/12060#issuecomment-655912009 run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] purbanow commented on a change in pull request #12117: [BEAM-10343] Add dispositions for SnowflakeIO.write
purbanow commented on a change in pull request #12117: URL: https://github.com/apache/beam/pull/12117#discussion_r451973374 ## File path: sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/services/SnowflakeServiceImpl.java ## @@ -25,7 +27,9 @@ import java.util.function.Consumer; import java.util.stream.Collectors; import javax.sql.DataSource; +import org.apache.beam.sdk.io.snowflake.data.SnowflakeTableSchema; import org.apache.beam.sdk.io.snowflake.enums.CloudProvider; Review comment: I see your point. I even removed this enum and add following variable to the Service file: `private static final String GCS_PREFIX = "gs://";` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on pull request #11979: [BEAM-9679] Add Partition task to Core Transform katas
lostluck commented on pull request #11979: URL: https://github.com/apache/beam/pull/11979#issuecomment-655910522 That works fine. Given how mechanical the change is, it wouldn't be unreasonable to do the change all at once in one PR. However, I do recommend doing all the directory renames in separate commits from the deleting the old mods so that we can check the relatively smaller changes on a per commit basis (rather than just all at once). Won't matter to the final merge, since it will get squashed, but it will make it easier to review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damondouglas commented on pull request #11979: [BEAM-9679] Add Partition task to Core Transform katas
damondouglas commented on pull request #11979: URL: https://github.com/apache/beam/pull/11979#issuecomment-655901876 @henryken / @lostluck I've started work on consolidating the module by beginning work at this branch: https://github.com/apache/beam/compare/master...damondouglas:BEAM-10428-single-module-kata-go Essentially the final goal is to make one go.mod at the root of learning/katas/go. So far I've only changed the Hello Beam section so you can preview what I intend for the rest. Would it make sense to start a PR for this branch without the intent to merge initially? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] damondouglas edited a comment on pull request #11979: [BEAM-9679] Add Partition task to Core Transform katas
damondouglas edited a comment on pull request #11979: URL: https://github.com/apache/beam/pull/11979#issuecomment-655901876 @henryken / @lostluck I've started work on consolidating the module by beginning work at this branch: https://github.com/apache/beam/compare/master...damondouglas:BEAM-10428-single-module-kata-go Essentially the final goal is to make one go.mod at the root of learning/katas/go. So far I've only changed the Hello Beam section so you can preview what I intend for the rest. Would it make sense to start a PR for this branch without the intent to merge initially but until the entire learning/katas/go is refactored? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] davidyan74 commented on pull request #12201: [BEAM-10291] log full thread dump only when lull duration is more tha…
davidyan74 commented on pull request #12201: URL: https://github.com/apache/beam/pull/12201#issuecomment-655899856 R: @pabloem Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kennknowles commented on pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness
kennknowles commented on pull request #11792: URL: https://github.com/apache/beam/pull/11792#issuecomment-655890489 Found a lot of exclusions before I started getting my disks filled by the build & local runner. @robertwb you may be interested in the failures where it seems empty side inputs don't work This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
tvalentyn commented on pull request #12194: URL: https://github.com/apache/beam/pull/12194#issuecomment-655850277 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
TheNeuralBit commented on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655841690 Interesting. Looks like it is running fine on Python PostCommit, and I'm able to run it locally with the fn api runner as well: ``` python setup.py nosetests --tests apache_beam.io.external.xlang_jdbcio_it_test ``` (cc: @sclukas77 - it looks like this command should work for you now if you have all the dependencies built) Running on Python PostCommit is fine with me, I just wanted to make sure we have it running continuously _somewhere_. Can you file a jira so we don't forget to investigate the issue on Flink/Spark? I'd like to take another look through the code when I'm fresh tomorrow, sorry this is taking so long :grimacing: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
tvalentyn commented on pull request #12194: URL: https://github.com/apache/beam/pull/12194#issuecomment-655830657 Run Python 3.8 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12202: [BEAM-10407,10408] Schema Capable IO Table Provider Wrappers
TheNeuralBit commented on a change in pull request #12202: URL: https://github.com/apache/beam/pull/12202#discussion_r451872831 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaCapableIOTableProviderWrapper.java ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider; + +import static org.apache.beam.sdk.util.RowJsonUtils.newObjectMapperWith; + +import com.alibaba.fastjson.JSONObject; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.google.auto.service.AutoService; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.schemas.io.InvalidConfigurationException; +import org.apache.beam.sdk.schemas.io.InvalidSchemaException; +import org.apache.beam.sdk.schemas.io.SchemaCapableIOProvider; +import org.apache.beam.sdk.schemas.io.SchemaIO; +import org.apache.beam.sdk.util.RowJson; +import org.apache.beam.sdk.values.Row; + +/** + * A general {@link TableProvider} for IOs for consumption by Beam SQL. + * + * Can create child classes for IOs to pass {@link #schemaCapableIOProvider} that is specific to + * the IO. + */ +@Internal +@Experimental +@AutoService(TableProvider.class) +public abstract class SchemaCapableIOTableProviderWrapper extends InMemoryMetaTableProvider { Review comment: ```suggestion abstract class SchemaCapableIOTableProviderWrapper extends InMemoryMetaTableProvider { ``` I think this AutoService annotation is what's causing the Java PreCommit to fail. The `AutoService` annotation makes it so that a call `ServiceLoader.load(TableProvider.class)` will try to instantiate this class if it's in the classpath, and it's not possible to instantiate this since its abstract. Specifically this is the ServiceLoader call that's biting you: https://github.com/apache/beam/blob/1a260cd70117b77e33442f0be8f213363a2db5fc/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java#L85-L86 I think we should also make this package-private This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12202: [BEAM-10407,10408] Schema Capable IO Table Provider Wrappers
TheNeuralBit commented on a change in pull request #12202: URL: https://github.com/apache/beam/pull/12202#discussion_r451889886 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableWrapper.java ## @@ -15,54 +15,63 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -package org.apache.beam.sdk.extensions.sql.meta.provider.parquet; +package org.apache.beam.sdk.extensions.sql.meta.provider; import java.io.Serializable; -import org.apache.avro.generic.GenericRecord; + +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Internal; import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics; -import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; -import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable; -import org.apache.beam.sdk.io.parquet.ParquetIO; +import org.apache.beam.sdk.extensions.sql.meta.BaseBeamTable; +import org.apache.beam.sdk.extensions.sql.meta.Table; import org.apache.beam.sdk.options.PipelineOptions; import org.apache.beam.sdk.schemas.Schema; -import org.apache.beam.sdk.schemas.utils.AvroUtils; +import org.apache.beam.sdk.schemas.io.SchemaIO; import org.apache.beam.sdk.transforms.PTransform; import org.apache.beam.sdk.values.PBegin; import org.apache.beam.sdk.values.PCollection; -import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.POutput; import org.apache.beam.sdk.values.Row; +@Internal +@Experimental +/** + * A generalized {@link Table} for IOs to create IO readers and writers. + */ +public class SchemaIOTableWrapper extends BaseBeamTable implements Serializable { Review comment: ```suggestion class SchemaIOTableWrapper extends BaseBeamTable implements Serializable { ``` I think this can be package-private. It might also make sense to make it an inner class of `SchemaIOTableProviderWrapper`, but I'll leave that up to you ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaCapableIOTableProviderWrapper.java ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.meta.provider; + +import static org.apache.beam.sdk.util.RowJsonUtils.newObjectMapperWith; + +import com.alibaba.fastjson.JSONObject; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.google.auto.service.AutoService; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.schemas.io.InvalidConfigurationException; +import org.apache.beam.sdk.schemas.io.InvalidSchemaException; +import org.apache.beam.sdk.schemas.io.SchemaCapableIOProvider; +import org.apache.beam.sdk.schemas.io.SchemaIO; +import org.apache.beam.sdk.util.RowJson; +import org.apache.beam.sdk.values.Row; + +/** + * A general {@link TableProvider} for IOs for consumption by Beam SQL. + * + * Can create child classes for IOs to pass {@link #schemaCapableIOProvider} that is specific to + * the IO. + */ +@Internal +@Experimental +@AutoService(TableProvider.class) +public abstract class SchemaCapableIOTableProviderWrapper extends InMemoryMetaTableProvider { Review comment: ```suggestion abstract class SchemaCapableIOTableProviderWrapper extends InMemoryMetaTableProvider { ``` I think this AutoService annotation is what's causing the Java PreCommit to fail. The `AutoService` annotation makes it so that a call `ServiceLoader.load(TableProvider.class)` will try to instantiate this class if it's in the classpath, and it's not possible Specifically this is the ServiceLoader call that's biting you: https://github.com/apache/beam/blob/1a260cd70117b77e33442f0be8f213363a2db5fc/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java#L85-L86 I think we should also make this package-private ## File path: sdks/java/core/sr
[GitHub] [beam] apalmercari commented on pull request #11447: [BEAM-9502] makes Schema UUID generation deterministic
apalmercari commented on pull request #11447: URL: https://github.com/apache/beam/pull/11447#issuecomment-655829440 What changes are required to make it compatible with the dataflow runner update operation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases
robertwb commented on pull request #12185: URL: https://github.com/apache/beam/pull/12185#issuecomment-655823773 @aaltay: portable_runner depends on these optimizations (when --pre_optimize=all) is set to get better fusion. @yifanmai could you file a jira to run Dataflow through this optimization as well? (Probably doesn't makes sense to have it as part of this PR.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12203: Bq main sink
pabloem commented on pull request #12203: URL: https://github.com/apache/beam/pull/12203#issuecomment-655823253 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on a change in pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases
robertwb commented on a change in pull request #12185: URL: https://github.com/apache/beam/pull/12185#discussion_r451894237 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -289,6 +289,8 @@ def create_stages( phases=[ translations.annotate_downstream_side_inputs, translations.fix_side_input_pcoll_coders, +translations.eliminate_common_siblings, Review comment: We could be able to annotate/recognize certain DoFns (e.g. by URN) for which this deduplication would be safe to apply. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12203: Bq main sink
pabloem commented on pull request #12203: URL: https://github.com/apache/beam/pull/12203#issuecomment-655821747 Run Python 3.8 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem opened a new pull request #12203: Bq main sink
pabloem opened a new pull request #12203: URL: https://github.com/apache/beam/pull/12203 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam
[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
tvalentyn commented on pull request #12194: URL: https://github.com/apache/beam/pull/12194#issuecomment-655795921 Run Python Dataflow ValidatesContainer This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
tvalentyn commented on pull request #12194: URL: https://github.com/apache/beam/pull/12194#issuecomment-655795067 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] saavannanavati commented on pull request #11939: [BEAM-10197] Support typehints for Python's frozenset
saavannanavati commented on pull request #11939: URL: https://github.com/apache/beam/pull/11939#issuecomment-655788900 Tests have passed. This is ready for merge This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit merged pull request #11744: [BEAM-10007] Handle ValueProvider pipeline options in PortableRunner
TheNeuralBit merged pull request #11744: URL: https://github.com/apache/beam/pull/11744 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck merged pull request #12193: [BEAM-8472] get default GCP region option (Go)
lostluck merged pull request #12193: URL: https://github.com/apache/beam/pull/12193 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] sclukas77 opened a new pull request #12202: [BEAM-10407,10408] Schema Capable IO Table Provider Wrappers
sclukas77 opened a new pull request #12202: URL: https://github.com/apache/beam/pull/12202 Implemented SchemaIO and SchemaCapableIOProvider for Avro and Parquet, shifting logic to core Beam. Created generalized table and tableprovider wrappers in Beam SQL, implementing for Pubsub, Avro, and Parquet. R:@TheNeuralBit R:@robinyqiu Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastComplet
[GitHub] [beam] davidcavazos commented on a change in pull request #12188: [BEAM-10411] Adds an example that use Python cross-language Kafka transforms.
davidcavazos commented on a change in pull request #12188: URL: https://github.com/apache/beam/pull/12188#discussion_r451817171 ## File path: sdks/python/apache_beam/examples/kafkataxi/README.md ## @@ -0,0 +1,164 @@ + + +# Python KafkaIO Example + +This example reads from the PubSub NYC Taxi stream described +[here](https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon), writes +to a given Kafka topic and reads back from the same Kafka topic. This example +uses cross-language transforms available in +[kafka.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py). +Transforms are implemented in Java and are available +[here](https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java). + +## Prerequisites + +Install Java in your system and make sure that `java` command is available in +the environment. + +```sh +>java --version Review comment: Can we get rid of the initial `>` and the "output"? It makes it trickier to copy-paste the command. ```sh java --version ``` ## File path: sdks/python/apache_beam/examples/kafkataxi/README.md ## @@ -0,0 +1,164 @@ + + +# Python KafkaIO Example + +This example reads from the PubSub NYC Taxi stream described +[here](https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon), writes +to a given Kafka topic and reads back from the same Kafka topic. This example +uses cross-language transforms available in +[kafka.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py). +Transforms are implemented in Java and are available +[here](https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java). + +## Prerequisites + +Install Java in your system and make sure that `java` command is available in +the environment. + +```sh +>java --version +> +``` + +## Setup the Kafka cluster + +This example requires users to setup a Kafka cluster that the Beam runner +executing the pipeline has access to. There are few options. + +* For local runners that execute the pipelines in a single computer (for +example, portable DirectRunner or Spark/Flink runners in local mode), you can +setup a local Kafka cluster running in the same computer. +* For Dataflow or portable Spark/Flink in distributed mode, you can setup a Kafka +cluster in GCE. See +[here](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/dataflow/flex-templates/kafka_to_bigquery) +for step by step instructions for this. + +Let's assume that that IP address of the node running the Kafka cluster to be +`KAFKA_ADDRESS` and the port to be `9092`. + +```sh +> export BOOTSTRAP_SERVER=KAFKA_ADDRESS:9092 Review comment: Can we get rid of the `>` as well? For this command example, we could assume it's running locally. Someone not familiar with IP addresses might not know what `KAFKA_ADDRESS` means. We could mention that if you're running it in a distributed environment you'll need to replace the address with its public IP address. ```sh export BOOTSTRAP_SERVER=127.0.0.1:9092 ``` > **[edit]**: after looking below, it looks like this guide's instructions are only written in Dataflow. If that's the case, running Kafka locally won't work since Dataflow needs to reach the Kafka address. Should we take out any mention of running it locally and just assume users will run it in a distributed environment? ## File path: sdks/python/apache_beam/examples/kafkataxi/README.md ## @@ -0,0 +1,164 @@ + + +# Python KafkaIO Example + +This example reads from the PubSub NYC Taxi stream described +[here](https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon), writes +to a given Kafka topic and reads back from the same Kafka topic. This example +uses cross-language transforms available in +[kafka.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py). +Transforms are implemented in Java and are available +[here](https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java). + +## Prerequisites + +Install Java in your system and make sure that `java` command is available in Review comment: Can we link to https://adoptopenjdk.net/?variant=openjdk11&jvmVariant=openj9 for installing Java? ## File path: sdks/python/apache_beam/examples/kafkataxi/README.md ## @@ -0,0 +1,164 @@ + + +# Python KafkaIO Example + +This example reads from the PubSub NYC Taxi stream described +[here](https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon), writes +to a given Kafka topic and reads back from the same Kafka topic. This example +uses cross-language transforms available in +[kafka.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py). +Transforms are implemented in Java and are availabl
[GitHub] [beam] aaltay commented on pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases
aaltay commented on pull request #12185: URL: https://github.com/apache/beam/pull/12185#issuecomment-655767371 @yifanmai - (a note from our earlier conversation). This elimination would not be safe if pardo has side effects or if it is not deterministic. For example, if pardo is written to randomly sample 10% of its input, this change will result in all siblings to produce the same exact sample. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases
aaltay commented on pull request #12185: URL: https://github.com/apache/beam/pull/12185#issuecomment-655766565 I initially thought this change would only impact fnapi runner and local execution. I assume the intention is to apply the optimization to all portable runners. I noticed that portable runner depends on fnapi_runner (https://github.com/apache/beam/blob/2dddc0c9d60315fd212a90bc2ac39fd2bcc8bf63/sdks/python/apache_beam/runners/portability/portable_runner.py#L51). So maybe this optimization applies to all portable executions. @robertwb - Do you know why portable_runner depends on fnapi_runner? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski removed a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655748822 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski commented on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655751328 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski commented on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655751166 Run Python 3.8 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695 @TheNeuralBit I've moved the tests execution to python postcommit suite, like KafkaIO xlang test. I don't have a clue why they were timing out on ValidateCrossLanguageFlink task. If you have further suggestions for code improvement then go ahead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski commented on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655748822 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf commented on pull request #12191: [BEAM-10419] Skip FhirIORead integration test due to flakiness
jaketf commented on pull request #12191: URL: https://github.com/apache/beam/pull/12191#issuecomment-655748545 CC: @lastomato This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] davidyan74 opened a new pull request #12201: [BEAM-10291] log full thread dump only when lull duration is more tha…
davidyan74 opened a new pull request #12201: URL: https://github.com/apache/beam/pull/12201 …n 20 minutes This is to be consistent with the behavior of the Java runner. See https://github.com/apache/beam/pull/12143 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_P
[GitHub] [beam] KevinGG commented on pull request #12107: Interactive Environment Inspector for messaging
KevinGG commented on pull request #12107: URL: https://github.com/apache/beam/pull/12107#issuecomment-655743269 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
tvalentyn commented on pull request #12194: URL: https://github.com/apache/beam/pull/12194#issuecomment-655734489 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch
tvalentyn commented on pull request #12194: URL: https://github.com/apache/beam/pull/12194#issuecomment-655734353 Run Python Dataflow ValidatesContainer This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn merged pull request #12199: [BEAM-10423] Add Py3.8 test infrastructure to the release branch to unblock release validation.
tvalentyn merged pull request #12199: URL: https://github.com/apache/beam/pull/12199 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #12199: [BEAM-10423] Add Py3.8 test infrastructure to the release branch to unblock release validation.
tvalentyn commented on pull request #12199: URL: https://github.com/apache/beam/pull/12199#issuecomment-655733293 Commits are cherry-picks from master, test failures are unrelated (SQL), there is one known flake (Python Precommit). Merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn merged pull request #12197: [BEAM-8494] Declare Python 3.8 support in setup.py.
tvalentyn merged pull request #12197: URL: https://github.com/apache/beam/pull/12197 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] ibzib merged pull request #12200: [BEAM-10243] Fix and test FieldValueBuilder::withFieldValues.
ibzib merged pull request #12200: URL: https://github.com/apache/beam/pull/12200 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12143: [BEAM-10291] Adding full thread dump upon lull detection for Dataflow…
pabloem commented on pull request #12143: URL: https://github.com/apache/beam/pull/12143#issuecomment-655724129 Thanks @davidyan74 ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem merged pull request #12143: [BEAM-10291] Adding full thread dump upon lull detection for Dataflow…
pabloem merged pull request #12143: URL: https://github.com/apache/beam/pull/12143 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11959: refactor HCLS IO ITs to support stores in other projects
aaltay commented on pull request #11959: URL: https://github.com/apache/beam/pull/11959#issuecomment-655719524 @pabloem - Could you merge this if this looks good? I have not reviewed the PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #12200: [BEAM-10243] Fix and test FieldValueBuilder::withFieldValues.
TheNeuralBit commented on pull request #12200: URL: https://github.com/apache/beam/pull/12200#issuecomment-655716816 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #11744: [BEAM-10007] Handle ValueProvider pipeline options in PortableRunner
TheNeuralBit commented on pull request #11744: URL: https://github.com/apache/beam/pull/11744#issuecomment-655715080 Removing the ValueProvider line didn't actually break any Google tests, but I lost my resolve to remove it. It seems likely it would break some untested behavior, e.g. display_data uses drop_defaults=True: https://github.com/apache/beam/blob/f65a18760a07a48c11fe4aff2a48a845df1f522d/sdks/python/apache_beam/options/pipeline_options.py#L328-L329 Instead I pushed d2d4ecb which makes the assertions in sdk_worker_main_test more specific. I think this is good to go assuming CI passses. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kennknowles commented on a change in pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness
kennknowles commented on a change in pull request #11792: URL: https://github.com/apache/beam/pull/11792#discussion_r451779380 ## File path: runners/portability/java/build.gradle ## @@ -31,9 +45,123 @@ dependencies { compile project(path: ":sdks:java:harness", configuration: "shadow") compile library.java.vendored_grpc_1_26_0 compile library.java.slf4j_api + testCompile project(path: ":runners:core-construction-java", configuration: "testRuntime") testCompile library.java.hamcrest_core testCompile library.java.junit testCompile library.java.mockito_core testCompile library.java.slf4j_jdk14 + + validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest") + validatesRunner project(path: ":runners:core-java", configuration: "testRuntime") + validatesRunner project(path: project.path, configuration: "testRuntime") +} + + +project.evaluationDependsOn(":sdks:java:core") +project.evaluationDependsOn(":runners:core-java") + +ext.virtualenvDir = "${project.buildDir}/virtualenv" +ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid" +ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? project.property("localJobServicePortFile") : "${project.buildDir}/local_job_service_port" +ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout" + +ext.pythonSdkDir = "${project.rootDir}/sdks/python" + +void execInVirtualenv(String... args) { + String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ") + exec { +workingDir pythonSdkDir +commandLine "sh", "-c", shellCommand + } } + +void execBackgroundInVirtualenv(String... args) { + String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ") + println "execBackgroundInVirtualEnv: ${shellCommand}" + ProcessBuilder pb = new ProcessBuilder().redirectErrorStream(true).directory(new File(pythonSdkDir)).command(["sh", "-c", shellCommand]) + Process proc = pb.start(); + + // redirectIO does not work for connecting to groovy/gradle stdout + BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream())); + String line + while ((line = reader.readLine()) != null) { +println line + } + proc.waitFor(); +} + +task virtualenv { Review comment: Adding a dependency on something that _is_ ready and _is_ meant for it, but is not polished and named to make clear that it is a _logical necessity_: tech debt) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kennknowles commented on pull request #11639: [BEAM-4440] Throw exception when file to stage is not found, instead of logging a warning
kennknowles commented on pull request #11639: URL: https://github.com/apache/beam/pull/11639#issuecomment-655712118 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia edited a comment on pull request #11967: [BEAM-9992] | use Sets transform in BeamSQL
amaliujia edited a comment on pull request #11967: URL: https://github.com/apache/beam/pull/11967#issuecomment-655707048 @darshanj I believe your implementation will crash this query (at least from ZetaSQL dialect path) ``` SELECT DISTINCT val.BYTES from (select b"1" BYTES union all select cast(NULL as bytes) union all select b"-1" union all select b"1" union all select cast(NULL as bytes)) val ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lukecwik commented on pull request #12192: [WIP][BEAM-10420] Migrate SDF logic into PerWindowInvoker
lukecwik commented on pull request #12192: URL: https://github.com/apache/beam/pull/12192#issuecomment-655709449 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] amaliujia commented on pull request #11967: [BEAM-9992] | use Sets transform in BeamSQL
amaliujia commented on pull request #11967: URL: https://github.com/apache/beam/pull/11967#issuecomment-655707048 @darshanj I believe your implementation will break this query (at least from ZetaSQL dialect path) ``` SELECT DISTINCT val.BYTES from (select b"1" BYTES union all select cast(NULL as bytes) union all select b"-1" union all select b"1" union all select cast(NULL as bytes)) val ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kennknowles commented on a change in pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness
kennknowles commented on a change in pull request #11792: URL: https://github.com/apache/beam/pull/11792#discussion_r451772554 ## File path: runners/portability/java/build.gradle ## @@ -31,9 +45,123 @@ dependencies { compile project(path: ":sdks:java:harness", configuration: "shadow") compile library.java.vendored_grpc_1_26_0 compile library.java.slf4j_api + testCompile project(path: ":runners:core-construction-java", configuration: "testRuntime") testCompile library.java.hamcrest_core testCompile library.java.junit testCompile library.java.mockito_core testCompile library.java.slf4j_jdk14 + + validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest") + validatesRunner project(path: ":runners:core-java", configuration: "testRuntime") + validatesRunner project(path: project.path, configuration: "testRuntime") +} + + +project.evaluationDependsOn(":sdks:java:core") +project.evaluationDependsOn(":runners:core-java") + +ext.virtualenvDir = "${project.buildDir}/virtualenv" +ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid" +ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? project.property("localJobServicePortFile") : "${project.buildDir}/local_job_service_port" +ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout" + +ext.pythonSdkDir = "${project.rootDir}/sdks/python" + +void execInVirtualenv(String... args) { + String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ") + exec { +workingDir pythonSdkDir +commandLine "sh", "-c", shellCommand + } } + +void execBackgroundInVirtualenv(String... args) { + String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ") + println "execBackgroundInVirtualEnv: ${shellCommand}" + ProcessBuilder pb = new ProcessBuilder().redirectErrorStream(true).directory(new File(pythonSdkDir)).command(["sh", "-c", shellCommand]) + Process proc = pb.start(); + + // redirectIO does not work for connecting to groovy/gradle stdout + BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream())); + String line + while ((line = reader.readLine()) != null) { +println line + } + proc.waitFor(); +} + +task virtualenv { Review comment: (I will explore this invitation once the tests are running properly) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kennknowles commented on a change in pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness
kennknowles commented on a change in pull request #11792: URL: https://github.com/apache/beam/pull/11792#discussion_r451761061 ## File path: runners/portability/java/build.gradle ## @@ -31,9 +45,123 @@ dependencies { compile project(path: ":sdks:java:harness", configuration: "shadow") compile library.java.vendored_grpc_1_26_0 compile library.java.slf4j_api + testCompile project(path: ":runners:core-construction-java", configuration: "testRuntime") testCompile library.java.hamcrest_core testCompile library.java.junit testCompile library.java.mockito_core testCompile library.java.slf4j_jdk14 + + validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest") + validatesRunner project(path: ":runners:core-java", configuration: "testRuntime") + validatesRunner project(path: project.path, configuration: "testRuntime") +} + + +project.evaluationDependsOn(":sdks:java:core") +project.evaluationDependsOn(":runners:core-java") + +ext.virtualenvDir = "${project.buildDir}/virtualenv" +ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid" +ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? project.property("localJobServicePortFile") : "${project.buildDir}/local_job_service_port" +ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout" + +ext.pythonSdkDir = "${project.rootDir}/sdks/python" + +void execInVirtualenv(String... args) { + String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ") + exec { +workingDir pythonSdkDir +commandLine "sh", "-c", shellCommand + } } + +void execBackgroundInVirtualenv(String... args) { + String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ") + println "execBackgroundInVirtualEnv: ${shellCommand}" + ProcessBuilder pb = new ProcessBuilder().redirectErrorStream(true).directory(new File(pythonSdkDir)).command(["sh", "-c", shellCommand]) + Process proc = pb.start(); + + // redirectIO does not work for connecting to groovy/gradle stdout + BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream())); + String line + while ((line = reader.readLine()) != null) { +println line + } + proc.waitFor(); +} + +task virtualenv { + doLast { +exec { + commandLine "virtualenv", virtualenvDir, "--python=python3" +} +execInVirtualenv "pip", "install", "--retries", "10", "--upgrade", "tox==3.11.1", "--requirement", "${project.rootDir}/sdks/python/build-requirements.txt" +execInVirtualenv "python", "setup.py", "build", "--build-base=${buildDir}" +execInVirtualenv "pip", "install", "-e", "." + } +} + +task startLocalJobService { + dependsOn virtualenv + + doLast { +execBackgroundInVirtualenv "python", +"-m", "apache_beam.runners.portability.local_job_service_main", +"--background", +"--stdout_file=${localJobServiceStdoutFile}", +"--pid_file=${localJobServicePidFile}", +"--port_file=${localJobServicePortFile}" + +File pidFile = new File(localJobServicePidFile) +int totalSleep = 0 +while (!pidFile.exists()) { + sleep(500) Review comment: Removed. This code was left over from when I was struggling to get gradle to allow the thing to daemonize itself. ## File path: sdks/python/apache_beam/runners/portability/local_job_service_main.py ## @@ -99,11 +105,23 @@ def run(argv): options.port_file = os.path.splitext(options.pid_file)[0] + '.port' argv.append('--port_file') argv.append(options.port_file) + +if not options.stdout_file: + raise RuntimeError('--stdout_file must be specified with --background') +stdout_dest = open(options.stdout_file, mode='w') + +if options.stderr_file: + stderr_dest=open(options.stderr_file, mode='w') +else: + stderr_dest=subprocess.STDOUT + subprocess.Popen([ sys.executable, '-m', 'apache_beam.runners.portability.local_job_service_main' -] + argv) +] + argv, +stderr=stderr_dest, Review comment: I didn't read the `subprocess` code, and the docs are vague. The special `subprocess.STDOUT` token indicates that the output should be "captured" into the same file handle. Comments at https://stackoverflow.com/questions/31980411/closing-files-from-subprocess-stdout imply that closing the file is the responsibility of this process. I did not run the experiments suggested there. I also did not try to refactor this code to allow a `with` statement. ## File path: runners/portability/java/build.gradle ## @@ -31,9 +45,123 @@ dependencies { compile project(path: ":sdks:java:harness", configuration: "shadow") compile library.java.vendored_grpc_1_26_0 compile library.java.slf4j_api + testCompile project(path
[GitHub] [beam] ibzib opened a new pull request #12200: [BEAM-10243] Fix and test FieldValueBuilder::withFieldValues.
ibzib opened a new pull request #12200: URL: https://github.com/apache/beam/pull/12200 R: @reuvenlax @TheNeuralBit Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Pyt
[GitHub] [beam] robertwb commented on a change in pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases
robertwb commented on a change in pull request #12185: URL: https://github.com/apache/beam/pull/12185#discussion_r451758742 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py ## @@ -685,6 +687,264 @@ def fix_side_input_pcoll_coders(stages, pipeline_context): return stages +def eliminate_common_siblings(stages, context): + # type: (Iterable[Stage], TransformContext) -> Iterable[Stage] + """Runs common subexpression elimination for common siblings. + + If stages have common input, an identical transform, and one output each, + then all but one stages will be eliminated, and the output of the remaining + will be connected to the original output PCollections of the eliminated + stages. This elimination runs only once, not recursively, and will only + eliminate the first stage after a common input, rather than a chain of + stages. + """ + + SiblingKey = collections.namedtuple( + 'SiblingKey', ['spec_urn', 'spec_payload', 'inputs', 'environment_id']) + + def get_sibling_key(transform): +"""Returns a key that will be identical for common siblings.""" +transform_output_keys = list(transform.outputs.keys()) +# Return None as the sibling key for ineligible transforms. +if len(transform_output_keys + ) != 1 or transform.spec.urn != common_urns.primitives.PAR_DO.urn: + return None +return SiblingKey( +spec_urn=transform.spec.urn, +spec_payload=transform.spec.payload, +inputs=tuple(transform.inputs.items()), +environment_id=transform.environment_id) + + # Group stages by keys. + stages_by_sibling_key = collections.defaultdict(list) + for stage in stages: +transform = only_transform(stage.transforms) +stages_by_sibling_key[get_sibling_key(transform)].append(stage) + + # Eliminate stages and build the output PCollection remapping dictionary. + pcoll_id_remap = {} + for sibling_key, sibling_stages in stages_by_sibling_key.items(): +if sibling_key is None or len(sibling_stages) == 1: + continue +output_pcoll_ids = [ +only_element(stage.transforms[0].outputs.values()) +for stage in sibling_stages +] +to_delete_pcoll_ids = output_pcoll_ids[1:] +for to_delete_pcoll_id in to_delete_pcoll_ids: + pcoll_id_remap[to_delete_pcoll_id] = output_pcoll_ids[0] + del context.components.pcollections[to_delete_pcoll_id] +del sibling_stages[1:] + + # Yield stages while remapping output PCollections if needed. + for sibling_key, sibling_stages in stages_by_sibling_key.items(): +for stage in sibling_stages: + input_keys_to_remap = [] + for input_key, input_pcoll_id in stage.transforms[0].inputs.items(): +if input_pcoll_id in pcoll_id_remap: + input_keys_to_remap.append(input_key) + for input_key_to_remap in input_keys_to_remap: +stage.transforms[0].inputs[input_key_to_remap] = pcoll_id_remap[ +stage.transforms[0].inputs[input_key_to_remap]] + yield stage + + +def pack_combiners(stages, context): + # type: (Iterable[Stage], TransformContext) -> Iterator[Stage] + """Packs sibling CombinePerKey stages into a single CombinePerKey. + + If CombinePerKey stages have a common input, one input each, and one output + each, pack the stages into a single stage that runs all CombinePerKeys and + outputs resulting tuples to a new PCollection. A subsequent stage unpacks + tuples from this PCollection and sends them to the original output + PCollections. + """ + + class _UnpackFn(core.DoFn): +"""A DoFn that unpacks a packed to multiple tagged outputs. + +Example: + tags = (T1, T2, ...) + input = (K, (V1, V2, ...)) + output = TaggedOutput(T1, (K, V1)), TaggedOutput(T2, (K, V1)), ... +""" + +def __init__(self, tags): + self._tags = tags + +def process(self, element): + key, values = element + return [ + core.pvalue.TaggedOutput(tag, (key, value)) + for tag, value in zip(self._tags, values) + ] + + def _get_fallback_coder_id(): +return context.add_or_get_coder_id( +coders.registry.get_coder(object).to_runner_api(None)) + + def _get_component_coder_id_from_kv_coder(coder, index): +assert index < 2 +if coder.spec.urn == common_urns.coders.KV.urn and len( +coder.component_coder_ids) == 2: + return coder.component_coder_ids[index] +return _get_fallback_coder_id() + + def _get_key_coder_id_from_kv_coder(coder): +return _get_component_coder_id_from_kv_coder(coder, 0) + + def _get_value_coder_id_from_kv_coder(coder): +return _get_component_coder_id_from_kv_coder(coder, 1) + + def _try_fuse_stages(a, b): +if a.can_fuse(b, context): + return a.fuse(b) +else: + raise ValueError + + def _try_merge_environments(env1, env2): Review comment: Was this copied from above? If needed, perhaps refactor? (Similarly for try_fuse_stages.) ## F
[GitHub] [beam] tvalentyn commented on pull request #12199: [BEAM-10423] Add Py3.8 test infrastructure to the release branch to unblock release validation.
tvalentyn commented on pull request #12199: URL: https://github.com/apache/beam/pull/12199#issuecomment-655695934 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn opened a new pull request #12199: [BEAM-10423] Add Py3.8 test infrastructure to the release branch to unblock release validation.
tvalentyn opened a new pull request #12199: URL: https://github.com/apache/beam/pull/12199 This cherry-picks several commits that unblocks release validation that now includes Python 3.8 tests. Since we will run release validation on Python 3.8 as well, also adding a commit that declares Py 3.8 support on 2.23.0. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/b
[GitHub] [beam] kennknowles merged pull request #12120: [BEAM-10224] Test group by and aggregation on DATE and TIME type
kennknowles merged pull request #12120: URL: https://github.com/apache/beam/pull/12120 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12143: [BEAM-10291] Adding full thread dump upon lull detection for Dataflow…
pabloem commented on pull request #12143: URL: https://github.com/apache/beam/pull/12143#issuecomment-655686932 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn edited a comment on pull request #12197: [BEAM-8494] Declare Python 3.8 support in setup.py.
tvalentyn edited a comment on pull request #12197: URL: https://github.com/apache/beam/pull/12197#issuecomment-655686497 Thanks @lazylynx, @kamilwu, @epicfaace for helping with Python 3.8! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #12197: [BEAM-8494] Declare Python 3.8 support in setup.py.
tvalentyn commented on pull request #12197: URL: https://github.com/apache/beam/pull/12197#issuecomment-655686497 R: @aaltay. Thanks @lazylynx, @kamilwu, @epicfaace for helping with Python 3.8! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] busunkim96 opened a new pull request #12198: Widen ranges for GCP libraries
busunkim96 opened a new pull request #12198: URL: https://github.com/apache/beam/pull/12198 GCP libraries follow semantic versioning, so no breaking changes are made within a major version. Loosening the range reduces the likelihood of dependency conflicts for users. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/bad
[GitHub] [beam] kennknowles commented on pull request #11639: [BEAM-4440] Throw exception when file to stage is not found, instead of logging a warning
kennknowles commented on pull request #11639: URL: https://github.com/apache/beam/pull/11639#issuecomment-655686136 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #12196: [release-2.23.0][BEAM-10308] Make component ID assignments consistent across Pipeline…
TheNeuralBit commented on pull request #12196: URL: https://github.com/apache/beam/pull/12196#issuecomment-655684620 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lostluck commented on pull request #12193: [BEAM-8472] get default GCP region option (Go)
lostluck commented on pull request #12193: URL: https://github.com/apache/beam/pull/12193#issuecomment-655678835 Thanks for the pr! Cheers. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] saavannanavati commented on a change in pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()
saavannanavati commented on a change in pull request #12009: URL: https://github.com/apache/beam/pull/12009#discussion_r451733706 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -378,6 +378,56 @@ def has_simple_output_type(self): self.output_types and len(self.output_types[0]) == 1 and not self.output_types[1]) + def strip_pcoll(self): +return self.strip_pcoll_helper(self.input_types, + self._has_input_types, + {'input_types': None}, + ['apache_beam.pvalue.PBegin'], + 'An input typehint to a PTransform must be' + ' a single (or nested) type wrapped by ' + 'a PCollection or PBegin. ', + 'strip_pcoll_input()').\ +strip_pcoll_helper(self.output_types, + self.has_simple_output_type, + {'output_types': None}, + ['apache_beam.pvalue.PDone'], + 'An output typehint to a PTransform must be' + ' a single (or nested) type wrapped by ' + 'a PCollection or PDone. ', + 'strip_pcoll_output()') + + def strip_pcoll_helper( + self, + my_type,# type: any + has_my_type,# type: Callable[[], bool] + kwarg_dict, # type: Dict[str, any] + my_valid_classes, # type: List[str] + error_str, # type: str + source_str # type: str + ): +# type: (...) -> IOTypeHints + +if not has_my_type() or len(my_type[0]) != 1: + return self + +my_type = my_type[0][0] + +if isinstance(my_type, typehints.AnyTypeConstraint): + return self + +valid_classes = ['apache_beam.pvalue.PCollection'] + my_valid_classes + +if not any(valid_class in str(my_type) for valid_class in valid_classes): Review comment: I tried using `isinstance` initially but it doesn't work well with generic types Another option is to use `__origin__` but I don't know if that's fully backwards compatible Strings are a wacky solution though.. do you have any other ideas? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf commented on pull request #11959: refactor HCLS IO ITs to support stores in other projects
jaketf commented on pull request #11959: URL: https://github.com/apache/beam/pull/11959#issuecomment-655672949 @aaltay @pabloem checks seem good now. LMK if this needs anything else This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] saavannanavati commented on a change in pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()
saavannanavati commented on a change in pull request #12009: URL: https://github.com/apache/beam/pull/12009#discussion_r451730259 ## File path: sdks/python/apache_beam/pvalue.py ## @@ -222,7 +222,7 @@ class _InvalidUnpickledPCollection(object): pass -class PBegin(PValue): +class PBegin(PValue, Generic[T]): Review comment: Oh ok makes sense This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on pull request #12196: [release-2.23.0][BEAM-10308] Make component ID assignments consistent across Pipeline…
robertwb commented on pull request #12196: URL: https://github.com/apache/beam/pull/12196#issuecomment-655664851 I agree with the severity, especially as we'll be widely advertising cross-language with the 2.23 release. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb merged pull request #11912: [BEAM-10165] Cache and return error messages on pipeline failure.
robertwb merged pull request #11912: URL: https://github.com/apache/beam/pull/11912 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] robertwb commented on pull request #11968: [BEAM-10185] Build python wheels on GitHub Actions for Windows [dependent on BEAM-10184]
robertwb commented on pull request #11968: URL: https://github.com/apache/beam/pull/11968#issuecomment-655657299 We've been avoiding doing any cythonization at all on Windows due to issues like this due to statesampler_fast using (unavailable on Windows) posix APIs. It may be possible to skip just this file, rather than everything, on Windows. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12144: [BEAM-10395] Deduplicate uploads by destinations before uploading
pabloem commented on pull request #12144: URL: https://github.com/apache/beam/pull/12144#issuecomment-655651862 taking a look today This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem commented on pull request #12143: [BEAM-10291] Adding full thread dump upon lull detection for Dataflow…
pabloem commented on pull request #12143: URL: https://github.com/apache/beam/pull/12143#issuecomment-655643859 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] davidcavazos commented on pull request #12188: [BEAM-10411] Adds an example that use Python cross-language Kafka transforms.
davidcavazos commented on pull request #12188: URL: https://github.com/apache/beam/pull/12188#issuecomment-655640853 Should we have a test for this sample as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] TheNeuralBit opened a new pull request #12196: [release-2.23.0][BEAM-10308] Make component ID assignments consistent across Pipeline…
TheNeuralBit opened a new pull request #12196: URL: https://github.com/apache/beam/pull/12196 Cherry-pick #12067 into 2.23.0. The bug this fixes is not technically a regression since it existed in previous versions as well, but I think it's serious enough to merit cherry picking into 2.23.0. R: @tvalentyn CC: @chamikaramj @robertwb Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
[GitHub] [beam] tvalentyn opened a new pull request #12197: [BEAM-8494] Declare Python 3.8 support in setup.py.
tvalentyn opened a new pull request #12197: URL: https://github.com/apache/beam/pull/12197 Post-Commit Tests Status (on master branch) Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2 --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | --- Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/) Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/last
[GitHub] [beam] davidcavazos commented on a change in pull request #12188: [BEAM-10411] Adds an example that use Python cross-language Kafka transforms.
davidcavazos commented on a change in pull request #12188: URL: https://github.com/apache/beam/pull/12188#discussion_r451686296 ## File path: sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py ## @@ -0,0 +1,87 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""An example that writes to and reads from Kafka. + + This example reads from the PubSub NYC Taxi stream described in + https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon, writes to a + given Kafka topic and reads back from the same Kafka topic. + """ + +# pytype: skip-file + +from __future__ import absolute_import + +import argparse +import logging +import typing + +import apache_beam as beam +from apache_beam.io.kafka import ReadFromKafka +from apache_beam.io.kafka import WriteToKafka +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import SetupOptions +from apache_beam.options.pipeline_options import StandardOptions + + +def run(argv=None): + """Main entry point; defines and runs the wordcount pipeline.""" + parser = argparse.ArgumentParser() + parser.add_argument( + '--bootstrap_servers', + dest='bootstrap_servers', + required=True, + help='Bootstrap servers for the Kafka cluster. Should be accessible by ' + 'the runner') + parser.add_argument( + '--topic', + dest='topic', + default='kafka_taxirides_realtime', + help='Kafka topic to write to and read from') + known_args, pipeline_args = parser.parse_known_args(argv) + + pipeline_options = PipelineOptions(pipeline_args) + pipeline_options.view_as(SetupOptions).save_main_session = True + pipeline_options.view_as(StandardOptions).streaming = True + + pipeline = beam.Pipeline(options=pipeline_options) + bootstrap_servers = known_args.bootstrap_servers + _ = ( + pipeline + | beam.io.ReadFromPubSub( + topic='projects/pubsub-public-data/topics/taxirides-realtime'). + with_output_types(bytes) + | beam.Map(lambda x: (b'', x)).with_output_types( + typing.Tuple[bytes, bytes]) + | beam.WindowInto(beam.window.FixedWindows(15)) Review comment: Can we make this into either a constant or another command line argument with a default value? It's also a good idea to mention the units, are these seconds or minutes? ## File path: sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py ## @@ -0,0 +1,87 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""An example that writes to and reads from Kafka. + + This example reads from the PubSub NYC Taxi stream described in + https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon, writes to a + given Kafka topic and reads back from the same Kafka topic. + """ + +# pytype: skip-file + +from __future__ import absolute_import + +import argparse +import logging +import typing + +import apache_beam as beam +from apache_beam.io.kafka import ReadFromKafka +from apache_beam.io.kafka import WriteToKafka +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import SetupOptions +from apache_beam.options.pipeline_options import StandardOptions + + +def run(argv=None): + """Main entry point; defines and runs the wordcount pipeline.""" + parser = argparse.ArgumentParser() + parser.add_argument( + '--bootstrap_servers', + dest='bootstrap_servers', + required=True, + help='Bootstrap servers for th
[GitHub] [beam] aaltay commented on pull request #12049: [BEAM-10399] Periodic clear of GCS wheels staging bucket
aaltay commented on pull request #12049: URL: https://github.com/apache/beam/pull/12049#issuecomment-655635243 What files will be cleared this way? Release branches leave forever. Would there be other branches that this script will clean? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay merged pull request #12191: [BEAM-10419] Skip FhirIORead integration test due to flakiness
aaltay merged pull request #12191: URL: https://github.com/apache/beam/pull/12191 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] jaketf commented on pull request #11959: refactor HCLS IO ITs to support stores in other projects
jaketf commented on pull request #11959: URL: https://github.com/apache/beam/pull/11959#issuecomment-655632442 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on a change in pull request #12166: [BEAM-10404] Cancel queued/running GitHub Action builds on second push to PR
aaltay commented on a change in pull request #12166: URL: https://github.com/apache/beam/pull/12166#discussion_r451063093 ## File path: .github/workflows/cancel.yml ## @@ -16,7 +16,7 @@ # under the License. name: Cancel -on: [push] +on: [push, pull_request] Review comment: > e.g. new merge commit on master appear. So it is possible to run rests without PR. I do not understand how/when this would happen. Because we do not allow direct pushes to master. And if it is already in master a PR would not be helpful. Maybe this example applies to release branches? Overall, I am fine with the change. It is ok to cancel tests when a new PR is created. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #12128: [BEAM-10417] - Move Shared object from tfx_bsl
aaltay commented on pull request #12128: URL: https://github.com/apache/beam/pull/12128#issuecomment-655629231 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] lgajowy commented on a change in pull request #12117: [BEAM-10343] Add dispositions for SnowflakeIO.write
lgajowy commented on a change in pull request #12117: URL: https://github.com/apache/beam/pull/12117#discussion_r451677939 ## File path: sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/services/SnowflakeServiceImpl.java ## @@ -25,7 +27,9 @@ import java.util.function.Consumer; import java.util.stream.Collectors; import javax.sql.DataSource; +import org.apache.beam.sdk.io.snowflake.data.SnowflakeTableSchema; import org.apache.beam.sdk.io.snowflake.enums.CloudProvider; Review comment: I'm not sure, to be honest. Users should look for that information in the docs. Note that technically (not that I'd wish for this) there are ways of providing support to other cloud providers without extending that enum so looking at the endpoint is not the ultimate proof that only gcs is supported. If there's no docs or it's out of date regarding cloud providers support, the devs will still need to dig into the IO implementation details to really find out. So having this enum public is not much help imho. On the other hand, what we do now is that we leave an implementation detail (noise) that is not useful for the users in any way in code in the public API. I'd suggest hiding it and maybe expose it later if there is a good reason to do so. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695 @TheNeuralBit I've moved the tests execution to python postcommit suite, like KafkaIO xlang test. I don't have a clue why they were timing out on ValidateCrossLanguageFlink task. Now the Jdbc tests pass, but I continously have `apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types` failing. I checked the postcommit cron executions and it fails quite often. I'll trigger it from time to time to confirm whether it's a flake or not. If you have further suggestions for code improvement then go ahead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski edited a comment on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695 @TheNeuralBit I've moved the tests execution to python postcommit suite, like KafkaIO xlang test. I don't have a clue why they failed on ValidateCrossLanguageFlink. Now the Jdbc tests pass, but I continously have `apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types` failing. I checked the postcommit cron executions and it fails quite often. I'll trigger it from time to time to confirm whether it's a flake or not. If you have further suggestions for code improvement then go ahead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski commented on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695 @TheNeuralBit I've moved the tests execution to python postcommit suite, like KafkaIO xlang test. Now the Jdbc tests pass, but I continously have `apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types` failing. I checked the postcommit cron executions and it fails quite often. I'll trigger it from time to time to confirm whether it's a flake or not. If you have further suggestions for code improvement then go ahead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper
piotr-szuberski commented on pull request #12145: URL: https://github.com/apache/beam/pull/12145#issuecomment-655622733 Run Python 3.8 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] kamilwu commented on pull request #12132: [BEAM-10371] Run dependency check script with Python 3
kamilwu commented on pull request #12132: URL: https://github.com/apache/beam/pull/12132#issuecomment-655610830 Great! Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org