[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little 
experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite. I'm able to 
run the tests with `--test-pipeline-options="--runner=(Flink/Spark)Runner"` 
locally as well. It's strange that on your setup you encountered a timeout.
   
   I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little 
experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite. I'm able to 
do so (with `--test-pipeline-options="--runner=(Flink/Spark)Runner"` locally as 
well. It's strange that on your setup you encountered a timeout as on Jenkins 
everything goes as expected.
   
   I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little 
experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite. I'm able to 
do so locally as well. It's strange that on your setup you encountered a 
timeout as on Jenkins everything goes as expected.
   
   I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little 
experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite. I'm able to 
do so locally as well. It's strange that on your setup you encountered a 
timeout as on Jenkins everything goes as expected.
   
   The problem was in validatesCrossLanguageRunner task. It has more 
complicated setup (run flink/spark job-service, then run test expansion 
service, use PortableRunner) and maybe something in there caused the problem. 
But even though it passed locally on my setup (even with fresh Beam repo) so I 
wasn't able to reconstruct this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little 
experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite.
   
   The problem was in validatesCrossLanguageRunner task. It has more 
complicated setup (run flink/spark job-service, then run test expansion 
service, use PortableRunner) and maybe something in there caused the problem. 
But even though it passed locally on my setup (even with fresh Beam repo) so I 
wasn't able to reconstruct this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little 
experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite and pass.
   
   The problem was in validatesCrossLanguageRunner task. It has more 
complicated setup (run flink/spark job-service, then run test expansion 
service, use PortableRunner) and maybe something in there caused the problem. 
But even though it passed locally on my setup so I wasn't able to reconstruct 
this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little 
experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite and pass.
   
   The problem was in validatesCrossLanguageRunner task. It has more 
complicated setup (run flink/spark job-service, then run test expansion 
service, use PortableRunner) and maybe something in there caused the problem. 
But even though it passed locally on my setup (even with fresh Beam repo) so I 
wasn't able to reconstruct this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on a change in pull request #12115: [BEAM-7672] dynamically setup acceptable wheel specs according to installed python version

2020-07-08 Thread GitBox


tvalentyn commented on a change in pull request #12115:
URL: https://github.com/apache/beam/pull/12115#discussion_r451988176



##
File path: sdks/python/container/boot.go
##
@@ -170,6 +176,28 @@ func main() {
log.Fatalf("Python exited: %v", execx.Execute("python", args...))
 }
 
+// setup wheel specs according to installed python version
+func setupAcceptableWheelSpecs() error {
+   cmd := exec.Command("python", "-V")
+   stdoutStderr, err := cmd.CombinedOutput()
+   if err != nil {
+   return err
+   }
+   re := regexp.MustCompile(`Python (\d)\.(\d).*`)
+   pyVersions := re.FindStringSubmatch(string(stdoutStderr[:]))
+   if len(pyVersions) != 3 {
+   return fmt.Errorf("cannot get parse Python version from %s", 
stdoutStderr)
+   }
+   pyVersion := fmt.Sprintf("%s%s", pyVersions[1], pyVersions[2])
+   if pyVersion == "27" {
+   acceptableWhlSpecs = append(acceptableWhlSpecs, 
"cp27-cp27mu-manylinux1_x86_64.whl")
+   } else {
+   wheelName := fmt.Sprintf("cp%s-cp%sm-manylinux1_x86_64.whl", 
pyVersion, pyVersion)

Review comment:
   Sorry for a slow response. It seems that 3.8 wheels don't follow this 
pattern, just noticing this while prepring 2.23.0 release. 3.8 wheels look as 
follows:
   
   apache_beam-2.23.0-cp38-cp38-manylinux1_x86_64.whl
   
   Not sure why the pattern has changed. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655922759


   @TheNeuralBit Not a problem. Sorry for lots of changes, I still have little 
experience in submitting PRs to Beam.
   
   Those tests run on Flink and Spark in Python Postcommit suite and pass, so I 
think there is no need for the ticket.
   
   The problem was in validatesCrossLanguageRunner task. It has more 
complicated setup (run flink/spark job-service, then run test expansion 
service, use PortableRunner). But even though it passed locally on my setup so 
I wasn't able to reconstruct this issue. I added the issue:
   https://issues.apache.org/jira/browse/BEAM-10429



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #12060: [BEAM-10218] Add FnApiRunner to cross-language validate runner test

2020-07-08 Thread GitBox


ihji commented on pull request #12060:
URL: https://github.com/apache/beam/pull/12060#issuecomment-655913859


   Run XVR_Direct PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji removed a comment on pull request #12060: [BEAM-10218] Add FnApiRunner to cross-language validate runner test

2020-07-08 Thread GitBox


ihji removed a comment on pull request #12060:
URL: https://github.com/apache/beam/pull/12060#issuecomment-655913859


   Run XVR_Direct PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #12060: [BEAM-10218] Add FnApiRunner to cross-language validate runner test

2020-07-08 Thread GitBox


ihji commented on pull request #12060:
URL: https://github.com/apache/beam/pull/12060#issuecomment-655912009


   run seed job



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji removed a comment on pull request #12060: [BEAM-10218] Add FnApiRunner to cross-language validate runner test

2020-07-08 Thread GitBox


ihji removed a comment on pull request #12060:
URL: https://github.com/apache/beam/pull/12060#issuecomment-655912009


   run seed job



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] purbanow commented on a change in pull request #12117: [BEAM-10343] Add dispositions for SnowflakeIO.write

2020-07-08 Thread GitBox


purbanow commented on a change in pull request #12117:
URL: https://github.com/apache/beam/pull/12117#discussion_r451973374



##
File path: 
sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/services/SnowflakeServiceImpl.java
##
@@ -25,7 +27,9 @@
 import java.util.function.Consumer;
 import java.util.stream.Collectors;
 import javax.sql.DataSource;
+import org.apache.beam.sdk.io.snowflake.data.SnowflakeTableSchema;
 import org.apache.beam.sdk.io.snowflake.enums.CloudProvider;

Review comment:
   I see your point. I even removed this enum and add following variable to 
the Service file:
   `private static final String GCS_PREFIX = "gs://";`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #11979: [BEAM-9679] Add Partition task to Core Transform katas

2020-07-08 Thread GitBox


lostluck commented on pull request #11979:
URL: https://github.com/apache/beam/pull/11979#issuecomment-655910522


   That works fine. Given how mechanical the change is, it wouldn't be 
unreasonable to do the change all at once in one PR.
   
   However, I do recommend doing all the directory renames in separate commits 
from the deleting the old mods so that we can check the relatively smaller 
changes on a per commit basis (rather than just all at once). Won't matter to 
the final merge, since it will get squashed, but it will make it easier to 
review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] damondouglas commented on pull request #11979: [BEAM-9679] Add Partition task to Core Transform katas

2020-07-08 Thread GitBox


damondouglas commented on pull request #11979:
URL: https://github.com/apache/beam/pull/11979#issuecomment-655901876


   @henryken / @lostluck I've started work on consolidating the module by 
beginning work at this branch: 
https://github.com/apache/beam/compare/master...damondouglas:BEAM-10428-single-module-kata-go
 
   
   Essentially the final goal is to make one go.mod at the root of 
learning/katas/go.  So far I've only changed the Hello Beam section so you can 
preview what I intend for the rest.
   
   Would it make sense to start a PR for this branch without the intent to 
merge initially?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] damondouglas edited a comment on pull request #11979: [BEAM-9679] Add Partition task to Core Transform katas

2020-07-08 Thread GitBox


damondouglas edited a comment on pull request #11979:
URL: https://github.com/apache/beam/pull/11979#issuecomment-655901876


   @henryken / @lostluck I've started work on consolidating the module by 
beginning work at this branch: 
https://github.com/apache/beam/compare/master...damondouglas:BEAM-10428-single-module-kata-go
 
   
   Essentially the final goal is to make one go.mod at the root of 
learning/katas/go.  So far I've only changed the Hello Beam section so you can 
preview what I intend for the rest.
   
   Would it make sense to start a PR for this branch without the intent to 
merge initially but until the entire learning/katas/go is refactored?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] davidyan74 commented on pull request #12201: [BEAM-10291] log full thread dump only when lull duration is more tha…

2020-07-08 Thread GitBox


davidyan74 commented on pull request #12201:
URL: https://github.com/apache/beam/pull/12201#issuecomment-655899856


   R: @pabloem 
   Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness

2020-07-08 Thread GitBox


kennknowles commented on pull request #11792:
URL: https://github.com/apache/beam/pull/11792#issuecomment-655890489


   Found a lot of exclusions before I started getting my disks filled by the 
build & local runner.
   
   @robertwb you may be interested in the failures where it seems empty side 
inputs don't work



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-07-08 Thread GitBox


tvalentyn commented on pull request #12194:
URL: https://github.com/apache/beam/pull/12194#issuecomment-655850277







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


TheNeuralBit commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655841690


   Interesting. Looks like it is running fine on Python PostCommit, and I'm 
able to run it locally with the fn api runner as well:
   ```
   python setup.py nosetests --tests 
apache_beam.io.external.xlang_jdbcio_it_test
   ```
   (cc: @sclukas77 - it looks like this command should work for you now if you 
have all the dependencies built)
   
   Running on Python PostCommit is fine with me, I just wanted to make sure we 
have it running continuously _somewhere_.
   
   Can you file a jira so we don't forget to investigate the issue on 
Flink/Spark?
   
   I'd like to take another look through the code when I'm fresh tomorrow, 
sorry this is taking so long :grimacing: 
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-07-08 Thread GitBox


tvalentyn commented on pull request #12194:
URL: https://github.com/apache/beam/pull/12194#issuecomment-655830657


   Run Python 3.8 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #12202: [BEAM-10407,10408] Schema Capable IO Table Provider Wrappers

2020-07-08 Thread GitBox


TheNeuralBit commented on a change in pull request #12202:
URL: https://github.com/apache/beam/pull/12202#discussion_r451872831



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaCapableIOTableProviderWrapper.java
##
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider;
+
+import static org.apache.beam.sdk.util.RowJsonUtils.newObjectMapperWith;
+
+import com.alibaba.fastjson.JSONObject;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.google.auto.service.AutoService;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.schemas.io.InvalidConfigurationException;
+import org.apache.beam.sdk.schemas.io.InvalidSchemaException;
+import org.apache.beam.sdk.schemas.io.SchemaCapableIOProvider;
+import org.apache.beam.sdk.schemas.io.SchemaIO;
+import org.apache.beam.sdk.util.RowJson;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * A general {@link TableProvider} for IOs for consumption by Beam SQL.
+ *
+ * Can create child classes for IOs to pass {@link 
#schemaCapableIOProvider} that is specific to
+ * the IO.
+ */
+@Internal
+@Experimental
+@AutoService(TableProvider.class)
+public abstract class SchemaCapableIOTableProviderWrapper extends 
InMemoryMetaTableProvider {

Review comment:
   ```suggestion
   abstract class SchemaCapableIOTableProviderWrapper extends 
InMemoryMetaTableProvider {
   ```
   I think this AutoService annotation is what's causing the Java PreCommit to 
fail. The `AutoService` annotation makes it so that a call 
`ServiceLoader.load(TableProvider.class)` will try to instantiate this class if 
it's in the classpath, and it's not possible to instantiate this since its 
abstract.
   
   Specifically this is the ServiceLoader call that's biting you:
   
https://github.com/apache/beam/blob/1a260cd70117b77e33442f0be8f213363a2db5fc/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java#L85-L86
   
   I think we should also make this package-private





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #12202: [BEAM-10407,10408] Schema Capable IO Table Provider Wrappers

2020-07-08 Thread GitBox


TheNeuralBit commented on a change in pull request #12202:
URL: https://github.com/apache/beam/pull/12202#discussion_r451889886



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableWrapper.java
##
@@ -15,54 +15,63 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.beam.sdk.extensions.sql.meta.provider.parquet;
+package org.apache.beam.sdk.extensions.sql.meta.provider;
 
 import java.io.Serializable;
-import org.apache.avro.generic.GenericRecord;
+
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Internal;
 import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
-import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
-import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
-import org.apache.beam.sdk.io.parquet.ParquetIO;
+import org.apache.beam.sdk.extensions.sql.meta.BaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
 import org.apache.beam.sdk.options.PipelineOptions;
 import org.apache.beam.sdk.schemas.Schema;
-import org.apache.beam.sdk.schemas.utils.AvroUtils;
+import org.apache.beam.sdk.schemas.io.SchemaIO;
 import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.values.PBegin;
 import org.apache.beam.sdk.values.PCollection;
-import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.POutput;
 import org.apache.beam.sdk.values.Row;
+@Internal
+@Experimental
+/**
+ * A generalized {@link Table} for IOs to create IO readers and writers.
+ */
+public class SchemaIOTableWrapper extends BaseBeamTable implements 
Serializable {

Review comment:
   ```suggestion
   class SchemaIOTableWrapper extends BaseBeamTable implements Serializable {
   ```
   
   I think this can be package-private. It might also make sense to make it an 
inner class of `SchemaIOTableProviderWrapper`, but I'll leave that up to you

##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaCapableIOTableProviderWrapper.java
##
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider;
+
+import static org.apache.beam.sdk.util.RowJsonUtils.newObjectMapperWith;
+
+import com.alibaba.fastjson.JSONObject;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.google.auto.service.AutoService;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.schemas.io.InvalidConfigurationException;
+import org.apache.beam.sdk.schemas.io.InvalidSchemaException;
+import org.apache.beam.sdk.schemas.io.SchemaCapableIOProvider;
+import org.apache.beam.sdk.schemas.io.SchemaIO;
+import org.apache.beam.sdk.util.RowJson;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * A general {@link TableProvider} for IOs for consumption by Beam SQL.
+ *
+ * Can create child classes for IOs to pass {@link 
#schemaCapableIOProvider} that is specific to
+ * the IO.
+ */
+@Internal
+@Experimental
+@AutoService(TableProvider.class)
+public abstract class SchemaCapableIOTableProviderWrapper extends 
InMemoryMetaTableProvider {

Review comment:
   ```suggestion
   abstract class SchemaCapableIOTableProviderWrapper extends 
InMemoryMetaTableProvider {
   ```
   I think this AutoService annotation is what's causing the Java PreCommit to 
fail. The `AutoService` annotation makes it so that a call 
`ServiceLoader.load(TableProvider.class)` will try to instantiate this class if 
it's in the classpath, and it's not possible  
   
   Specifically this is the ServiceLoader call that's biting you:
   
https://github.com/apache/beam/blob/1a260cd70117b77e33442f0be8f213363a2db5fc/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamCalciteSchemaFactory.java#L85-L86
   
   I think we should also make this package-private

##
File path: 
sdks/java/core/sr

[GitHub] [beam] apalmercari commented on pull request #11447: [BEAM-9502] makes Schema UUID generation deterministic

2020-07-08 Thread GitBox


apalmercari commented on pull request #11447:
URL: https://github.com/apache/beam/pull/11447#issuecomment-655829440


   What changes are required to make it compatible with the dataflow runner 
update operation.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases

2020-07-08 Thread GitBox


robertwb commented on pull request #12185:
URL: https://github.com/apache/beam/pull/12185#issuecomment-655823773


   @aaltay: portable_runner depends on these optimizations (when 
--pre_optimize=all) is set to get better fusion. 
   
   @yifanmai could you file a jira to run Dataflow through this optimization as 
well? (Probably doesn't makes sense to have it as part of this PR.)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #12203: Bq main sink

2020-07-08 Thread GitBox


pabloem commented on pull request #12203:
URL: https://github.com/apache/beam/pull/12203#issuecomment-655823253


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases

2020-07-08 Thread GitBox


robertwb commented on a change in pull request #12185:
URL: https://github.com/apache/beam/pull/12185#discussion_r451894237



##
File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
##
@@ -289,6 +289,8 @@ def create_stages(
 phases=[
 translations.annotate_downstream_side_inputs,
 translations.fix_side_input_pcoll_coders,
+translations.eliminate_common_siblings,

Review comment:
   We could be able to annotate/recognize certain DoFns (e.g. by URN) for 
which this deduplication would be safe to apply. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #12203: Bq main sink

2020-07-08 Thread GitBox


pabloem commented on pull request #12203:
URL: https://github.com/apache/beam/pull/12203#issuecomment-655821747


   Run Python 3.8 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem opened a new pull request #12203: Bq main sink

2020-07-08 Thread GitBox


pabloem opened a new pull request #12203:
URL: https://github.com/apache/beam/pull/12203


   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://ci-beam

[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-07-08 Thread GitBox


tvalentyn commented on pull request #12194:
URL: https://github.com/apache/beam/pull/12194#issuecomment-655795921


   Run Python Dataflow ValidatesContainer



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-07-08 Thread GitBox


tvalentyn commented on pull request #12194:
URL: https://github.com/apache/beam/pull/12194#issuecomment-655795067







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] saavannanavati commented on pull request #11939: [BEAM-10197] Support typehints for Python's frozenset

2020-07-08 Thread GitBox


saavannanavati commented on pull request #11939:
URL: https://github.com/apache/beam/pull/11939#issuecomment-655788900


   Tests have passed. This is ready for merge 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit merged pull request #11744: [BEAM-10007] Handle ValueProvider pipeline options in PortableRunner

2020-07-08 Thread GitBox


TheNeuralBit merged pull request #11744:
URL: https://github.com/apache/beam/pull/11744


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck merged pull request #12193: [BEAM-8472] get default GCP region option (Go)

2020-07-08 Thread GitBox


lostluck merged pull request #12193:
URL: https://github.com/apache/beam/pull/12193


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] sclukas77 opened a new pull request #12202: [BEAM-10407,10408] Schema Capable IO Table Provider Wrappers

2020-07-08 Thread GitBox


sclukas77 opened a new pull request #12202:
URL: https://github.com/apache/beam/pull/12202


   Implemented SchemaIO and SchemaCapableIOProvider for Avro and Parquet, 
shifting logic to core Beam. Created generalized table and tableprovider 
wrappers in Beam SQL, implementing for Pubsub, Avro, and Parquet.
   
   R:@TheNeuralBit 
   R:@robinyqiu
   
   
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastComplet

[GitHub] [beam] davidcavazos commented on a change in pull request #12188: [BEAM-10411] Adds an example that use Python cross-language Kafka transforms.

2020-07-08 Thread GitBox


davidcavazos commented on a change in pull request #12188:
URL: https://github.com/apache/beam/pull/12188#discussion_r451817171



##
File path: sdks/python/apache_beam/examples/kafkataxi/README.md
##
@@ -0,0 +1,164 @@
+
+
+# Python KafkaIO Example
+
+This example reads from the PubSub NYC Taxi stream described
+[here](https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon), 
writes
+to a given Kafka topic and reads back from the same Kafka topic. This example
+uses cross-language transforms available in
+[kafka.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py).
+Transforms are implemented in Java and are available
+[here](https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java).
+
+## Prerequisites
+
+Install Java in your system and make sure that `java` command is available in 
+the environment.
+
+```sh
+>java --version

Review comment:
   Can we get rid of the initial `>` and the "output"? It makes it trickier 
to copy-paste the command.
   
   ```sh
   java --version
   ```

##
File path: sdks/python/apache_beam/examples/kafkataxi/README.md
##
@@ -0,0 +1,164 @@
+
+
+# Python KafkaIO Example
+
+This example reads from the PubSub NYC Taxi stream described
+[here](https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon), 
writes
+to a given Kafka topic and reads back from the same Kafka topic. This example
+uses cross-language transforms available in
+[kafka.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py).
+Transforms are implemented in Java and are available
+[here](https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java).
+
+## Prerequisites
+
+Install Java in your system and make sure that `java` command is available in 
+the environment.
+
+```sh
+>java --version
+> 
+```
+
+## Setup the Kafka cluster
+
+This example requires users to setup a Kafka cluster that the Beam runner
+executing the pipeline has access to. There are few options.
+
+* For local runners that execute the pipelines in a single computer (for 
+example, portable DirectRunner or Spark/Flink runners in local mode), you can 
+setup a local Kafka cluster running in the same computer.
+* For Dataflow or portable Spark/Flink in distributed mode, you can setup a 
Kafka 
+cluster in GCE. See 
+[here](https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/dataflow/flex-templates/kafka_to_bigquery)
 
+for step by step instructions for this.
+
+Let's assume that that IP address of the node running the Kafka cluster to be 
+`KAFKA_ADDRESS` and the port to be `9092`.
+
+```sh
+> export BOOTSTRAP_SERVER=KAFKA_ADDRESS:9092

Review comment:
   Can we get rid of the `>` as well?
   
   For this command example, we could assume it's running locally. Someone not 
familiar with IP addresses might not know what `KAFKA_ADDRESS` means. We could 
mention that if you're running it in a distributed environment you'll need to 
replace the address with its public IP address.
   
   ```sh
   export BOOTSTRAP_SERVER=127.0.0.1:9092
   ```
   
   > **[edit]**: after looking below, it looks like this guide's instructions 
are only written in Dataflow. If that's the case, running Kafka locally won't 
work since Dataflow needs to reach the Kafka address. Should we take out any 
mention of running it locally and just assume users will run it in a 
distributed environment?

##
File path: sdks/python/apache_beam/examples/kafkataxi/README.md
##
@@ -0,0 +1,164 @@
+
+
+# Python KafkaIO Example
+
+This example reads from the PubSub NYC Taxi stream described
+[here](https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon), 
writes
+to a given Kafka topic and reads back from the same Kafka topic. This example
+uses cross-language transforms available in
+[kafka.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py).
+Transforms are implemented in Java and are available
+[here](https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java).
+
+## Prerequisites
+
+Install Java in your system and make sure that `java` command is available in 

Review comment:
   Can we link to 
https://adoptopenjdk.net/?variant=openjdk11&jvmVariant=openj9 for installing 
Java?

##
File path: sdks/python/apache_beam/examples/kafkataxi/README.md
##
@@ -0,0 +1,164 @@
+
+
+# Python KafkaIO Example
+
+This example reads from the PubSub NYC Taxi stream described
+[here](https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon), 
writes
+to a given Kafka topic and reads back from the same Kafka topic. This example
+uses cross-language transforms available in
+[kafka.py](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py).
+Transforms are implemented in Java and are availabl

[GitHub] [beam] aaltay commented on pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases

2020-07-08 Thread GitBox


aaltay commented on pull request #12185:
URL: https://github.com/apache/beam/pull/12185#issuecomment-655767371


   @yifanmai - (a note from our earlier conversation). This elimination would 
not be safe if pardo has side effects or if it is not deterministic. For 
example, if pardo is written to randomly sample 10% of its input, this change 
will result in all siblings to produce the same exact sample.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases

2020-07-08 Thread GitBox


aaltay commented on pull request #12185:
URL: https://github.com/apache/beam/pull/12185#issuecomment-655766565


I initially thought this change would only impact fnapi runner and local 
execution. I assume the intention is to apply the optimization to all portable 
runners. I noticed that portable runner depends on fnapi_runner 
(https://github.com/apache/beam/blob/2dddc0c9d60315fd212a90bc2ac39fd2bcc8bf63/sdks/python/apache_beam/runners/portability/portable_runner.py#L51).
 So maybe this optimization applies to all portable executions.
   
   @robertwb - Do you know why portable_runner depends on fnapi_runner?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski removed a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski removed a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655748822


   Run Python 3.7 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655751328


   Run XVR_Flink PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655751166


   Run Python 3.8 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695


   @TheNeuralBit I've moved the tests execution to python postcommit suite, 
like KafkaIO xlang test. I don't have a clue why they were timing out on 
ValidateCrossLanguageFlink task.
   
   If you have further suggestions for code improvement then go ahead.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655748822


   Run Python 3.7 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf commented on pull request #12191: [BEAM-10419] Skip FhirIORead integration test due to flakiness

2020-07-08 Thread GitBox


jaketf commented on pull request #12191:
URL: https://github.com/apache/beam/pull/12191#issuecomment-655748545


   CC: @lastomato 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] davidyan74 opened a new pull request #12201: [BEAM-10291] log full thread dump only when lull duration is more tha…

2020-07-08 Thread GitBox


davidyan74 opened a new pull request #12201:
URL: https://github.com/apache/beam/pull/12201


   …n 20 minutes
   
   This is to be consistent with the behavior of the Java runner. See 
https://github.com/apache/beam/pull/12143
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_P

[GitHub] [beam] KevinGG commented on pull request #12107: Interactive Environment Inspector for messaging

2020-07-08 Thread GitBox


KevinGG commented on pull request #12107:
URL: https://github.com/apache/beam/pull/12107#issuecomment-655743269


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-07-08 Thread GitBox


tvalentyn commented on pull request #12194:
URL: https://github.com/apache/beam/pull/12194#issuecomment-655734489







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12194: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-07-08 Thread GitBox


tvalentyn commented on pull request #12194:
URL: https://github.com/apache/beam/pull/12194#issuecomment-655734353


   Run Python Dataflow ValidatesContainer



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn merged pull request #12199: [BEAM-10423] Add Py3.8 test infrastructure to the release branch to unblock release validation.

2020-07-08 Thread GitBox


tvalentyn merged pull request #12199:
URL: https://github.com/apache/beam/pull/12199


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12199: [BEAM-10423] Add Py3.8 test infrastructure to the release branch to unblock release validation.

2020-07-08 Thread GitBox


tvalentyn commented on pull request #12199:
URL: https://github.com/apache/beam/pull/12199#issuecomment-655733293


   Commits are cherry-picks from master, test failures are unrelated (SQL), 
there is one known flake (Python Precommit). Merging.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn merged pull request #12197: [BEAM-8494] Declare Python 3.8 support in setup.py.

2020-07-08 Thread GitBox


tvalentyn merged pull request #12197:
URL: https://github.com/apache/beam/pull/12197


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib merged pull request #12200: [BEAM-10243] Fix and test FieldValueBuilder::withFieldValues.

2020-07-08 Thread GitBox


ibzib merged pull request #12200:
URL: https://github.com/apache/beam/pull/12200


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #12143: [BEAM-10291] Adding full thread dump upon lull detection for Dataflow…

2020-07-08 Thread GitBox


pabloem commented on pull request #12143:
URL: https://github.com/apache/beam/pull/12143#issuecomment-655724129


   Thanks @davidyan74 !



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem merged pull request #12143: [BEAM-10291] Adding full thread dump upon lull detection for Dataflow…

2020-07-08 Thread GitBox


pabloem merged pull request #12143:
URL: https://github.com/apache/beam/pull/12143


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #11959: refactor HCLS IO ITs to support stores in other projects

2020-07-08 Thread GitBox


aaltay commented on pull request #11959:
URL: https://github.com/apache/beam/pull/11959#issuecomment-655719524


   @pabloem - Could you merge this if this looks good? I have not reviewed the 
PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #12200: [BEAM-10243] Fix and test FieldValueBuilder::withFieldValues.

2020-07-08 Thread GitBox


TheNeuralBit commented on pull request #12200:
URL: https://github.com/apache/beam/pull/12200#issuecomment-655716816


   Run SQL PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11744: [BEAM-10007] Handle ValueProvider pipeline options in PortableRunner

2020-07-08 Thread GitBox


TheNeuralBit commented on pull request #11744:
URL: https://github.com/apache/beam/pull/11744#issuecomment-655715080


   Removing the ValueProvider line didn't actually break any Google tests, but 
I lost my resolve to remove it. It seems likely it would break some untested 
behavior, e.g. display_data uses drop_defaults=True:
   
https://github.com/apache/beam/blob/f65a18760a07a48c11fe4aff2a48a845df1f522d/sdks/python/apache_beam/options/pipeline_options.py#L328-L329
   
   Instead I pushed d2d4ecb which makes the assertions in sdk_worker_main_test 
more specific. I think this is good to go assuming CI passses.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on a change in pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness

2020-07-08 Thread GitBox


kennknowles commented on a change in pull request #11792:
URL: https://github.com/apache/beam/pull/11792#discussion_r451779380



##
File path: runners/portability/java/build.gradle
##
@@ -31,9 +45,123 @@ dependencies {
   compile project(path: ":sdks:java:harness", configuration: "shadow")
   compile library.java.vendored_grpc_1_26_0
   compile library.java.slf4j_api
+
   testCompile project(path: ":runners:core-construction-java", configuration: 
"testRuntime")
   testCompile library.java.hamcrest_core
   testCompile library.java.junit
   testCompile library.java.mockito_core
   testCompile library.java.slf4j_jdk14
+
+  validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest")
+  validatesRunner project(path: ":runners:core-java", configuration: 
"testRuntime")
+  validatesRunner project(path: project.path, configuration: "testRuntime")
+}
+
+
+project.evaluationDependsOn(":sdks:java:core")
+project.evaluationDependsOn(":runners:core-java")
+
+ext.virtualenvDir = "${project.buildDir}/virtualenv"
+ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid"
+ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? 
project.property("localJobServicePortFile") : 
"${project.buildDir}/local_job_service_port"
+ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout"
+
+ext.pythonSdkDir = "${project.rootDir}/sdks/python"
+
+void execInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { 
arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  exec {
+workingDir pythonSdkDir
+commandLine "sh", "-c", shellCommand
+  }
 }
+
+void execBackgroundInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { 
arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  println "execBackgroundInVirtualEnv: ${shellCommand}"
+  ProcessBuilder pb = new 
ProcessBuilder().redirectErrorStream(true).directory(new 
File(pythonSdkDir)).command(["sh", "-c", shellCommand])
+  Process proc = pb.start();
+
+  // redirectIO does not work for connecting to groovy/gradle stdout
+  BufferedReader reader = new BufferedReader(new 
InputStreamReader(proc.getInputStream()));
+  String line
+  while ((line = reader.readLine()) != null) {
+println line
+  }
+  proc.waitFor();
+}
+
+task virtualenv {

Review comment:
   Adding a dependency on something that _is_ ready and _is_ meant for it, 
but is not polished and named to make clear that it is a _logical necessity_: 
tech debt)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on pull request #11639: [BEAM-4440] Throw exception when file to stage is not found, instead of logging a warning

2020-07-08 Thread GitBox


kennknowles commented on pull request #11639:
URL: https://github.com/apache/beam/pull/11639#issuecomment-655712118







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia edited a comment on pull request #11967: [BEAM-9992] | use Sets transform in BeamSQL

2020-07-08 Thread GitBox


amaliujia edited a comment on pull request #11967:
URL: https://github.com/apache/beam/pull/11967#issuecomment-655707048


   @darshanj  
   
   I believe your implementation will crash this query (at least from ZetaSQL 
dialect path)
   
   ```
   SELECT DISTINCT val.BYTES
   from (select b"1" BYTES union all
 select cast(NULL as bytes) union all
 select b"-1" union all
 select b"1" union all
 select cast(NULL as bytes)) val
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on pull request #12192: [WIP][BEAM-10420] Migrate SDF logic into PerWindowInvoker

2020-07-08 Thread GitBox


lukecwik commented on pull request #12192:
URL: https://github.com/apache/beam/pull/12192#issuecomment-655709449


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on pull request #11967: [BEAM-9992] | use Sets transform in BeamSQL

2020-07-08 Thread GitBox


amaliujia commented on pull request #11967:
URL: https://github.com/apache/beam/pull/11967#issuecomment-655707048


   @darshanj  
   
   I believe your implementation will break this query (at least from ZetaSQL 
dialect path)
   
   ```
   SELECT DISTINCT val.BYTES
   from (select b"1" BYTES union all
 select cast(NULL as bytes) union all
 select b"-1" union all
 select b"1" union all
 select cast(NULL as bytes)) val
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on a change in pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness

2020-07-08 Thread GitBox


kennknowles commented on a change in pull request #11792:
URL: https://github.com/apache/beam/pull/11792#discussion_r451772554



##
File path: runners/portability/java/build.gradle
##
@@ -31,9 +45,123 @@ dependencies {
   compile project(path: ":sdks:java:harness", configuration: "shadow")
   compile library.java.vendored_grpc_1_26_0
   compile library.java.slf4j_api
+
   testCompile project(path: ":runners:core-construction-java", configuration: 
"testRuntime")
   testCompile library.java.hamcrest_core
   testCompile library.java.junit
   testCompile library.java.mockito_core
   testCompile library.java.slf4j_jdk14
+
+  validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest")
+  validatesRunner project(path: ":runners:core-java", configuration: 
"testRuntime")
+  validatesRunner project(path: project.path, configuration: "testRuntime")
+}
+
+
+project.evaluationDependsOn(":sdks:java:core")
+project.evaluationDependsOn(":runners:core-java")
+
+ext.virtualenvDir = "${project.buildDir}/virtualenv"
+ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid"
+ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? 
project.property("localJobServicePortFile") : 
"${project.buildDir}/local_job_service_port"
+ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout"
+
+ext.pythonSdkDir = "${project.rootDir}/sdks/python"
+
+void execInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { 
arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  exec {
+workingDir pythonSdkDir
+commandLine "sh", "-c", shellCommand
+  }
 }
+
+void execBackgroundInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { 
arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  println "execBackgroundInVirtualEnv: ${shellCommand}"
+  ProcessBuilder pb = new 
ProcessBuilder().redirectErrorStream(true).directory(new 
File(pythonSdkDir)).command(["sh", "-c", shellCommand])
+  Process proc = pb.start();
+
+  // redirectIO does not work for connecting to groovy/gradle stdout
+  BufferedReader reader = new BufferedReader(new 
InputStreamReader(proc.getInputStream()));
+  String line
+  while ((line = reader.readLine()) != null) {
+println line
+  }
+  proc.waitFor();
+}
+
+task virtualenv {

Review comment:
   (I will explore this invitation once the tests are running properly)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on a change in pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness

2020-07-08 Thread GitBox


kennknowles commented on a change in pull request #11792:
URL: https://github.com/apache/beam/pull/11792#discussion_r451761061



##
File path: runners/portability/java/build.gradle
##
@@ -31,9 +45,123 @@ dependencies {
   compile project(path: ":sdks:java:harness", configuration: "shadow")
   compile library.java.vendored_grpc_1_26_0
   compile library.java.slf4j_api
+
   testCompile project(path: ":runners:core-construction-java", configuration: 
"testRuntime")
   testCompile library.java.hamcrest_core
   testCompile library.java.junit
   testCompile library.java.mockito_core
   testCompile library.java.slf4j_jdk14
+
+  validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest")
+  validatesRunner project(path: ":runners:core-java", configuration: 
"testRuntime")
+  validatesRunner project(path: project.path, configuration: "testRuntime")
+}
+
+
+project.evaluationDependsOn(":sdks:java:core")
+project.evaluationDependsOn(":runners:core-java")
+
+ext.virtualenvDir = "${project.buildDir}/virtualenv"
+ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid"
+ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? 
project.property("localJobServicePortFile") : 
"${project.buildDir}/local_job_service_port"
+ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout"
+
+ext.pythonSdkDir = "${project.rootDir}/sdks/python"
+
+void execInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { 
arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  exec {
+workingDir pythonSdkDir
+commandLine "sh", "-c", shellCommand
+  }
 }
+
+void execBackgroundInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { 
arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  println "execBackgroundInVirtualEnv: ${shellCommand}"
+  ProcessBuilder pb = new 
ProcessBuilder().redirectErrorStream(true).directory(new 
File(pythonSdkDir)).command(["sh", "-c", shellCommand])
+  Process proc = pb.start();
+
+  // redirectIO does not work for connecting to groovy/gradle stdout
+  BufferedReader reader = new BufferedReader(new 
InputStreamReader(proc.getInputStream()));
+  String line
+  while ((line = reader.readLine()) != null) {
+println line
+  }
+  proc.waitFor();
+}
+
+task virtualenv {
+  doLast {
+exec {
+  commandLine "virtualenv", virtualenvDir, "--python=python3"
+}
+execInVirtualenv "pip", "install", "--retries", "10", "--upgrade", 
"tox==3.11.1", "--requirement", 
"${project.rootDir}/sdks/python/build-requirements.txt"
+execInVirtualenv "python", "setup.py", "build", "--build-base=${buildDir}"
+execInVirtualenv "pip", "install", "-e", "."
+  }
+}
+
+task startLocalJobService {
+  dependsOn virtualenv
+
+  doLast {
+execBackgroundInVirtualenv "python",
+"-m", "apache_beam.runners.portability.local_job_service_main",
+"--background",
+"--stdout_file=${localJobServiceStdoutFile}",
+"--pid_file=${localJobServicePidFile}",
+"--port_file=${localJobServicePortFile}"
+
+File pidFile = new File(localJobServicePidFile)
+int totalSleep = 0
+while (!pidFile.exists()) {
+  sleep(500)

Review comment:
   Removed. This code was left over from when I was struggling to get 
gradle to allow the thing to daemonize itself.

##
File path: sdks/python/apache_beam/runners/portability/local_job_service_main.py
##
@@ -99,11 +105,23 @@ def run(argv):
   options.port_file = os.path.splitext(options.pid_file)[0] + '.port'
   argv.append('--port_file')
   argv.append(options.port_file)
+
+if not options.stdout_file:
+  raise RuntimeError('--stdout_file must be specified with --background')
+stdout_dest = open(options.stdout_file, mode='w')
+
+if options.stderr_file:
+  stderr_dest=open(options.stderr_file, mode='w')
+else:
+  stderr_dest=subprocess.STDOUT
+
 subprocess.Popen([
 sys.executable,
 '-m',
 'apache_beam.runners.portability.local_job_service_main'
-] + argv)
+] + argv,
+stderr=stderr_dest,

Review comment:
   I didn't read the `subprocess` code, and the docs are vague. The special 
`subprocess.STDOUT` token indicates that the output should be "captured" into 
the same file handle. Comments at 
https://stackoverflow.com/questions/31980411/closing-files-from-subprocess-stdout
 imply that closing the file is the responsibility of this process. I did not 
run the experiments suggested there. I also did not try to refactor this code 
to allow a `with` statement.

##
File path: runners/portability/java/build.gradle
##
@@ -31,9 +45,123 @@ dependencies {
   compile project(path: ":sdks:java:harness", configuration: "shadow")
   compile library.java.vendored_grpc_1_26_0
   compile library.java.slf4j_api
+
   testCompile project(path

[GitHub] [beam] ibzib opened a new pull request #12200: [BEAM-10243] Fix and test FieldValueBuilder::withFieldValues.

2020-07-08 Thread GitBox


ibzib opened a new pull request #12200:
URL: https://github.com/apache/beam/pull/12200


   R: @reuvenlax @TheNeuralBit
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Pyt

[GitHub] [beam] robertwb commented on a change in pull request #12185: [BEAM-10409] Add combiner packing to graph optimizer phases

2020-07-08 Thread GitBox


robertwb commented on a change in pull request #12185:
URL: https://github.com/apache/beam/pull/12185#discussion_r451758742



##
File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py
##
@@ -685,6 +687,264 @@ def fix_side_input_pcoll_coders(stages, pipeline_context):
   return stages
 
 
+def eliminate_common_siblings(stages, context):
+  # type: (Iterable[Stage], TransformContext) -> Iterable[Stage]
+  """Runs common subexpression elimination for common siblings.
+
+  If stages have common input, an identical transform, and one output each,
+  then all but one stages will be eliminated, and the output of the remaining
+  will be connected to the original output PCollections of the eliminated
+  stages. This elimination runs only once, not recursively, and will only
+  eliminate the first stage after a common input, rather than a chain of
+  stages.
+  """
+
+  SiblingKey = collections.namedtuple(
+  'SiblingKey', ['spec_urn', 'spec_payload', 'inputs', 'environment_id'])
+
+  def get_sibling_key(transform):
+"""Returns a key that will be identical for common siblings."""
+transform_output_keys = list(transform.outputs.keys())
+# Return None as the sibling key for ineligible transforms.
+if len(transform_output_keys
+  ) != 1 or transform.spec.urn != common_urns.primitives.PAR_DO.urn:
+  return None
+return SiblingKey(
+spec_urn=transform.spec.urn,
+spec_payload=transform.spec.payload,
+inputs=tuple(transform.inputs.items()),
+environment_id=transform.environment_id)
+
+  # Group stages by keys.
+  stages_by_sibling_key = collections.defaultdict(list)
+  for stage in stages:
+transform = only_transform(stage.transforms)
+stages_by_sibling_key[get_sibling_key(transform)].append(stage)
+
+  # Eliminate stages and build the output PCollection remapping dictionary.
+  pcoll_id_remap = {}
+  for sibling_key, sibling_stages in stages_by_sibling_key.items():
+if sibling_key is None or len(sibling_stages) == 1:
+  continue
+output_pcoll_ids = [
+only_element(stage.transforms[0].outputs.values())
+for stage in sibling_stages
+]
+to_delete_pcoll_ids = output_pcoll_ids[1:]
+for to_delete_pcoll_id in to_delete_pcoll_ids:
+  pcoll_id_remap[to_delete_pcoll_id] = output_pcoll_ids[0]
+  del context.components.pcollections[to_delete_pcoll_id]
+del sibling_stages[1:]
+
+  # Yield stages while remapping output PCollections if needed.
+  for sibling_key, sibling_stages in stages_by_sibling_key.items():
+for stage in sibling_stages:
+  input_keys_to_remap = []
+  for input_key, input_pcoll_id in stage.transforms[0].inputs.items():
+if input_pcoll_id in pcoll_id_remap:
+  input_keys_to_remap.append(input_key)
+  for input_key_to_remap in input_keys_to_remap:
+stage.transforms[0].inputs[input_key_to_remap] = pcoll_id_remap[
+stage.transforms[0].inputs[input_key_to_remap]]
+  yield stage
+
+
+def pack_combiners(stages, context):
+  # type: (Iterable[Stage], TransformContext) -> Iterator[Stage]
+  """Packs sibling CombinePerKey stages into a single CombinePerKey.
+
+  If CombinePerKey stages have a common input, one input each, and one output
+  each, pack the stages into a single stage that runs all CombinePerKeys and
+  outputs resulting tuples to a new PCollection. A subsequent stage unpacks
+  tuples from this PCollection and sends them to the original output
+  PCollections.
+  """
+
+  class _UnpackFn(core.DoFn):
+"""A DoFn that unpacks a packed to multiple tagged outputs.
+
+Example:
+  tags = (T1, T2, ...)
+  input = (K, (V1, V2, ...))
+  output = TaggedOutput(T1, (K, V1)), TaggedOutput(T2, (K, V1)), ...
+"""
+
+def __init__(self, tags):
+  self._tags = tags
+
+def process(self, element):
+  key, values = element
+  return [
+  core.pvalue.TaggedOutput(tag, (key, value))
+  for tag, value in zip(self._tags, values)
+  ]
+
+  def _get_fallback_coder_id():
+return context.add_or_get_coder_id(
+coders.registry.get_coder(object).to_runner_api(None))
+
+  def _get_component_coder_id_from_kv_coder(coder, index):
+assert index < 2
+if coder.spec.urn == common_urns.coders.KV.urn and len(
+coder.component_coder_ids) == 2:
+  return coder.component_coder_ids[index]
+return _get_fallback_coder_id()
+
+  def _get_key_coder_id_from_kv_coder(coder):
+return _get_component_coder_id_from_kv_coder(coder, 0)
+
+  def _get_value_coder_id_from_kv_coder(coder):
+return _get_component_coder_id_from_kv_coder(coder, 1)
+
+  def _try_fuse_stages(a, b):
+if a.can_fuse(b, context):
+  return a.fuse(b)
+else:
+  raise ValueError
+
+  def _try_merge_environments(env1, env2):

Review comment:
   Was this copied from above? If needed, perhaps refactor? (Similarly for 
try_fuse_stages.)

##
F

[GitHub] [beam] tvalentyn commented on pull request #12199: [BEAM-10423] Add Py3.8 test infrastructure to the release branch to unblock release validation.

2020-07-08 Thread GitBox


tvalentyn commented on pull request #12199:
URL: https://github.com/apache/beam/pull/12199#issuecomment-655695934


   R: @aaltay 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn opened a new pull request #12199: [BEAM-10423] Add Py3.8 test infrastructure to the release branch to unblock release validation.

2020-07-08 Thread GitBox


tvalentyn opened a new pull request #12199:
URL: https://github.com/apache/beam/pull/12199


   This cherry-picks several commits that unblocks release validation that now 
includes Python 3.8 tests.
   Since we will run release validation on Python 3.8 as well, also adding a 
commit that declares Py 3.8 support on 2.23.0.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/b

[GitHub] [beam] kennknowles merged pull request #12120: [BEAM-10224] Test group by and aggregation on DATE and TIME type

2020-07-08 Thread GitBox


kennknowles merged pull request #12120:
URL: https://github.com/apache/beam/pull/12120


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #12143: [BEAM-10291] Adding full thread dump upon lull detection for Dataflow…

2020-07-08 Thread GitBox


pabloem commented on pull request #12143:
URL: https://github.com/apache/beam/pull/12143#issuecomment-655686932







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn edited a comment on pull request #12197: [BEAM-8494] Declare Python 3.8 support in setup.py.

2020-07-08 Thread GitBox


tvalentyn edited a comment on pull request #12197:
URL: https://github.com/apache/beam/pull/12197#issuecomment-655686497


   Thanks @lazylynx,  @kamilwu, @epicfaace for helping with Python 3.8! 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12197: [BEAM-8494] Declare Python 3.8 support in setup.py.

2020-07-08 Thread GitBox


tvalentyn commented on pull request #12197:
URL: https://github.com/apache/beam/pull/12197#issuecomment-655686497


   R: @aaltay. Thanks @lazylynx,  @kamilwu, @epicfaace for helping with Python 
3.8! 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] busunkim96 opened a new pull request #12198: Widen ranges for GCP libraries

2020-07-08 Thread GitBox


busunkim96 opened a new pull request #12198:
URL: https://github.com/apache/beam/pull/12198


   GCP libraries follow semantic versioning, so no breaking changes are made 
within a major version. Loosening the range reduces the likelihood of 
dependency conflicts for users.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/bad

[GitHub] [beam] kennknowles commented on pull request #11639: [BEAM-4440] Throw exception when file to stage is not found, instead of logging a warning

2020-07-08 Thread GitBox


kennknowles commented on pull request #11639:
URL: https://github.com/apache/beam/pull/11639#issuecomment-655686136


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #12196: [release-2.23.0][BEAM-10308] Make component ID assignments consistent across Pipeline…

2020-07-08 Thread GitBox


TheNeuralBit commented on pull request #12196:
URL: https://github.com/apache/beam/pull/12196#issuecomment-655684620







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lostluck commented on pull request #12193: [BEAM-8472] get default GCP region option (Go)

2020-07-08 Thread GitBox


lostluck commented on pull request #12193:
URL: https://github.com/apache/beam/pull/12193#issuecomment-655678835


   Thanks for the pr! Cheers.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] saavannanavati commented on a change in pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()

2020-07-08 Thread GitBox


saavannanavati commented on a change in pull request #12009:
URL: https://github.com/apache/beam/pull/12009#discussion_r451733706



##
File path: sdks/python/apache_beam/typehints/decorators.py
##
@@ -378,6 +378,56 @@ def has_simple_output_type(self):
 self.output_types and len(self.output_types[0]) == 1 and
 not self.output_types[1])
 
+  def strip_pcoll(self):
+return self.strip_pcoll_helper(self.input_types,
+   self._has_input_types,
+   {'input_types': None},
+   ['apache_beam.pvalue.PBegin'],
+   'An input typehint to a PTransform must be'
+   ' a single (or nested) type wrapped by '
+   'a PCollection or PBegin. ',
+   'strip_pcoll_input()').\
+strip_pcoll_helper(self.output_types,
+   self.has_simple_output_type,
+   {'output_types': None},
+   ['apache_beam.pvalue.PDone'],
+   'An output typehint to a PTransform must be'
+   ' a single (or nested) type wrapped by '
+   'a PCollection or PDone. ',
+   'strip_pcoll_output()')
+
+  def strip_pcoll_helper(
+  self,
+  my_type,# type: any
+  has_my_type,# type: Callable[[], bool]
+  kwarg_dict, # type: Dict[str, any]
+  my_valid_classes,   # type: List[str]
+  error_str,  # type: str
+  source_str  # type: str
+  ):
+# type: (...) -> IOTypeHints
+
+if not has_my_type() or len(my_type[0]) != 1:
+  return self
+
+my_type = my_type[0][0]
+
+if isinstance(my_type, typehints.AnyTypeConstraint):
+  return self
+
+valid_classes = ['apache_beam.pvalue.PCollection'] + my_valid_classes
+
+if not any(valid_class in str(my_type) for valid_class in valid_classes):

Review comment:
   I tried using `isinstance` initially but it doesn't work well with 
generic types

   Another option is to use `__origin__` but I don't know if that's fully 
backwards compatible
   
   Strings are a wacky solution though.. do you have any other ideas?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf commented on pull request #11959: refactor HCLS IO ITs to support stores in other projects

2020-07-08 Thread GitBox


jaketf commented on pull request #11959:
URL: https://github.com/apache/beam/pull/11959#issuecomment-655672949


   @aaltay @pabloem checks seem good now. LMK if this needs anything else



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] saavannanavati commented on a change in pull request #12009: [BEAM-10258] Support type hint annotations on PTransform's expand()

2020-07-08 Thread GitBox


saavannanavati commented on a change in pull request #12009:
URL: https://github.com/apache/beam/pull/12009#discussion_r451730259



##
File path: sdks/python/apache_beam/pvalue.py
##
@@ -222,7 +222,7 @@ class _InvalidUnpickledPCollection(object):
   pass
 
 
-class PBegin(PValue):
+class PBegin(PValue, Generic[T]):

Review comment:
   Oh ok makes sense





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on pull request #12196: [release-2.23.0][BEAM-10308] Make component ID assignments consistent across Pipeline…

2020-07-08 Thread GitBox


robertwb commented on pull request #12196:
URL: https://github.com/apache/beam/pull/12196#issuecomment-655664851


   I agree with the severity, especially as we'll be widely advertising 
cross-language with the 2.23 release. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb merged pull request #11912: [BEAM-10165] Cache and return error messages on pipeline failure.

2020-07-08 Thread GitBox


robertwb merged pull request #11912:
URL: https://github.com/apache/beam/pull/11912


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on pull request #11968: [BEAM-10185] Build python wheels on GitHub Actions for Windows [dependent on BEAM-10184]

2020-07-08 Thread GitBox


robertwb commented on pull request #11968:
URL: https://github.com/apache/beam/pull/11968#issuecomment-655657299


   We've been avoiding doing any cythonization at all on Windows due to issues 
like this due to statesampler_fast using (unavailable on Windows) posix APIs. 
It may be possible to skip just this file, rather than everything, on Windows. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #12144: [BEAM-10395] Deduplicate uploads by destinations before uploading

2020-07-08 Thread GitBox


pabloem commented on pull request #12144:
URL: https://github.com/apache/beam/pull/12144#issuecomment-655651862


   taking a look today



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #12143: [BEAM-10291] Adding full thread dump upon lull detection for Dataflow…

2020-07-08 Thread GitBox


pabloem commented on pull request #12143:
URL: https://github.com/apache/beam/pull/12143#issuecomment-655643859







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] davidcavazos commented on pull request #12188: [BEAM-10411] Adds an example that use Python cross-language Kafka transforms.

2020-07-08 Thread GitBox


davidcavazos commented on pull request #12188:
URL: https://github.com/apache/beam/pull/12188#issuecomment-655640853


   Should we have a test for this sample as well?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit opened a new pull request #12196: [release-2.23.0][BEAM-10308] Make component ID assignments consistent across Pipeline…

2020-07-08 Thread GitBox


TheNeuralBit opened a new pull request #12196:
URL: https://github.com/apache/beam/pull/12196


   Cherry-pick #12067 into 2.23.0.
   
   The bug this fixes is not technically a regression since it existed in 
previous versions as well, but I think it's serious enough to merit cherry 
picking into 2.23.0.
   
   R: @tvalentyn 
   CC: @chamikaramj @robertwb 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 

[GitHub] [beam] tvalentyn opened a new pull request #12197: [BEAM-8494] Declare Python 3.8 support in setup.py.

2020-07-08 Thread GitBox


tvalentyn opened a new pull request #12197:
URL: https://github.com/apache/beam/pull/12197


   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
 | ---
   Java | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/)
 | [![Build 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)[![Build
 
Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/last

[GitHub] [beam] davidcavazos commented on a change in pull request #12188: [BEAM-10411] Adds an example that use Python cross-language Kafka transforms.

2020-07-08 Thread GitBox


davidcavazos commented on a change in pull request #12188:
URL: https://github.com/apache/beam/pull/12188#discussion_r451686296



##
File path: sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py
##
@@ -0,0 +1,87 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""An example that writes to and reads from Kafka.
+
+ This example reads from the PubSub NYC Taxi stream described in
+ https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon, writes to a
+ given Kafka topic and reads back from the same Kafka topic.
+ """
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+import typing
+
+import apache_beam as beam
+from apache_beam.io.kafka import ReadFromKafka
+from apache_beam.io.kafka import WriteToKafka
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from apache_beam.options.pipeline_options import StandardOptions
+
+
+def run(argv=None):
+  """Main entry point; defines and runs the wordcount pipeline."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+  '--bootstrap_servers',
+  dest='bootstrap_servers',
+  required=True,
+  help='Bootstrap servers for the Kafka cluster. Should be accessible by '
+  'the runner')
+  parser.add_argument(
+  '--topic',
+  dest='topic',
+  default='kafka_taxirides_realtime',
+  help='Kafka topic to write to and read from')
+  known_args, pipeline_args = parser.parse_known_args(argv)
+
+  pipeline_options = PipelineOptions(pipeline_args)
+  pipeline_options.view_as(SetupOptions).save_main_session = True
+  pipeline_options.view_as(StandardOptions).streaming = True
+
+  pipeline = beam.Pipeline(options=pipeline_options)
+  bootstrap_servers = known_args.bootstrap_servers
+  _ = (
+  pipeline
+  | beam.io.ReadFromPubSub(
+  topic='projects/pubsub-public-data/topics/taxirides-realtime').
+  with_output_types(bytes)
+  | beam.Map(lambda x: (b'', x)).with_output_types(
+  typing.Tuple[bytes, bytes])
+  | beam.WindowInto(beam.window.FixedWindows(15))

Review comment:
   Can we make this into either a constant or another command line argument 
with a default value? It's also a good idea to mention the units, are these 
seconds or minutes?

##
File path: sdks/python/apache_beam/examples/kafkataxi/kafka_taxi.py
##
@@ -0,0 +1,87 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""An example that writes to and reads from Kafka.
+
+ This example reads from the PubSub NYC Taxi stream described in
+ https://github.com/googlecodelabs/cloud-dataflow-nyc-taxi-tycoon, writes to a
+ given Kafka topic and reads back from the same Kafka topic.
+ """
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+import typing
+
+import apache_beam as beam
+from apache_beam.io.kafka import ReadFromKafka
+from apache_beam.io.kafka import WriteToKafka
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from apache_beam.options.pipeline_options import StandardOptions
+
+
+def run(argv=None):
+  """Main entry point; defines and runs the wordcount pipeline."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+  '--bootstrap_servers',
+  dest='bootstrap_servers',
+  required=True,
+  help='Bootstrap servers for th

[GitHub] [beam] aaltay commented on pull request #12049: [BEAM-10399] Periodic clear of GCS wheels staging bucket

2020-07-08 Thread GitBox


aaltay commented on pull request #12049:
URL: https://github.com/apache/beam/pull/12049#issuecomment-655635243


   What files will be cleared this way? Release branches leave forever. Would 
there be other branches that this script will clean?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay merged pull request #12191: [BEAM-10419] Skip FhirIORead integration test due to flakiness

2020-07-08 Thread GitBox


aaltay merged pull request #12191:
URL: https://github.com/apache/beam/pull/12191


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] jaketf commented on pull request #11959: refactor HCLS IO ITs to support stores in other projects

2020-07-08 Thread GitBox


jaketf commented on pull request #11959:
URL: https://github.com/apache/beam/pull/11959#issuecomment-655632442


   Run Java PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on a change in pull request #12166: [BEAM-10404] Cancel queued/running GitHub Action builds on second push to PR

2020-07-08 Thread GitBox


aaltay commented on a change in pull request #12166:
URL: https://github.com/apache/beam/pull/12166#discussion_r451063093



##
File path: .github/workflows/cancel.yml
##
@@ -16,7 +16,7 @@
 # under the License.
 
 name: Cancel
-on: [push]
+on: [push, pull_request]

Review comment:
   > e.g. new merge commit on master appear. So it is possible to run rests 
without PR.
   
   I do not understand how/when this would happen. Because we do not allow 
direct pushes to master. And if it is already in master a PR would not be 
helpful. Maybe this example applies to release branches?
   
   Overall, I am fine with the change. It is ok to cancel tests when a new PR 
is created. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #12128: [BEAM-10417] - Move Shared object from tfx_bsl

2020-07-08 Thread GitBox


aaltay commented on pull request #12128:
URL: https://github.com/apache/beam/pull/12128#issuecomment-655629231


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lgajowy commented on a change in pull request #12117: [BEAM-10343] Add dispositions for SnowflakeIO.write

2020-07-08 Thread GitBox


lgajowy commented on a change in pull request #12117:
URL: https://github.com/apache/beam/pull/12117#discussion_r451677939



##
File path: 
sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/services/SnowflakeServiceImpl.java
##
@@ -25,7 +27,9 @@
 import java.util.function.Consumer;
 import java.util.stream.Collectors;
 import javax.sql.DataSource;
+import org.apache.beam.sdk.io.snowflake.data.SnowflakeTableSchema;
 import org.apache.beam.sdk.io.snowflake.enums.CloudProvider;

Review comment:
   I'm not sure, to be honest. Users should look for that information in 
the docs. Note that technically (not that I'd wish for this) there are ways of 
providing support to other cloud providers without extending that enum so 
looking at the endpoint is not the ultimate proof that only gcs is supported. 
If there's no docs or it's out of date regarding cloud providers support, the 
devs will still need to dig into the IO implementation details to really find 
out. So having this enum public is not much help imho.
   
   On the other hand, what we do now is that we leave an implementation detail 
(noise) that is not useful for the users in any way in code in the public API. 
I'd suggest hiding it and maybe expose it later if there is a good reason to do 
so. :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695


   @TheNeuralBit I've moved the tests execution to python postcommit suite, 
like KafkaIO xlang test. I don't have a clue why they were timing out on 
ValidateCrossLanguageFlink task. Now the Jdbc tests pass, but I continously 
have 
`apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types`
 failing. I checked the postcommit cron executions and it fails quite often. 
I'll trigger it from time to time to confirm whether it's a flake or not.
   
   If you have further suggestions for code improvement then go ahead.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski edited a comment on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski edited a comment on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695


   @TheNeuralBit I've moved the tests execution to python postcommit suite, 
like KafkaIO xlang test. I don't have a clue why they failed on 
ValidateCrossLanguageFlink. Now the Jdbc tests pass, but I continously have 
`apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types`
 failing. I checked the postcommit cron executions and it fails quite often. 
I'll trigger it from time to time to confirm whether it's a flake or not.
   
   If you have further suggestions for code improvement then go ahead.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655625695


   @TheNeuralBit I've moved the tests execution to python postcommit suite, 
like KafkaIO xlang test. Now the Jdbc tests pass, but I continously have 
`apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types`
 failing. I checked the postcommit cron executions and it fails quite often. 
I'll trigger it from time to time to confirm whether it's a flake or not.
   
   If you have further suggestions for code improvement then go ahead.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] piotr-szuberski commented on pull request #12145: [BEAM-10136] [BEAM-10135] Add JdbcIO for cross-language with python wrapper

2020-07-08 Thread GitBox


piotr-szuberski commented on pull request #12145:
URL: https://github.com/apache/beam/pull/12145#issuecomment-655622733


   Run Python 3.8 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kamilwu commented on pull request #12132: [BEAM-10371] Run dependency check script with Python 3

2020-07-08 Thread GitBox


kamilwu commented on pull request #12132:
URL: https://github.com/apache/beam/pull/12132#issuecomment-655610830


   Great! Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   >