[GitHub] [beam] lukecwik commented on a change in pull request #12511: Upgrade to Netty 4.1.50.Final and Netty_tcnative 2.0.31.Final

2020-08-11 Thread GitBox


lukecwik commented on a change in pull request #12511:
URL: https://github.com/apache/beam/pull/12511#discussion_r468747932



##
File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy
##
@@ -27,7 +27,7 @@ class GrpcVendoring_1_26_0 {
   static def protobuf_version = "3.11.0"
   static def grpc_version = "1.26.0"
   static def gson_version = "2.8.6"
-  static def netty_version = "4.1.42.Final"
+  static def netty_version = "4.1.50.Final"

Review comment:
   We should update the version of gRPC we are vendoring instead of changing the versions we are using here.

##
File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
##
@@ -581,7 +581,7 @@ class BeamModulePlugin implements Plugin<Project> {
 maven_exec_plugin: "maven-plugins:maven-exec-plugin:1.6.0",
 maven_jar_plugin : "maven-plugins:maven-jar-plugin:3.0.2",
 maven_shade_plugin   : "maven-plugins:maven-shade-plugin:3.1.0",
-maven_surefire_plugin: "maven-plugins:maven-surefire-plugin:2.21.0",
+maven_surefire_plugin: "maven-plugins:maven-surefire-plugin:3.0.0-M5",

Review comment:
   why did this need to change?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module

2020-08-11 Thread GitBox


KevinGG commented on a change in pull request #12460:
URL: https://github.com/apache/beam/pull/12460#discussion_r468740290



##
File path: 
sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx
##
@@ -0,0 +1,119 @@
+// Licensed under the Apache License, Version 2.0 (the 'License'); you may not
+// use this file except in compliance with the License. You may obtain a copy of
+// the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT
+// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+// License for the specific language governing permissions and limitations under
+// the License.
+
+import * as React from 'react';
+
+export interface IHtmlProvider {
+  readonly html: string;
+  readonly script: Array<string>;
+}
+
+interface IHtmlViewProps {
+  htmlProvider: IHtmlProvider;
+}
+
+interface IHtmlViewState {
+  innerHtml: string;
+  script: Array<string>;
+}
+
+/**
+ * A common HTML viewing component that renders given HTML and executes scripts
+ * from the given provider.
+ */
+export class HtmlView extends React.Component<IHtmlViewProps, IHtmlViewState> {
+  constructor(props: IHtmlViewProps) {
+    super(props);
+    this.state = {
+      innerHtml: props.htmlProvider.html,
+      script: []
+    };
+  }
+
+  componentDidMount(): void {
+    this._updateRenderTimerId = setInterval(() => this.updateRender(), 1000);
+  }
+
+  componentWillUnmount(): void {
+    clearInterval(this._updateRenderTimerId);
+  }
+
+  updateRender(): void {
+    const currentHtml = this.state.innerHtml;
+    const htmlToUpdate = this.props.htmlProvider.html;
+    const currentScript = this.state.script;
+    const scriptToUpdate = [...this.props.htmlProvider.script];
+    if (htmlToUpdate !== currentHtml) {
+      this.setState({
+        innerHtml: htmlToUpdate,
+        // As long as the html is updated, clear the script state.
+        script: []
+      });
+    }
+    /* Depending on whether this iteration updates the html, the scripts
+     * are executed differently.
+     * Html updated: all scripts are new, start execution from index 0;
+     * Html not updated: only newly added scripts need to be executed.
+     */
+    const currentScriptLength =
+      htmlToUpdate === currentHtml ? currentScript.length : 0;
+    if (scriptToUpdate.length > currentScriptLength) {
+      this.setState(
+        {
+          script: scriptToUpdate
+        },
+        // Executes scripts once the state is updated.
+        () => {
+          for (let i = currentScriptLength; i < scriptToUpdate.length; ++i) {
+            new Function(scriptToUpdate[i])();
+          }
+        }
+      );
+    }
+  }
+
+  render(): React.ReactNode {
+    return (
+      // This injects raw HTML fetched from kernel into JSX.
+      <div dangerouslySetInnerHTML={{ __html: this.state.innerHtml }} />
+    );
+  }
+
+  private _updateRenderTimerId: number;
+}
+
+/**
+ * Makes the browser support HTML import and import HTML from given hrefs if
+ * any is given.
+ *
+ * Native HTML import has been deprecated by modern browsers. To support
+ * importing reusable HTML templates, webcomponentsjs library is needed.
+ * The given hrefs will be imported once the library is loaded.
+ *
+ * Note everything is appended to head and if there are duplicated HTML
+ * imports, only the first one will take effect.
+ */
+export function importHtml(hrefs: Array<string>): void {

Review comment:
   This is needed when initializing the lumino `Widget` during activation of the extension.
   
   `importHtml` only needs to be invoked once.
   The HTML we are targeting is the one used by [PAIR-facets](https://pair-code.github.io/facets/).
   Once invoked, the facets visualizations sent back from the kernel later in the view can be rendered correctly.
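
As an aside, the incremental script execution in the `updateRender` method quoted above (run all scripts after an HTML update, otherwise only the newly appended tail) can be sketched language-agnostically. This Python sketch is illustrative only and is not part of the PR:

```python
def scripts_to_execute(html_changed, executed_count, scripts):
    """Return the scripts that still need to run this iteration.

    If the html changed, all scripts are considered new and execution
    restarts from index 0; otherwise only scripts appended since the
    last iteration are returned.
    """
    start = 0 if html_changed else executed_count
    return scripts[start:]


# Html updated: everything runs again.
assert scripts_to_execute(True, 2, ['a', 'b', 'c']) == ['a', 'b', 'c']
# Html unchanged: only the newly appended script runs.
assert scripts_to_execute(False, 2, ['a', 'b', 'c']) == ['c']
# Nothing new: nothing runs.
assert scripts_to_execute(False, 3, ['a', 'b', 'c']) == []
```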





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #12526: [BEAM-10663] Disable python kafka integration tests

2020-08-11 Thread GitBox


TheNeuralBit commented on a change in pull request #12526:
URL: https://github.com/apache/beam/pull/12526#discussion_r468789977



##
File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py
##
@@ -94,6 +94,7 @@ def run_xlang_kafkaio(self, pipeline):
     pipeline.run(False)
 
 
+@unittest.skip('BEAM-10663')

Review comment:
   If we add BEAM-6868 it should be in a skipIf that checks for the flink runner
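
For reference, a runner-conditional skip of the kind suggested could look like the sketch below. The `running_on_flink` helper and the `RUNNER` environment variable are assumptions for illustration; they are not Beam's actual runner-detection mechanism:

```python
import os
import unittest


def running_on_flink():
    # Hypothetical detection helper; how the active runner is actually
    # determined in Beam's test suites is an assumption here.
    return os.environ.get('RUNNER', '').lower() == 'flinkrunner'


@unittest.skipIf(running_on_flink(), 'BEAM-6868: skipped on the Flink runner')
class CrossLanguageKafkaIOTest(unittest.TestCase):
    def test_kafkaio(self):
        # Placeholder body; the real test drives the cross-language
        # Kafka pipeline.
        self.assertTrue(True)
```

Unlike a blanket `@unittest.skip`, the condition keeps the test running on every runner that is not affected by the tracked issue.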









[GitHub] [beam] TheNeuralBit merged pull request #12530: [BEAM-7996] Add Python SqlTransform test that includes a MAP input and output

2020-08-11 Thread GitBox


TheNeuralBit merged pull request #12530:
URL: https://github.com/apache/beam/pull/12530


   







[GitHub] [beam] kkucharc commented on a change in pull request #12435: [BEAM-10616] Added Python Pardo load tests for streaming on Dataflow

2020-08-11 Thread GitBox


kkucharc commented on a change in pull request #12435:
URL: https://github.com/apache/beam/pull/12435#discussion_r468807769



##
File path: .test-infra/jenkins/job_LoadTests_ParDo_Python.groovy
##
@@ -151,3 +151,35 @@ CronJobBuilder.cronJob('beam_LoadTests_Python_ParDo_Dataflow_Batch', 'H 13 * * *
   ]
   batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT)
 }
+
+def streamingLoadTestJob = { scope, triggeringContext ->
+  scope.description('Runs Python ParDo load tests on Dataflow runner in streaming mode')
+  commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 120)
+
+  def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext)
+  for (testConfiguration in loadTestConfigurations("streaming", datasetName)) {
+    // Skipping case 2 in streaming because it timeouts. To be checked TODO: kkucharc
+    if(testConfiguration.title != "ParDo Python Load test: 2GB 100 byte records 200 times") {

Review comment:
   I am totally for refactoring those LoadTestJob methods into one. Thank you for the suggestion.
   
   As for excluding the test, I usually prefer pointing explicitly at the test that should be excluded rather than at, e.g., an array index (maybe because that index can change when new test cases are added).
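
The trade-off being discussed can be sketched in a few lines. The config dicts and titles below are illustrative stand-ins for the Jenkins load-test configurations, not the real job definitions:

```python
# Illustrative test configurations; only the second title is real
# (it is the case excluded in the diff above).
configs = [
    {'title': 'ParDo load test: case 1'},
    {'title': 'ParDo Python Load test: 2GB 100 byte records 200 times'},
    {'title': 'ParDo load test: case 3'},
]

EXCLUDED_TITLES = {'ParDo Python Load test: 2GB 100 byte records 200 times'}

# By title: stays correct even if new cases are inserted earlier in the list.
by_title = [c for c in configs if c['title'] not in EXCLUDED_TITLES]

# By index: silently excludes the wrong case once the list is reordered.
by_index = [c for i, c in enumerate(configs) if i != 1]

# Equal only while the index happens to line up with the intended case.
assert by_title == by_index
```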









[GitHub] [beam] emilymye removed a comment on pull request #12505: [WIP][BEAM-8106] Add version to java container image name

2020-08-11 Thread GitBox


emilymye removed a comment on pull request #12505:
URL: https://github.com/apache/beam/pull/12505#issuecomment-671632378


   Run Python2_PVR_Flink PreCommit
   







[GitHub] [beam] emilymye commented on pull request #12505: [WIP][BEAM-8106] Add version to java container image name

2020-08-11 Thread GitBox


emilymye commented on pull request #12505:
URL: https://github.com/apache/beam/pull/12505#issuecomment-672207756


retest this please







[GitHub] [beam] TheNeuralBit commented on pull request #12526: [BEAM-10663] Disable python kafka integration tests

2020-08-11 Thread GitBox


TheNeuralBit commented on pull request #12526:
URL: https://github.com/apache/beam/pull/12526#issuecomment-672235923


   Looks like the pending jobs are actually complete, they just haven't updated the PR. Merging now.







[GitHub] [beam] robertwb commented on a change in pull request #12516: [BEAM-9547] Implement dataframes top, join, merge.

2020-08-11 Thread GitBox


robertwb commented on a change in pull request #12516:
URL: https://github.com/apache/beam/pull/12516#discussion_r468862256



##
File path: sdks/python/apache_beam/dataframe/frames.py
##
@@ -54,6 +54,42 @@ def agg(self, *args, **kwargs):
   'order-sensitive')
   diff = frame_base.wont_implement_method('order-sensitive')
 
+  @frame_base.args_to_kwargs(pd.Series)
+  def nlargest(self, **kwargs):
+    if 'keep' in kwargs and kwargs['keep'] != 'all':
+      raise frame_base.WontImplementError('order-sensitive')
+    per_partition = expressions.ComputedExpression(
+        'nlargest',
Review comment:
   Good point, updated.
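
For readers following along, the `keep != 'all'` restriction in the diff above appears to be because, with ties kept, `nlargest` can be computed per partition and then combined without the result depending on row order, whereas `keep='first'`/`'last'` are order-sensitive. A small pandas sketch (the partitioning below is illustrative, not Beam's actual partitioning):

```python
import pandas as pd

s = pd.Series([3, 5, 1, 5, 2, 4])
partitions = [s.iloc[:3], s.iloc[3:]]  # illustrative split into two partitions

# Per-partition nlargest, then nlargest over the concatenation.
per_partition = pd.concat([p.nlargest(2, keep='all') for p in partitions])
combined = per_partition.nlargest(2, keep='all')

# Same answer as computing nlargest directly over the whole series.
direct = s.nlargest(2, keep='all')
assert sorted(combined.tolist()) == sorted(direct.tolist())  # both [5, 5]
```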









[GitHub] [beam] TobKed commented on a change in pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms

2020-08-11 Thread GitBox


TobKed commented on a change in pull request #12452:
URL: https://github.com/apache/beam/pull/12452#discussion_r468877557



##
File path: CI.md
##
@@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with regards of the jobs
 Those jobs often have matrix run strategy which runs several different variations of the jobs
 (with different platform type / Python version to run for example)
 
+### Google Cloud Platform Credentials
+
+Some of the jobs require variables stored as a [GitHub Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)

Review comment:
   Fixed.









[GitHub] [beam] tvalentyn edited a comment on pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms

2020-08-11 Thread GitBox


tvalentyn edited a comment on pull request #12452:
URL: https://github.com/apache/beam/pull/12452#issuecomment-672296133


   Thanks, @TobKed , I don't have additional comments. Once all reviewers, whose feedback you expect, give their LGTM, please squash fixup commits, and leave only the commits that you want to be added to the commit history.







[GitHub] [beam] tvalentyn commented on pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms

2020-08-11 Thread GitBox


tvalentyn commented on pull request #12452:
URL: https://github.com/apache/beam/pull/12452#issuecomment-672296133


   Thanks, @TobKed , I don't have additional comments. Once all reviewers, whose feedback you expect, gave their LGTM, please squash fixup commits, and leave only the commits that you want to be added to the commit history.







[GitHub] [beam] sclukas77 commented on a change in pull request #12498: [BEAM-10654] Implemented ExternalSchemaIOTransformRegistrar for jdbc

2020-08-11 Thread GitBox


sclukas77 commented on a change in pull request #12498:
URL: https://github.com/apache/beam/pull/12498#discussion_r468720134



##
File path: 
sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java
##
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.schemaio.expansion;
+
+import com.google.auto.service.AutoService;
+import java.io.ByteArrayInputStream;
+import java.io.InputStream;
+import java.util.Map;
+import java.util.ServiceLoader;
+import javax.annotation.Nullable;
+import org.apache.beam.model.pipeline.v1.SchemaApi;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.RowCoder;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.SchemaTranslation;
+import org.apache.beam.sdk.schemas.io.SchemaIOProvider;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+@Experimental(Experimental.Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class ExternalSchemaIOTransformRegistrar implements ExternalTransformRegistrar {
+
+  @Override
+  public Map<String, ExternalTransformBuilder<?, ?, ?>> knownBuilderInstances() {
+    ImmutableMap.Builder<String, ExternalTransformBuilder<?, ?, ?>> builder = ImmutableMap.builder();
+    try {
+      for (SchemaIOProvider schemaIOProvider : ServiceLoader.load(SchemaIOProvider.class)) {
+        builder.put(
+            "beam:external:java:" + schemaIOProvider.identifier() + ":read:v1",
+            new ReaderBuilder(schemaIOProvider));
+        builder.put(
+            "beam:external:java:" + schemaIOProvider.identifier() + ":write:v1",
+            new WriterBuilder(schemaIOProvider));
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e.getMessage());
+    }
+    return builder.build();
+  }
+
+  public static class Configuration {
+    String location = "";
+    byte[] config = new byte[0];
+    @Nullable byte[] dataSchema = null;
+
+    public void setLocation(String location) {
+      this.location = location;
+    }
+
+    public void setConfig(byte[] config) {
+      this.config = config;
+    }
+
+    public void setDataSchema(byte[] dataSchema) {
+      this.dataSchema = dataSchema;
+    }
+  }
+
+  @Nullable
+  private static Schema translateSchema(@Nullable byte[] schemaBytes) throws Exception {
+    if (schemaBytes == null) {
+      return null;
+    }
+    SchemaApi.Schema protoSchema = SchemaApi.Schema.parseFrom(schemaBytes);
+    return SchemaTranslation.schemaFromProto(protoSchema);
+  }
+
+  private static Row translateRow(byte[] rowBytes, Schema configSchema) throws Exception {
+    RowCoder rowCoder = RowCoder.of(configSchema);
+    InputStream stream = new ByteArrayInputStream(rowBytes);
+    return rowCoder.decode(stream);
+  }
+
+  private static class ReaderBuilder
+      implements ExternalTransformBuilder<Configuration, PBegin, PCollection<Row>> {
+    SchemaIOProvider schemaIOProvider;
+
+    ReaderBuilder(SchemaIOProvider schemaIOProvider) {
+      this.schemaIOProvider = schemaIOProvider;
+    }
+
+    @Override
+    public PTransform<PBegin, PCollection<Row>> buildExternal(Configuration configuration) {
+      try {
+        return schemaIOProvider
+            .from(
+                configuration.location,
+                translateRow(configuration.config, schemaIOProvider.configurationSchema()),
+                translateSchema(configuration.dataSchema))
+            .buildReader();
+      } catch (Exception e) {
+        throw new RuntimeException("Could not convert configuration proto to row or schema.");
+      }
+    }
+  }
+
+  private static class WriterBuilder

Review comment:
   ACK, I will add these in.
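
As an aside, the registrar in the diff above derives one read URN and one write URN per discovered `SchemaIOProvider`. The naming scheme it builds can be sketched as follows (the `jdbc` identifier is illustrative):

```python
def urns_for(identifier):
    """URNs the registrar derives for one SchemaIOProvider, following the
    "beam:external:java:<identifier>:{read,write}:v1" scheme from the diff."""
    return (
        f'beam:external:java:{identifier}:read:v1',
        f'beam:external:java:{identifier}:write:v1',
    )


assert urns_for('jdbc') == (
    'beam:external:java:jdbc:read:v1',
    'beam:external:java:jdbc:write:v1',
)
```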






[GitHub] [beam] mxm commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload

2020-08-11 Thread GitBox


mxm commented on pull request #12481:
URL: https://github.com/apache/beam/pull/12481#issuecomment-672083732


   Run XVR_Direct PostCommit







[GitHub] [beam] danielxjd commented on pull request #12223: [Beam-4379] Make ParquetIO read splittable

2020-08-11 Thread GitBox


danielxjd commented on pull request #12223:
URL: https://github.com/apache/beam/pull/12223#issuecomment-672092490


   Run Java PreCommit







[GitHub] [beam] TheNeuralBit commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload

2020-08-11 Thread GitBox


TheNeuralBit commented on pull request #12481:
URL: https://github.com/apache/beam/pull/12481#issuecomment-672106456











[GitHub] [beam] aaltay merged pull request #12528: Extending archiveJunit Jenkins post-commit task with stability history

2020-08-11 Thread GitBox


aaltay merged pull request #12528:
URL: https://github.com/apache/beam/pull/12528


   







[GitHub] [beam] boyuanzz edited a comment on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName

2020-08-11 Thread GitBox


boyuanzz edited a comment on pull request #12518:
URL: https://github.com/apache/beam/pull/12518#issuecomment-672158339


   > Will it be possible to re-enable the Kafka test on direct runner?
   
   I don't think we have a Kafka test on direct runner. Here is the only place I can find to run this test:
   https://github.com/apache/beam/blob/12dc66995056722b8ea7b165ef872dcd81b1f54a/sdks/python/test-suites/portable/common.gradle#L197-L216







[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module

2020-08-11 Thread GitBox


KevinGG commented on a change in pull request #12460:
URL: https://github.com/apache/beam/pull/12460#discussion_r468778647



##
File path: 
sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx
##
@@ -0,0 +1,119 @@
+// Licensed under the Apache License, Version 2.0 (the 'License'); you may not
+// use this file except in compliance with the License. You may obtain a copy of
+// the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT
+// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+// License for the specific language governing permissions and limitations under
+// the License.
+
+import * as React from 'react';
+
+export interface IHtmlProvider {
+  readonly html: string;
+  readonly script: Array<string>;

Review comment:
   Acked. Change all occurrences to `string[]`.









[GitHub] [beam] kennknowles commented on a change in pull request #12521: [BEAM-8125] Add verifyDeterministic test to SchemaCoderTest

2020-08-11 Thread GitBox


kennknowles commented on a change in pull request #12521:
URL: https://github.com/apache/beam/pull/12521#discussion_r468783212



##
File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaCoderTest.java
##
@@ -290,5 +329,31 @@ public void coderConsistentWithEquals() throws Exception {
 }
   }
 }
+
+    @Test
+    public void verifyDeterministic() throws Exception {
+      if (expectDeterministic) {
+        assertDeterministic(coder);
+      } else {
+        assertNonDeterministic(coder);
+      }
+    }
+  }
+
+  private static void assertDeterministic(SchemaCoder coder) {

Review comment:
   There's also https://github.com/apache/beam/blob/0098eb6f9900849b05bfc2ec2497ed8c7d37148a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java#L67
   
   (coder properties are "properties" in the sense of property-driven testing a la QuickCheck)









[GitHub] [beam] TheNeuralBit commented on pull request #12522: Fix format string in PipelineValidator

2020-08-11 Thread GitBox


TheNeuralBit commented on pull request #12522:
URL: https://github.com/apache/beam/pull/12522#issuecomment-672195554


   Run Java PreCommit







[GitHub] [beam] robertwb commented on pull request #12516: [BEAM-9547] Implement dataframes top, join, merge.

2020-08-11 Thread GitBox


robertwb commented on pull request #12516:
URL: https://github.com/apache/beam/pull/12516#issuecomment-672277371


   > This is going to take me a while to review since I have very little Python experience and even less Pandas experience. If you want a more thorough, faster review, I suggest adding someone else. Otherwise I'll make my way through the PR this week.
   
   This week would be fine, if you're up to it. I'm trying to spread the load (and knowledge) a bit around the team. Feel free to ask clarifying questions. 







[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module

2020-08-11 Thread GitBox


KevinGG commented on a change in pull request #12460:
URL: https://github.com/apache/beam/pull/12460#discussion_r468764276



##
File path: 
sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx
##
@@ -0,0 +1,119 @@
+// Licensed under the Apache License, Version 2.0 (the 'License'); you may not
+// use this file except in compliance with the License. You may obtain a copy of
+// the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT
+// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+// License for the specific language governing permissions and limitations under
+// the License.
+
+import * as React from 'react';
+
+export interface IHtmlProvider {
+  readonly html: string;
+  readonly script: Array<string>;
+}
+
+interface IHtmlViewProps {
+  htmlProvider: IHtmlProvider;
+}
+
+interface IHtmlViewState {
+  innerHtml: string;
+  script: Array<string>;
+}
+
+/**
+ * A common HTML viewing component that renders given HTML and executes scripts
+ * from the given provider.
+ */
+export class HtmlView extends React.Component<IHtmlViewProps, IHtmlViewState> {
+  constructor(props: IHtmlViewProps) {
+    super(props);
+    this.state = {
+      innerHtml: props.htmlProvider.html,
+      script: []
+    };
+  }
+
+  componentDidMount(): void {
+    this._updateRenderTimerId = setInterval(() => this.updateRender(), 1000);
+  }
+
+  componentWillUnmount(): void {
+    clearInterval(this._updateRenderTimerId);
+  }
+
+  updateRender(): void {
+    const currentHtml = this.state.innerHtml;
+    const htmlToUpdate = this.props.htmlProvider.html;
+    const currentScript = this.state.script;
+    const scriptToUpdate = [...this.props.htmlProvider.script];
+    if (htmlToUpdate !== currentHtml) {
+      this.setState({
+        innerHtml: htmlToUpdate,
+        // As long as the html is updated, clear the script state.
+        script: []
+      });
+    }
+    /* Depending on whether this iteration updates the html, the scripts
+     * are executed differently.
+     * Html updated: all scripts are new, start execution from index 0;
+     * Html not updated: only newly added scripts need to be executed.
+     */
+    const currentScriptLength =
+      htmlToUpdate === currentHtml ? currentScript.length : 0;
+    if (scriptToUpdate.length > currentScriptLength) {
+      this.setState(
+        {
+          script: scriptToUpdate
+        },
+        // Executes scripts once the state is updated.
+        () => {
+          for (let i = currentScriptLength; i < scriptToUpdate.length; ++i) {
+            new Function(scriptToUpdate[i])();
+          }
+        }
+      );
+    }
+  }
+
+  render(): React.ReactNode {
+    return (
+      // This injects raw HTML fetched from kernel into JSX.
+      <div dangerouslySetInnerHTML={{ __html: this.state.innerHtml }} />

Review comment:
   Yeah, it sounds scary :)
   [dangerouslySetInnerHTML](https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml) is React's replacement for `innerHTML` in the browser DOM.
   And the warning about using `innerHTML` is [here](https://screenshot.googleplex.com/nJEFJyqWJiG). `iframe` has similar [concerns](https://stackoverflow.com/questions/7289139/why-are-iframes-considered-dangerous-and-a-security-risk#:~:text=If%20you%20control%20the%20content,%2C%20they're%20perfectly%20safe.=The%20IFRAME%20element%20may%20be,an%20IFRAME%20on%20hostile%20site.=In%20addition%2C%20IFRAME%20element%20may,vulnerability%20which%20can%20be%20exploited).
   All these concerns are about serving HTML from another domain that might contain malicious code.
   
   Here I think it's a valid use case because the HTML and scripts are returned from the kernel, generated by the SDK as the user executes code in their notebooks. They are known and trusted.
   The purpose of this module is to render the generated visualization. Whether using `dangerouslySetInnerHTML` or an `iframe`, the HTML has to be rendered and the scripts have to be executed to serve that purpose. In the end, either way would do the same thing.
   
   The security concern is similar to whether a notebook should execute scripts in an opened ipynb file. By default, Jupyter only renders the HTML in an ipynb file (stored as JSON text) and does not execute any script unless the notebook is trusted. A notebook is only trusted after the user runs `jupyter trust notebook.ipynb` or executes the notebook in a kernel.
   
   









[GitHub] [beam] damgad commented on pull request #12529: Moving /tmp directory cleanup of CI workers to Inventory Jenkins job

2020-08-11 Thread GitBox


damgad commented on pull request #12529:
URL: https://github.com/apache/beam/pull/12529#issuecomment-672145185


   R: @aaltay 







[GitHub] [beam] TheNeuralBit commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName

2020-08-11 Thread GitBox


TheNeuralBit commented on pull request #12518:
URL: https://github.com/apache/beam/pull/12518#issuecomment-672156135


   Will it be possible to re-enable the Kafka test on direct runner?







[GitHub] [beam] boyuanzz commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName

2020-08-11 Thread GitBox


boyuanzz commented on pull request #12518:
URL: https://github.com/apache/beam/pull/12518#issuecomment-672201149


   > Is bundle finalization not an issue for DirectRunner? If so, we should be 
able to run the Kafka test on DirectRunner.
   
   Within https://github.com/apache/beam/pull/12488, it should work. But I'm 
not sure whether DirectRunner expands the SDF as expected.
   cc: @lukecwik 







[GitHub] [beam] ihji commented on pull request #12533: [BEAM-10679] improving XLang KafkaIO streaming test

2020-08-11 Thread GitBox


ihji commented on pull request #12533:
URL: https://github.com/apache/beam/pull/12533#issuecomment-672209854


   R: @chamikaramj 







[GitHub] [beam] TheNeuralBit commented on a change in pull request #12521: [BEAM-8125] Add verifyDeterministic test to SchemaCoderTest

2020-08-11 Thread GitBox


TheNeuralBit commented on a change in pull request #12521:
URL: https://github.com/apache/beam/pull/12521#discussion_r468822570



##
File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaCoderTest.java
##
@@ -290,5 +329,31 @@ public void coderConsistentWithEquals() throws Exception {
 }
   }
 }
+
+@Test
+public void verifyDeterministic() throws Exception {
+  if (expectDeterministic) {
+assertDeterministic(coder);
+  } else {
+assertNonDeterministic(coder);
+  }
+}
+  }
+
+  private static void assertDeterministic(SchemaCoder coder) {

Review comment:
   Thanks! Added a commit to use verifyDeterministic.









[GitHub] [beam] chamikaramj commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance

2020-08-11 Thread GitBox


chamikaramj commented on a change in pull request #12485:
URL: https://github.com/apache/beam/pull/12485#discussion_r468830116



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -304,6 +308,8 @@ def compute_table_name(row):
 NOTE: This job name template does not have backwards compatibility guarantees.
 """
 BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}"
+"""The number of shards per destination when writing via streaming inserts."""
+DEFAULT_SHARDS_PER_DESTINATION = 500

Review comment:
   
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L58









[GitHub] [beam] chamikaramj commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance

2020-08-11 Thread GitBox


chamikaramj commented on a change in pull request #12485:
URL: https://github.com/apache/beam/pull/12485#discussion_r468829856



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -304,6 +308,8 @@ def compute_table_name(row):
 NOTE: This job name template does not have backwards compatibility guarantees.
 """
 BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}"
+"""The number of shards per destination when writing via streaming inserts."""
+DEFAULT_SHARDS_PER_DESTINATION = 500

Review comment:
   Note that Java also allows users to control this through a pipeline option, 
and Dataflow support asks users to update this parameter as they see fit.
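
   For context, Beam's Python pipeline options are registered through argparse 
under the hood. A minimal sketch of exposing the shard count as a user-settable 
flag — the flag name here is illustrative, not the actual Beam option:

```python
import argparse

DEFAULT_SHARDS_PER_DESTINATION = 500  # default from the diff above


def parse_bq_options(argv):
    # Mirrors how a PipelineOptions subclass registers flags via argparse;
    # '--num_streaming_shards' is a hypothetical name for illustration.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--num_streaming_shards',
        type=int,
        default=DEFAULT_SHARDS_PER_DESTINATION,
        help='Shards per destination for BigQuery streaming inserts.')
    return parser.parse_args(argv)


options = parse_bq_options(['--num_streaming_shards', '100'])
print(options.num_streaming_shards)  # -> 100
```

   This is the pattern that would let support advise users to tune the value 
per pipeline rather than editing a module-level constant.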









[GitHub] [beam] chamikaramj commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance

2020-08-11 Thread GitBox


chamikaramj commented on a change in pull request #12485:
URL: https://github.com/apache/beam/pull/12485#discussion_r468830450



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -304,6 +308,8 @@ def compute_table_name(row):
 NOTE: This job name template does not have backwards compatibility guarantees.
 """
 BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}"
+"""The number of shards per destination when writing via streaming inserts."""
+DEFAULT_SHARDS_PER_DESTINATION = 500

Review comment:
   These flags for streaming inserts can be added in a separate PR though.









[GitHub] [beam] pabloem commented on pull request #12489: [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions

2020-08-11 Thread GitBox


pabloem commented on pull request #12489:
URL: https://github.com/apache/beam/pull/12489#issuecomment-672280905


   Run Python 3.8 PostCommit







[GitHub] [beam] robinyqiu opened a new pull request #12536: [BEAM-10611] Simplification: Use new ZetaSQL API to get/create Values

2020-08-11 Thread GitBox


robinyqiu opened a new pull request #12536:
URL: https://github.com/apache/beam/pull/12536


   Simplify getting/creating ZetaSQL values using the new API.
   
   r: @apilloud @ZijieSong946 
   
   Post-Commit Tests Status (on master branch)
   

   
   [Build status badge table omitted: Go, Java, and Python post-commit statuses]

[GitHub] [beam] KevinGG commented on a change in pull request #12444: Added a whitespace lint as part of python lint precommit

2020-08-11 Thread GitBox


KevinGG commented on a change in pull request #12444:
URL: https://github.com/apache/beam/pull/12444#discussion_r468881675



##
File path: sdks/python/scripts/run_whitespacelint.sh
##
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements.  See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+
+set -o errexit
+
+pushd ../../
+
+# Check for trailing spaces in non-code text files.
+# Currently include markdown files and gradle build files.
+echo "Running whitespacelint..."
+FILES=$(find .  \( -name '*.md' -o -name 'build.gradle' \))
+for FILE in $FILES
+do
+  whitespacelint $FILE

Review comment:
   Thanks, changing it.
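
   The per-file check the script performs can be sketched in a few lines of 
Python (illustrative only; the actual `whitespacelint` tool may behave 
differently):

```python
def find_trailing_whitespace(text):
    """Return 1-based line numbers of lines ending in spaces or tabs."""
    return [
        i for i, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(' \t')
    ]


sample = "clean line\nbad line   \nanother\ttab\t\n"
print(find_trailing_whitespace(sample))  # -> [2, 3]
```

   Running a check like this once over all matched files (rather than invoking 
a tool per file in a shell loop) is the kind of simplification the review 
suggests.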









[GitHub] [beam] mxm commented on a change in pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload

2020-08-11 Thread GitBox


mxm commented on a change in pull request #12481:
URL: https://github.com/apache/beam/pull/12481#discussion_r467556781



##
File path: 
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
##
@@ -1452,12 +1446,12 @@ public void populateDisplayData(DisplayData.Builder 
builder) {
   public static class Configuration {
 
 // All byte arrays are UTF-8 encoded strings
-private Iterable<KV<byte[], byte[]>> producerConfig;
+private Map<String, String> producerConfig;
 private String topic;
 private String keySerializer;
 private String valueSerializer;

Review comment:
   Comment should be updated (no byte arrays anymore).
   









[GitHub] [beam] mxm commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload

2020-08-11 Thread GitBox


mxm commented on pull request #12481:
URL: https://github.com/apache/beam/pull/12481#issuecomment-672083483











[GitHub] [beam] boyuanzz commented on a change in pull request #12526: [BEAM-10663] Disable python kafka integration tests

2020-08-11 Thread GitBox


boyuanzz commented on a change in pull request #12526:
URL: https://github.com/apache/beam/pull/12526#discussion_r468724341



##
File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py
##
@@ -94,6 +94,7 @@ def run_xlang_kafkaio(self, pipeline):
 pipeline.run(False)
 
 
+@unittest.skip('BEAM-10663')

Review comment:
   ```suggestion
   @unittest.skip('BEAM-10663, BEAM-6868')
   ```









[GitHub] [beam] tysonjh commented on a change in pull request #12435: [BEAM-10616] Added Python Pardo load tests for streaming on Dataflow

2020-08-11 Thread GitBox


tysonjh commented on a change in pull request #12435:
URL: https://github.com/apache/beam/pull/12435#discussion_r468749356



##
File path: sdks/python/apache_beam/testing/load_tests/pardo_test.py
##
@@ -125,7 +125,9 @@ def process(self, element, state=state_param):
 state.add(1)
 yield element
 
-if self.get_option_or_default('streaming', False):
+if self.get_option_or_default(
+'streaming',
+False) and self.pipeline.get_option('runner') == "PortableRunner":

Review comment:
   Right, I remember you mentioned there was some issue with the 
SyntheticSource that Max ran into. Ideally SyntheticSource would work for all 
standard uses, but I'm not familiar with the details. How big of a change would 
it be to fix SyntheticSource?
   
   I'm OK with moving forward with this change, creating a Jira issue for 
tracking the SyntheticSource improvement, and noting it here as a TODO.









[GitHub] [beam] lukecwik commented on pull request #12519: [BEAM-10670] Make Read execute as a splittable DoFn by default for the Java DirectRunner.

2020-08-11 Thread GitBox


lukecwik commented on pull request #12519:
URL: https://github.com/apache/beam/pull/12519#issuecomment-672146059


   Run JavaPortabilityApiJava11 PreCommit







[GitHub] [beam] ihji opened a new pull request #12533: [BEAM-10679] improving XLang KafkaIO streaming test

2020-08-11 Thread GitBox


ihji opened a new pull request #12533:
URL: https://github.com/apache/beam/pull/12533


   Using state instead of early triggering and combining per key. This should 
reduce the flakiness in the streaming test.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   [Build status badge table omitted: Go, Java, and Python post-commit statuses]

[GitHub] [beam] ihji commented on pull request #12533: [BEAM-10679] improving XLang KafkaIO streaming test

2020-08-11 Thread GitBox


ihji commented on pull request #12533:
URL: https://github.com/apache/beam/pull/12533#issuecomment-672209441


   Run Python 3.5 PostCommit







[GitHub] [beam] TheNeuralBit merged pull request #12526: [BEAM-10663] Disable python kafka integration tests

2020-08-11 Thread GitBox


TheNeuralBit merged pull request #12526:
URL: https://github.com/apache/beam/pull/12526


   







[GitHub] [beam] Ardagan commented on pull request #12139: [DO NOT REVIEW] Scd py bq test

2020-08-11 Thread GitBox


Ardagan commented on pull request #12139:
URL: https://github.com/apache/beam/pull/12139#issuecomment-672240516


   run python postcommit







[GitHub] [beam] pabloem commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance

2020-08-11 Thread GitBox


pabloem commented on a change in pull request #12485:
URL: https://github.com/apache/beam/pull/12485#discussion_r468842890



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -304,6 +308,8 @@ def compute_table_name(row):
 NOTE: This job name template does not have backwards compatibility guarantees.
 """
 BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}"
+"""The number of shards per destination when writing via streaming inserts."""
+DEFAULT_SHARDS_PER_DESTINATION = 500

Review comment:
   Filed https://issues.apache.org/jira/browse/BEAM-10680 - thanks Cham!









[GitHub] [beam] pabloem commented on pull request #12492: [BEAM-6807] Implement an Azure blobstore filesystem for Python SDK

2020-08-11 Thread GitBox


pabloem commented on pull request #12492:
URL: https://github.com/apache/beam/pull/12492#issuecomment-672273518


   Run Python2_PVR_Flink PreCommit







[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module

2020-08-11 Thread GitBox


KevinGG commented on a change in pull request #12460:
URL: https://github.com/apache/beam/pull/12460#discussion_r468764276



##
File path: 
sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx
##
@@ -0,0 +1,119 @@
+// Licensed under the Apache License, Version 2.0 (the 'License'); you may not
+// use this file except in compliance with the License. You may obtain a copy 
of
+// the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT
+// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+// License for the specific language governing permissions and limitations 
under
+// the License.
+
+import * as React from 'react';
+
+export interface IHtmlProvider {
+  readonly html: string;
+  readonly script: Array<string>;
+}
+
+interface IHtmlViewProps {
+  htmlProvider: IHtmlProvider;
+}
+
+interface IHtmlViewState {
+  innerHtml: string;
+  script: Array<string>;
+}
+
+/**
+ * A common HTML viewing component that renders given HTML and executes scripts
+ * from the given provider.
+ */
+export class HtmlView extends React.Component<IHtmlViewProps, IHtmlViewState> {
+  constructor(props: IHtmlViewProps) {
+super(props);
+this.state = {
+  innerHtml: props.htmlProvider.html,
+  script: []
+};
+  }
+
+  componentDidMount(): void {
+this._updateRenderTimerId = setInterval(() => this.updateRender(), 1000);
+  }
+
+  componentWillUnmount(): void {
+clearInterval(this._updateRenderTimerId);
+  }
+
+  updateRender(): void {
+const currentHtml = this.state.innerHtml;
+const htmlToUpdate = this.props.htmlProvider.html;
+const currentScript = this.state.script;
+const scriptToUpdate = [...this.props.htmlProvider.script];
+if (htmlToUpdate !== currentHtml) {
+  this.setState({
+innerHtml: htmlToUpdate,
+// As long as the html is updated, clear the script state.
+script: []
+  });
+}
+/* Depending on whether this iteration updates the html, the scripts
+ * are executed differently.
+ * Html updated: all scripts are new, start execution from index 0;
+ * Html not updated: only newly added scripts need to be executed.
+ */
+const currentScriptLength =
+  htmlToUpdate === currentHtml ? currentScript.length : 0;
+if (scriptToUpdate.length > currentScriptLength) {
+  this.setState(
+{
+  script: scriptToUpdate
+},
+// Executes scripts once the state is updated.
+() => {
+  for (let i = currentScriptLength; i < scriptToUpdate.length; ++i) {
+new Function(scriptToUpdate[i])();
+  }
+}
+  );
+}
+  }
+
+  render(): React.ReactNode {
+return (
+  // This injects raw HTML fetched from kernel into JSX.
+  

Review comment:
   Yeah, it sounds scary :)
   
[dangerouslySetInnerHTML](https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml)
 is React's replacement for `innerHTML` in the browser DOM.
   And the warning about using `innerHTML` is 
[here](https://screenshot.googleplex.com/nJEFJyqWJiG). `iframe` has similar 
[concerns](https://stackoverflow.com/questions/7289139/why-are-iframes-considered-dangerous-and-a-security-risk#:~:text=If%20you%20control%20the%20content,%2C%20they're%20perfectly%20safe.=The%20IFRAME%20element%20may%20be,an%20IFRAME%20on%20hostile%20site.=In%20addition%2C%20IFRAME%20element%20may,vulnerability%20which%20can%20be%20exploited).
   All these concerns are about serving HTML from another domain that might 
contain malicious code.
   
   Here I think it's a valid use case because the HTML and scripts returned 
from the kernel are generated by the SDK as the user executes code in their 
notebooks. They are known and trusted.
   The purpose of this model is to render the generated visualization. Whether 
using `dangerouslySetInnerHTML` or `iframe`, to serve that purpose, the HTML has 
to be rendered and the scripts have to be executed. In the end, either way 
would do the same thing.
   
   The security concern is similar to whether a notebook should execute scripts 
in an opened ipynb file. By default, Jupyter only renders HTML in the ipynb 
file (stored as JSON text) and does not execute any script unless the notebook 
is trusted. A notebook is only trusted after the user executes `jupyter trust 
notebook.ipynb` or after the user executes the notebook in a kernel.
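   
   The incremental-execution rule in `updateRender` (re-run every script when 
the HTML changed, otherwise run only the newly appended scripts) can be 
sketched language-neutrally — a Python sketch of the logic, not the actual TSX 
code:

```python
def scripts_to_run(current_html, new_html, current_scripts, new_scripts):
    """Return the slice of new_scripts that still needs executing."""
    # If the HTML changed, every script is considered new; otherwise
    # only scripts appended since the last update are executed.
    start = len(current_scripts) if new_html == current_html else 0
    return new_scripts[start:]


# HTML unchanged: only the appended script runs.
print(scripts_to_run('<b>a</b>', '<b>a</b>', ['s1'], ['s1', 's2']))  # -> ['s2']
# HTML changed: the whole script list runs from index 0.
print(scripts_to_run('<b>a</b>', '<i>b</i>', ['s1'], ['s1', 's2']))  # -> ['s1', 's2']
```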
   
   









[GitHub] [beam] aaltay commented on pull request #12528: Extending archiveJunit Jenkins post-commit task with stability history

2020-08-11 Thread GitBox


aaltay commented on pull request #12528:
URL: https://github.com/apache/beam/pull/12528#issuecomment-672142551


   /cc @tysonjh 







[GitHub] [beam] TheNeuralBit commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName

2020-08-11 Thread GitBox


TheNeuralBit commented on pull request #12518:
URL: https://github.com/apache/beam/pull/12518#issuecomment-672181034


   Ah ok. Maybe we should just change that suite to use the DirectRunner? I'm 
not sure why it's using Flink.
   
   CC: @chamikaramj, @ihji in case they know







[GitHub] [beam] chamikaramj commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName

2020-08-11 Thread GitBox


chamikaramj commented on pull request #12518:
URL: https://github.com/apache/beam/pull/12518#issuecomment-672191125


   Is bundle finalization not an issue for DirectRunner? If so, we should be 
able to run the Kafka test on DirectRunner.







[GitHub] [beam] pabloem commented on a change in pull request #12489: [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions

2020-08-11 Thread GitBox


pabloem commented on a change in pull request #12489:
URL: https://github.com/apache/beam/pull/12489#discussion_r468825940



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -304,6 +308,8 @@ def compute_table_name(row):
 NOTE: This job name template does not have backwards compatibility guarantees.
 """
 BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}"
+"""The number of shards per destination when writing via streaming inserts."""
+DEFAULT_SHARDS_PER_DESTINATION = 500

Review comment:
   yeah, this needs to fix conflicts with them.









[GitHub] [beam] pabloem commented on a change in pull request #12489: [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions

2020-08-11 Thread GitBox


pabloem commented on a change in pull request #12489:
URL: https://github.com/apache/beam/pull/12489#discussion_r468843122



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -1048,6 +1055,11 @@ def __init__(
 to be passed when creating a BigQuery table. These are passed when
 triggering a load job for FILE_LOADS, and when creating a new table for
 STREAMING_INSERTS.
+  with_insert_ids: When using the STREAMING_INSERTS method to write data to

Review comment:
   I've renamed the option to match the name in Java SDK









[GitHub] [beam] tysonjh commented on a change in pull request #12516: [BEAM-9547] Implement dataframes top, join, merge.

2020-08-11 Thread GitBox


tysonjh commented on a change in pull request #12516:
URL: https://github.com/apache/beam/pull/12516#discussion_r468707336



##
File path: sdks/python/apache_beam/dataframe/frames.py
##
@@ -54,6 +54,42 @@ def agg(self, *args, **kwargs):
   'order-sensitive')
   diff = frame_base.wont_implement_method('order-sensitive')
 
+  @frame_base.args_to_kwargs(pd.Series)
+  def nlargest(self, **kwargs):
+if 'keep' in kwargs and kwargs['keep'] != 'all':
+  raise frame_base.WontImplementError('order-sensitive')
+per_partition = expressions.ComputedExpression(
+'nlargest',

Review comment:
   Would using 'nlargest' for the name of both the per-partition and global 
computed expression cause confusion for debugging or UX?









[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module

2020-08-11 Thread GitBox


KevinGG commented on a change in pull request #12460:
URL: https://github.com/apache/beam/pull/12460#discussion_r468740290



##
File path: 
sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx
##
@@ -0,0 +1,119 @@
+// Licensed under the Apache License, Version 2.0 (the 'License'); you may not
+// use this file except in compliance with the License. You may obtain a copy 
of
+// the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT
+// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+// License for the specific language governing permissions and limitations 
under
+// the License.
+
+import * as React from 'react';
+
+export interface IHtmlProvider {
+  readonly html: string;
+  readonly script: Array<string>;
+}
+
+interface IHtmlViewProps {
+  htmlProvider: IHtmlProvider;
+}
+
+interface IHtmlViewState {
+  innerHtml: string;
+  script: Array<string>;
+}
+
+/**
+ * A common HTML viewing component that renders given HTML and executes scripts
+ * from the given provider.
+ */
+export class HtmlView extends React.Component<IHtmlViewProps, IHtmlViewState> {
+  constructor(props: IHtmlViewProps) {
+super(props);
+this.state = {
+  innerHtml: props.htmlProvider.html,
+  script: []
+};
+  }
+
+  componentDidMount(): void {
+this._updateRenderTimerId = setInterval(() => this.updateRender(), 1000);
+  }
+
+  componentWillUnmount(): void {
+clearInterval(this._updateRenderTimerId);
+  }
+
+  updateRender(): void {
+const currentHtml = this.state.innerHtml;
+const htmlToUpdate = this.props.htmlProvider.html;
+const currentScript = this.state.script;
+const scriptToUpdate = [...this.props.htmlProvider.script];
+if (htmlToUpdate !== currentHtml) {
+  this.setState({
+innerHtml: htmlToUpdate,
+// As long as the html is updated, clear the script state.
+script: []
+  });
+}
+/* Depending on whether this iteration updates the html, the scripts
+ * are executed differently.
+ * Html updated: all scripts are new, start execution from index 0;
+ * Html not updated: only newly added scripts need to be executed.
+ */
+const currentScriptLength =
+  htmlToUpdate === currentHtml ? currentScript.length : 0;
+if (scriptToUpdate.length > currentScriptLength) {
+  this.setState(
+{
+  script: scriptToUpdate
+},
+// Executes scripts once the state is updated.
+() => {
+  for (let i = currentScriptLength; i < scriptToUpdate.length; ++i) {
+new Function(scriptToUpdate[i])();
+  }
+}
+  );
+}
+  }
+
+  render(): React.ReactNode {
+return (
+  // This injects raw HTML fetched from kernel into JSX.
+  <div dangerouslySetInnerHTML={{ __html: this.state.innerHtml }} />
+);
+  }
+
+  private _updateRenderTimerId: number;
+}
+
+/**
+ * Makes the browser support HTML import and import HTML from given hrefs if
+ * any is given.
+ *
+ * Native HTML import has been deprecated by modern browsers. To support
+ * importing reusable HTML templates, webcomponentsjs library is needed.
+ * The given hrefs will be imported once the library is loaded.
+ *
+ * Note everything is appended to head and if there are duplicated HTML
+ * imports, only the first one will take effect.
+ */
+export function importHtml(hrefs: Array<string>): void {

Review comment:
   This is needed when initializing the `Widget` during activation of the 
extension.
   
   The `importHtml` only needs to be invoked once.
   The HTML we are targeting is the one used by 
[PAIR-facets](https://pair-code.github.io/facets/).
   Once invoked, later in the view, those facets visualization sent back from 
the kernel could be rendered correctly.









[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module

2020-08-11 Thread GitBox


KevinGG commented on a change in pull request #12460:
URL: https://github.com/apache/beam/pull/12460#discussion_r468764276



##
File path: 
sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx
##
@@ -0,0 +1,119 @@
+// Licensed under the Apache License, Version 2.0 (the 'License'); you may not
+// use this file except in compliance with the License. You may obtain a copy 
of
+// the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT
+// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+// License for the specific language governing permissions and limitations 
under
+// the License.
+
+import * as React from 'react';
+
+export interface IHtmlProvider {
+  readonly html: string;
+  readonly script: Array<string>;
+}
+
+interface IHtmlViewProps {
+  htmlProvider: IHtmlProvider;
+}
+
+interface IHtmlViewState {
+  innerHtml: string;
+  script: Array<string>;
+}
+
+/**
+ * A common HTML viewing component that renders given HTML and executes scripts
+ * from the given provider.
+ */
+export class HtmlView extends React.Component<IHtmlViewProps, IHtmlViewState> {
+  constructor(props: IHtmlViewProps) {
+super(props);
+this.state = {
+  innerHtml: props.htmlProvider.html,
+  script: []
+};
+  }
+
+  componentDidMount(): void {
+this._updateRenderTimerId = setInterval(() => this.updateRender(), 1000);
+  }
+
+  componentWillUnmount(): void {
+clearInterval(this._updateRenderTimerId);
+  }
+
+  updateRender(): void {
+const currentHtml = this.state.innerHtml;
+const htmlToUpdate = this.props.htmlProvider.html;
+const currentScript = this.state.script;
+const scriptToUpdate = [...this.props.htmlProvider.script];
+if (htmlToUpdate !== currentHtml) {
+  this.setState({
+innerHtml: htmlToUpdate,
+// As long as the html is updated, clear the script state.
+script: []
+  });
+}
+/* Depending on whether this iteration updates the html, the scripts
+ * are executed differently.
+ * Html updated: all scripts are new, start execution from index 0;
+ * Html not updated: only newly added scripts need to be executed.
+ */
+const currentScriptLength =
+  htmlToUpdate === currentHtml ? currentScript.length : 0;
+if (scriptToUpdate.length > currentScriptLength) {
+  this.setState(
+{
+  script: scriptToUpdate
+},
+// Executes scripts once the state is updated.
+() => {
+  for (let i = currentScriptLength; i < scriptToUpdate.length; ++i) {
+new Function(scriptToUpdate[i])();
+  }
+}
+  );
+}
+  }
+
+  render(): React.ReactNode {
+return (
+  // This injects raw HTML fetched from kernel into JSX.
+  <div dangerouslySetInnerHTML={{ __html: this.state.innerHtml }} />

Review comment:
   Yeah, it sounds scary :)
   
[dangerouslySetInnerHTML](https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml)
 is React's replacement for `innerHTML` in the browser DOM.
   And the warning about using `innerHTML` is 
[here](https://screenshot.googleplex.com/nJEFJyqWJiG). `iframe` has similar 
[concerns](https://stackoverflow.com/questions/7289139/why-are-iframes-considered-dangerous-and-a-security-risk#:~:text=If%20you%20control%20the%20content,%2C%20they're%20perfectly%20safe.=The%20IFRAME%20element%20may%20be,an%20IFRAME%20on%20hostile%20site.=In%20addition%2C%20IFRAME%20element%20may,vulnerability%20which%20can%20be%20exploited).
   All these concerns are about serving HTML from another domain that might 
contain malicious code.
   
   Here I think it's a valid use case because the HTML and scripts are returned 
from the kernel, generated by the SDK as the user executes code in their 
notebooks. They are known and trusted.
   The purpose of this module is to render the generated visualization. Whether 
using `dangerouslySetInnerHTML` or `iframe`, to serve the purpose, the HTML has 
to be rendered and the scripts have to be executed. In the end, either way 
would do the same thing.
   
   The security concern is similar to whether a notebook should execute scripts 
in an opened ipynb file. By default, Jupyter only renders HTML in the ipynb 
file (stored as JSON text, identical to this module getting those JSON fields 
through kernel messaging) and does not execute any script unless the notebook 
is trusted. A notebook is only trusted after the user executes `jupyter trust 
notebook.ipynb` or after the user executes the notebook in a kernel (identical 
to the extension's usage).
   
   






[GitHub] [beam] kennknowles merged pull request #12366: [BEAM-10572] Eliminate nullability errors from :sdks:java:extensions:sql:datacatalog

2020-08-11 Thread GitBox


kennknowles merged pull request #12366:
URL: https://github.com/apache/beam/pull/12366


   







[GitHub] [beam] pabloem commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance

2020-08-11 Thread GitBox


pabloem commented on a change in pull request #12485:
URL: https://github.com/apache/beam/pull/12485#discussion_r468825099



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -1419,7 +1448,18 @@ def __init__(
 Default is to retry always. This means that whenever there are rows
 that fail to be inserted to BigQuery, they will be retried 
indefinitely.
 Other retry strategy settings will produce a deadletter PCollection
-as output.
+as output. Appropriate values are:
+
+* `RetryStrategy.RETRY_ALWAYS`: retry all rows if
+  there are any kind of errors. Note that this will hold your pipeline
+  back if there are errors until you cancel or update it.

Review comment:
   that's correct.
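The retry-strategy semantics being documented above (retry everything, retry nothing, or retry only transient failures and dead-letter the rest) can be modeled in a few lines. This is a hedged sketch: the constant names mirror Beam's `RetryStrategy`, but the function and its signature are illustrative, not the SDK implementation:

```python
# Illustrative model of how a retry strategy splits failed rows into a
# "retry" set and a "dead-letter" set. Not the actual Beam SDK code.
RETRY_ALWAYS = 'RETRY_ALWAYS'
RETRY_NEVER = 'RETRY_NEVER'
RETRY_ON_TRANSIENT_ERROR = 'RETRY_ON_TRANSIENT_ERROR'

def partition_failures(strategy, failed_rows):
    """failed_rows: list of (row, is_transient) pairs."""
    retry, dead_letter = [], []
    for row, is_transient in failed_rows:
        if strategy == RETRY_ALWAYS:
            retry.append(row)  # note: may hold the pipeline back until fixed
        elif strategy == RETRY_NEVER:
            dead_letter.append(row)
        else:  # RETRY_ON_TRANSIENT_ERROR
            (retry if is_transient else dead_letter).append(row)
    return retry, dead_letter

assert partition_failures(RETRY_ON_TRANSIENT_ERROR,
                          [('r1', True), ('r2', False)]) == (['r1'], ['r2'])
```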









[GitHub] [beam] pabloem commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance

2020-08-11 Thread GitBox


pabloem commented on a change in pull request #12485:
URL: https://github.com/apache/beam/pull/12485#discussion_r468825677



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -304,6 +308,8 @@ def compute_table_name(row):
 NOTE: This job name template does not have backwards compatibility guarantees.
 """
 BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}"
+"""The number of shards per destination when writing via streaming inserts."""
+DEFAULT_SHARDS_PER_DESTINATION = 500

Review comment:
   we've been able to reach ~1k EPS per worker in Python. If we have 50 
shards, we'll only reach ~50k maximum. I'd like to have a larger default, so we 
don't automatically cap EPS at a very low rate.
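The throughput argument above is simple arithmetic: per-destination shard count times per-shard throughput bounds the insert rate. A quick sanity check, using the ~1k events/sec figure quoted in the thread as an assumed constant:

```python
# Back-of-the-envelope bound on streaming-insert throughput.
EPS_PER_SHARD = 1_000  # rough per-worker figure quoted in the comment

def max_eps(num_shards, eps_per_shard=EPS_PER_SHARD):
    """Upper bound on events/sec for a given number of shards."""
    return num_shards * eps_per_shard

assert max_eps(50) == 50_000    # a 50-shard default would cap at ~50k EPS
assert max_eps(500) == 500_000  # the 500-shard default allows ~500k EPS
```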









[GitHub] [beam] TheNeuralBit commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload

2020-08-11 Thread GitBox


TheNeuralBit commented on pull request #12481:
URL: https://github.com/apache/beam/pull/12481#issuecomment-672256723


   Run Java PreCommit







[GitHub] [beam] TheNeuralBit commented on pull request #12522: Fix format string in PipelineValidator

2020-08-11 Thread GitBox


TheNeuralBit commented on pull request #12522:
URL: https://github.com/apache/beam/pull/12522#issuecomment-672263017


   Run Java PreCommit







[GitHub] [beam] pabloem merged pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance

2020-08-11 Thread GitBox


pabloem merged pull request #12485:
URL: https://github.com/apache/beam/pull/12485


   







[GitHub] [beam] TheNeuralBit merged pull request #12521: [BEAM-8125] Add verifyDeterministic test to SchemaCoderTest

2020-08-11 Thread GitBox


TheNeuralBit merged pull request #12521:
URL: https://github.com/apache/beam/pull/12521


   







[GitHub] [beam] ihji removed a comment on pull request #12533: [BEAM-10679] improving XLang KafkaIO streaming test

2020-08-11 Thread GitBox


ihji removed a comment on pull request #12533:
URL: https://github.com/apache/beam/pull/12533#issuecomment-672209441


   Run Python 3.5 PostCommit







[GitHub] [beam] TobKed commented on a change in pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms

2020-08-11 Thread GitBox


TobKed commented on a change in pull request #12452:
URL: https://github.com/apache/beam/pull/12452#discussion_r468876965



##
File path: CI.md
##
@@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with 
regards of the jobs
 Those jobs often have matrix run strategy which runs several different 
variations of the jobs
 (with different platform type / Python version to run for example)
 
+### Google Cloud Platform Credentials
+
+Some of the jobs require variables stored as a [GitHub 
Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)
+to perform operations on Google Cloud Platform. Currently these jobs are 
limited to Apache repository only.
+These variables are:
+ * `GCP_PROJECT_ID` - ID of the Google Cloud project. Apache/Beam repository 
has it set to `apache-beam-testing`.
+ * `GCP_REGION` - Region of the bucket and dataflow jobs. Apache/Beam 
repository has it set to `us-central1`.
+ * `GCP_TESTING_BUCKET` - Name of the bucket where temporary files for 
Dataflow tests will be stored. Apache/Beam repository has it set to 
`beam-github-actions-tests`.
+ * `GCP_SA_EMAIL` - Service account email address. This is usually of the 
format `@.iam.gserviceaccount.com`.
+ * `GCP_SA_KEY` - Service account key. This key should be created and encoded 
as a Base64 string (eg. `cat my-key.json | base64` on macOS).
+
+Service Account shall have following permissions:
+ * Storage Admin (roles/storage.admin)
+ * Dataflow Admin (roles/dataflow.admin)
+
+### Workflows
+
+ Build python source distribution and wheels - 
[build_wheels.yml](.github/workflows/build_wheels.yml)
+
 | Job | Description


| Pull 
Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials |
 
|-||--|---|---|--|
+| Check are GCP variables set | Checks are GCP variables 
are set. Jobs which required them depends on the output of this job.

  | Yes 
 | Yes   | Yes   | Yes/No   
|

Review comment:
   I used `Check GCP variables` and also changed it in the workflow itself. 
Looks much cleaner and simpler.

##
File path: CI.md
##
@@ -85,16 +105,15 @@ Those jobs often have matrix run strategy which runs 
several different variation
 | List files on Google Cloud Storage Bucket   | Lists files on GCS for 
verification purpose.   

| - 
   | Yes   | Yes   | Yes
  |
 | Tag repo nightly| Tag repo with 
`nightly-master` tag if build python source distribution and python wheels 
finished successfully.  

  | -| - | Yes   | -
|
 
-### Google Cloud Platform Credentials
-
-Some of the jobs require variables stored as a [GitHub 
Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)
-to perform operations on Google Cloud Platform. Currently these jobs are 
limited to Apache repository only.
-These variables are:
- * `GCP_SA_EMAIL` - Service account email address. This is usually of the 
format `@.iam.gserviceaccount.com`.
- * `GCP_SA_KEY` - Service account key. This key should be created and encoded 
as a Base64 string (eg. `cat my-key.json | base64` on macOS).
+ Python tests - [python_tests.yml](.github/workflows/python_tests.yml)
 
-Service Account shall have following permissions:
- * Storage Object Admin (roles/storage.objectAdmin)
+| Job  | Description   
| 
Pull Request Run | Direct Push/Merge Run | Scheduled Run | 

[GitHub] [beam] TobKed commented on a change in pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms

2020-08-11 Thread GitBox


TobKed commented on a change in pull request #12452:
URL: https://github.com/apache/beam/pull/12452#discussion_r468877500



##
File path: CI.md
##
@@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with 
regards of the jobs
 Those jobs often have matrix run strategy which runs several different 
variations of the jobs
 (with different platform type / Python version to run for example)
 
+### Google Cloud Platform Credentials
+
+Some of the jobs require variables stored as a [GitHub 
Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)
+to perform operations on Google Cloud Platform. Currently these jobs are 
limited to Apache repository only.
+These variables are:
+ * `GCP_PROJECT_ID` - ID of the Google Cloud project. Apache/Beam repository 
has it set to `apache-beam-testing`.

Review comment:
   I agree. I changed it to refer to the values as examples.









[GitHub] [beam] pabloem commented on a change in pull request #12492: [BEAM-6807] Implement an Azure blobstore filesystem for Python SDK

2020-08-11 Thread GitBox


pabloem commented on a change in pull request #12492:
URL: https://github.com/apache/beam/pull/12492#discussion_r468877314



##
File path: sdks/python/apache_beam/io/azure/blobstoragefilesystem_test.py
##
@@ -0,0 +1,315 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Unit tests for Azure Blob Storage File System."""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+import unittest
+
+# patches unittest.TestCase to be python3 compatible.
+import future.tests.base  # pylint: disable=unused-import
+import mock
+
+from apache_beam.io.azure import blobstorageio

Review comment:
   the import error occurs here, so you should move this import to line 40









[GitHub] [beam] mxm commented on pull request #12385: [BEAM-10527] Migrate Flink and Spark tests to pytest.

2020-08-11 Thread GitBox


mxm commented on pull request #12385:
URL: https://github.com/apache/beam/pull/12385#issuecomment-672090815


   > > > I have already spent a long time trying to fix quotes, so I can't help 
but wondering: why do we need flinkCompatibilityMatrixPROCESS in the first 
place, when it is not being run anywhere? If it's important, shouldn't we add 
it to some postcommit?
   > > 
   > > 
   > > Another solution I had in mind was reworking the `--environment_config` 
option. JSON blobs are unwieldy, and overloading the `--environment_config` 
option is confusing to the user. We could make each field in the PROCESS 
`--environment_config` blob its own argument, and then reject these arguments 
when `environment_type != PROCESS`.
   > 
   > @mxm what do you think about 
https://issues.apache.org/jira/browse/BEAM-10671?
   
   I think it makes sense. Especially for error reporting by having dedicated 
argument parsers for all environments parameters.







[GitHub] [beam] mxm edited a comment on pull request #12385: [BEAM-10527] Migrate Flink and Spark tests to pytest.

2020-08-11 Thread GitBox


mxm edited a comment on pull request #12385:
URL: https://github.com/apache/beam/pull/12385#issuecomment-672090815


   > > Another solution I had in mind was reworking the `--environment_config` 
option. JSON blobs are unwieldy, and overloading the `--environment_config` 
option is confusing to the user. We could make each field in the PROCESS 
`--environment_config` blob its own argument, and then reject these arguments 
when `environment_type != PROCESS`.
   > 
   > @mxm what do you think about 
https://issues.apache.org/jira/browse/BEAM-10671?
   
   I think it makes sense. Especially for error reporting by having dedicated 
argument parsers for all environments parameters.







[GitHub] [beam] amaliujia commented on pull request #12532: [Beam-9543] support MATCH_RECOGNIZE with NFA

2020-08-11 Thread GitBox


amaliujia commented on pull request #12532:
URL: https://github.com/apache/beam/pull/12532#issuecomment-672126518


   cc @aaltay 







[GitHub] [beam] tvalentyn commented on a change in pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms

2020-08-11 Thread GitBox


tvalentyn commented on a change in pull request #12452:
URL: https://github.com/apache/beam/pull/12452#discussion_r468768265



##
File path: CI.md
##
@@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with 
regards of the jobs
 Those jobs often have matrix run strategy which runs several different 
variations of the jobs
 (with different platform type / Python version to run for example)
 
+### Google Cloud Platform Credentials
+
+Some of the jobs require variables stored as a [GitHub 
Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)
+to perform operations on Google Cloud Platform. Currently these jobs are 
limited to Apache repository only.
+These variables are:
+ * `GCP_PROJECT_ID` - ID of the Google Cloud project. Apache/Beam repository 
has it set to `apache-beam-testing`.

Review comment:
   For my education, which repository do we refer to here? Also, I would not 
hard-code existing values in documentation, since the source of truth may be 
updated while the documentation is not. You could say `For example 
apache-beam-testing` instead.

##
File path: CI.md
##
@@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with 
regards of the jobs
 Those jobs often have matrix run strategy which runs several different 
variations of the jobs
 (with different platform type / Python version to run for example)
 
+### Google Cloud Platform Credentials
+
+Some of the jobs require variables stored as a [GitHub 
Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)
+to perform operations on Google Cloud Platform. Currently these jobs are 
limited to Apache repository only.
+These variables are:
+ * `GCP_PROJECT_ID` - ID of the Google Cloud project. Apache/Beam repository 
has it set to `apache-beam-testing`.
+ * `GCP_REGION` - Region of the bucket and dataflow jobs. Apache/Beam 
repository has it set to `us-central1`.
+ * `GCP_TESTING_BUCKET` - Name of the bucket where temporary files for 
Dataflow tests will be stored. Apache/Beam repository has it set to 
`beam-github-actions-tests`.
+ * `GCP_SA_EMAIL` - Service account email address. This is usually of the 
format `@.iam.gserviceaccount.com`.
+ * `GCP_SA_KEY` - Service account key. This key should be created and encoded 
as a Base64 string (eg. `cat my-key.json | base64` on macOS).
+
+Service Account shall have following permissions:
+ * Storage Admin (roles/storage.admin)
+ * Dataflow Admin (roles/dataflow.admin)
+
+### Workflows
+
+ Build python source distribution and wheels - 
[build_wheels.yml](.github/workflows/build_wheels.yml)
+
 | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials |
 |-----|-------------|------------------|-----------------------|---------------|--------------------------|
+| Check are GCP variables set | Checks are GCP variables are set. Jobs which required them depends on the output of this job. | Yes | Yes | Yes | Yes/No |

Review comment:
   Suggestion: Consider using `Checks that GCP variables are set` or `Check 
GCP variables` throughout.

##
File path: CI.md
##
@@ -85,16 +105,15 @@ Those jobs often have matrix run strategy which runs several different variation
 | List files on Google Cloud Storage Bucket | Lists files on GCS for verification purpose. | - | Yes | Yes | Yes |
 | Tag repo nightly | Tag repo with `nightly-master` tag if build python source distribution and python wheels finished successfully. | - | - | Yes

[GitHub] [beam] boyuanzz commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName

2020-08-11 Thread GitBox


boyuanzz commented on pull request #12518:
URL: https://github.com/apache/beam/pull/12518#issuecomment-672158339


   > Will it be possible to re-enable the Kafka test on direct runner?
   
   I don't think we have a Kafka test on the direct runner.







[GitHub] [beam] danielxjd removed a comment on pull request #12223: [Beam-4379] Make ParquetIO read splittable

2020-08-11 Thread GitBox


danielxjd removed a comment on pull request #12223:
URL: https://github.com/apache/beam/pull/12223#issuecomment-672092490


   Run Java PreCommit







[GitHub] [beam] boyuanzz commented on a change in pull request #12531: [BEAM-10676] Use the fire timestamp as the output timestamp for timers

2020-08-11 Thread GitBox


boyuanzz commented on a change in pull request #12531:
URL: https://github.com/apache/beam/pull/12531#discussion_r468822213



##
File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
##
@@ -647,7 +647,7 @@ def set(self, ts):
 windows=(self._window, ),
 clear_bit=False,
 fire_timestamp=ts,
-hold_timestamp=self._input_timestamp,
+hold_timestamp=ts,

Review comment:
   Can we only use fire_timestamp as hold_timestamp when in the event-time 
domain?
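
   The suggestion above can be sketched as a small helper; this is an 
illustration of the proposed behavior, not the actual `bundle_processor.py` 
code, and the domain constants are stand-ins:

   ```python
   EVENT_TIME = 'EVENT_TIME'
   PROCESSING_TIME = 'PROCESSING_TIME'

   def choose_hold_timestamp(time_domain, fire_timestamp, input_timestamp):
       # Only an event-time timer can safely hold the watermark at its fire
       # time; a processing-time timer keeps holding at the input element's
       # timestamp, since its fire time is unrelated to the watermark.
       if time_domain == EVENT_TIME:
           return fire_timestamp
       return input_timestamp
   ```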









[GitHub] [beam] robertwb commented on pull request #12534: [BEAM-9547] Implement some methods for deferred Series.

2020-08-11 Thread GitBox


robertwb commented on pull request #12534:
URL: https://github.com/apache/beam/pull/12534#issuecomment-672230522


   R: @ibzib 







[GitHub] [beam] robinyqiu merged pull request #12515: Upgrade to ZetaSQL 2020.08.1

2020-08-11 Thread GitBox


robinyqiu merged pull request #12515:
URL: https://github.com/apache/beam/pull/12515


   







[GitHub] [beam] pabloem commented on a change in pull request #12492: [BEAM-6807] Implement an Azure blobstore filesystem for Python SDK

2020-08-11 Thread GitBox


pabloem commented on a change in pull request #12492:
URL: https://github.com/apache/beam/pull/12492#discussion_r468878930



##
File path: sdks/python/apache_beam/io/azure/blobstorageio_test.py
##
@@ -0,0 +1,86 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for Azure Blob Storage client.
+"""
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+import unittest
+
+from apache_beam.io.azure import blobstorageio

Review comment:
   There's also an import error happening here; you need to catch it and 
skip the test.
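
   A common way to address this is sketched below; the module path follows the 
PR under review, while the test class and method names are made up:

   ```python
   import unittest

   # Guard the optional dependency so the test module still imports cleanly
   # when the Azure client libraries are not installed.
   try:
       from apache_beam.io.azure import blobstorageio  # pylint: disable=unused-import
       AZURE_DEPS_INSTALLED = True
   except ImportError:
       AZURE_DEPS_INSTALLED = False

   @unittest.skipIf(not AZURE_DEPS_INSTALLED, 'Azure dependencies are not installed.')
   class TestBlobStorageIO(unittest.TestCase):
       def test_smoke(self):
           # Only runs when the dependency is available.
           self.assertTrue(AZURE_DEPS_INSTALLED)
   ```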









[GitHub] [beam] saavannanavati commented on pull request #12352: [BEAM-10549] Improve runtime type checking performance for the Python SDK

2020-08-11 Thread GitBox


saavannanavati commented on pull request #12352:
URL: https://github.com/apache/beam/pull/12352#issuecomment-672292191


   The microbenchmark results! 
   
   It looks like `performance_runtime_type_check` is  ~10% slower than no type 
check, but significantly faster than `runtime_type_check` whose time complexity 
is linear with respect to the input size.
   
   ```
   1 Element
   -- Simple Types
    9.42 sec (No Type Check)
    8.01 sec (Runtime Type Check)
    7.70 sec (Performance Runtime Type Check)
   -- Nested Types
    9.10 sec (No Type Check)
    9.79 sec (Runtime Type Check)
    7.59 sec (Performance Runtime Type Check)
   
   
   1001 Elements
   -- Simple Types
    11.78 sec (No Type Check)
    65.04 sec (Runtime Type Check)
    20.00 sec (Performance Runtime Type Check)
   -- Nested Types
    20.26 sec (No Type Check)
    74.79 sec (Runtime Type Check)
    25.84 sec (Performance Runtime Type Check)
   
   
   2001 Elements
   -- Simple Types
    23.40 sec (No Type Check)
    118.87 sec (Runtime Type Check)
    25.75 sec (Performance Runtime Type Check)
   -- Nested Types
    32.52 sec (No Type Check)
    135.38 sec (Runtime Type Check)
    36.19 sec (Performance Runtime Type Check)
   
   
   3001 Elements
   -- Simple Types
    29.79 sec (No Type Check)
    174.64 sec (Runtime Type Check)
    24.85 sec (Performance Runtime Type Check)
   -- Nested Types
    42.14 sec (No Type Check)
    193.98 sec (Runtime Type Check)
    45.33 sec (Performance Runtime Type Check)
   
   
   4001 Elements
   -- Simple Types
    34.66 sec (No Type Check)
    225.57 sec (Runtime Type Check)
    37.28 sec (Performance Runtime Type Check)
   -- Nested Types
    51.87 sec (No Type Check)
    260.55 sec (Runtime Type Check)
    56.28 sec (Performance Runtime Type Check)
   
   
   5001 Elements
   -- Simple Types
    41.14 sec (No Type Check)
    277.93 sec (Runtime Type Check)
    31.16 sec (Performance Runtime Type Check)
   -- Nested Types
    63.14 sec (No Type Check)
    296.25 sec (Runtime Type Check)
    41.02 sec (Performance Runtime Type Check)
   
   
   6001 Elements
   -- Simple Types
    30.41 sec (No Type Check)
    292.06 sec (Runtime Type Check)
    50.29 sec (Performance Runtime Type Check)
   -- Nested Types
    43.29 sec (No Type Check)
    378.11 sec (Runtime Type Check)
    75.43 sec (Performance Runtime Type Check)
   
   
   7001 Elements
   -- Simple Types
    42.78 sec (No Type Check)
    288.78 sec (Runtime Type Check)
    39.87 sec (Performance Runtime Type Check)
   -- Nested Types
    54.35 sec (No Type Check)
    318.59 sec (Runtime Type Check)
    56.68 sec (Performance Runtime Type Check)
   
   
   8001 Elements
   -- Simple Types
    41.48 sec (No Type Check)
    362.18 sec (Runtime Type Check)
    62.33 sec (Performance Runtime Type Check)
   -- Nested Types
    60.65 sec (No Type Check)
    501.25 sec (Runtime Type Check)
    96.24 sec (Performance Runtime Type Check)
   
   
   9001 Elements
   -- Simple Types
    64.16 sec (No Type Check)
    370.09 sec (Runtime Type Check)
    62.71 sec (Performance Runtime Type Check)
   -- Nested Types
    83.78 sec (No Type Check)
    557.60 sec (Runtime Type Check)
 107.29 sec (Performance Runtime Type Check)
   ```







[GitHub] [beam] kkucharc commented on a change in pull request #12527: [BEAM-10672] Added Python Combine load tests for streaming on Dataflow

2020-08-11 Thread GitBox


kkucharc commented on a change in pull request #12527:
URL: https://github.com/apache/beam/pull/12527#discussion_r468725264



##
File path: .test-infra/jenkins/job_LoadTests_Combine_Python.groovy
##
@@ -93,15 +93,23 @@ def loadTestConfigurations = { datasetName ->
 top_count: 20,
   ]
 ],
-  ].each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) }
+  ]
+  .each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) }
+  .each{ test -> (jobType!='streaming') ?: addStreamingOptions(test) }

Review comment:
   Sure, thanks :)

##
File path: .test-infra/jenkins/job_LoadTests_Combine_Python.groovy
##
@@ -113,13 +121,31 @@ PhraseTriggeringPostCommitBuilder.postCommitJob(
 this
 ) {
   additionalPipelineArgs = [:]
-  batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.PR)
+  loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, "batch")
 }
 
 CronJobBuilder.cronJob('beam_LoadTests_Python_Combine_Dataflow_Batch', 'H 15 * 
* *', this) {
   additionalPipelineArgs = [
 influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName,
 influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostname,
   ]
-  batchLoadTestJob(delegate, 
CommonTestProperties.TriggeringContext.POST_COMMIT)
+  loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, 
"batch")
 }
+
+PhraseTriggeringPostCommitBuilder.postCommitJob(
+'beam_LoadTests_Python_Combine_Dataflow_Streaming',
+'Run Python Load Tests Combine Dataflow Streaming',
+'Load Tests Python Combine Dataflow Streaming suite',
+this
+) {
+  additionalPipelineArgs = [:]
+  loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, 
"streaming")
+}
+
+CronJobBuilder.cronJob('beam_LoadTests_Python_Combine_Dataflow_Streaming', 'H 
15 * * *', this) {
+  additionalPipelineArgs = [
+influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName,
+influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostname,
+  ]
+  loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, 
"streaming")
+}

Review comment:
   Sure, thanks a lot :)









[GitHub] [beam] TheNeuralBit commented on a change in pull request #12498: [BEAM-10654] Implemented ExternalSchemaIOTransformRegistrar for jdbc

2020-08-11 Thread GitBox


TheNeuralBit commented on a change in pull request #12498:
URL: https://github.com/apache/beam/pull/12498#discussion_r468722620



##
File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java
##
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.jdbc;
+
+import com.google.auto.service.AutoService;
+import java.io.Serializable;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.StreamSupport;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.io.SchemaIO;
+import org.apache.beam.sdk.schemas.io.SchemaIOProvider;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * An implementation of {@link SchemaIOProvider} for reading and writing JSON 
payloads with {@link
+ * JdbcIO}.
+ */
+@Internal
+@AutoService(SchemaIOProvider.class)
+public class JdbcSchemaIOProvider implements SchemaIOProvider {
+
+  /** Returns an id that uniquely represents this IO. */
+  @Override
+  public String identifier() {
+return "jdbc";
+  }
+
+  /**
+   * Returns the expected schema of the configuration object. Note this is 
distinct from the schema
+   * of the data source itself.
+   */
+  @Override
+  public Schema configurationSchema() {
+return Schema.builder()
+.addStringField("driverClassName")
+.addStringField("jdbcUrl")
+.addStringField("username")
+.addStringField("password")
+.addNullableField("connectionProperties", FieldType.STRING)
+.addNullableField("connectionInitSqls", 
FieldType.iterable(FieldType.STRING))
+.addNullableField("readQuery", FieldType.STRING)
+.addNullableField("writeStatement", FieldType.STRING)
+.addNullableField("fetchSize", FieldType.INT16)
+.addNullableField("outputParallelization", FieldType.BOOLEAN)
+.build();
+  }
+
+  /**
+   * Produce a SchemaIO given a String representing the data's location, the 
schema of the data that
+   * resides there, and some IO-specific configuration object.
+   */
+  @Override
+  public JdbcSchemaIO from(String location, Row configuration, Schema 
dataSchema) {
+return new JdbcSchemaIO(location, configuration);
+  }
+
+  @Override
+  public boolean requiresDataSchema() {
+return false;
+  }
+
+  @Override
+  public PCollection.IsBounded isBounded() {
+return PCollection.IsBounded.BOUNDED;
+  }
+
+  /** An abstraction to create schema aware IOs. */
+  static class JdbcSchemaIO implements SchemaIO, Serializable {
+protected final Row config;
+protected final String location;
+
+JdbcSchemaIO(String location, Row config) {
+  this.config = config;
+  this.location = location;
+}
+
+@Override
+public Schema schema() {
+  return null;
+}
+
+@Override
+public PTransform<PBegin, PCollection<Row>> buildReader() {
+  String readQuery;
+  if (config.getString("readQuery") != null) {
+readQuery = config.getString("readQuery");
+  } else {
+readQuery = String.format("SELECT f_int FROM %s", location);

Review comment:
   Similarly this shouldn't mention specific field names. In this case I 
think we could just do `SELECT * FROM %s`

##
File path: 
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java
##
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by 

[GitHub] [beam] danielxjd removed a comment on pull request #12223: [Beam-4379] Make ParquetIO read splittable

2020-08-11 Thread GitBox


danielxjd removed a comment on pull request #12223:
URL: https://github.com/apache/beam/pull/12223#issuecomment-670715872


   Run Java PreCommit







[GitHub] [beam] tysonjh commented on pull request #12499: [BEAM-10602] Fix load test metrics in Grafana dashboard

2020-08-11 Thread GitBox


tysonjh commented on pull request #12499:
URL: https://github.com/apache/beam/pull/12499#issuecomment-672113428


   > @tysonjh You should be able to run this locally with the backup data which 
is automatically retrieved from the GCS bucket when you run `docker-compose 
up`. Basically, the changes here will restore the old behavior + add a 
latency/checkpoint panel + combine the Flink ParDo results with the ones from 
Dataflow/Spark in one panel.
   
   Got it, thanks. I had to add some extra steps to the wiki for setting up the 
InfluxDB Data Source and now have graphs showing up. Please let me know when 
the comments are resolved so I can take another look.







[GitHub] [beam] TheNeuralBit commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload

2020-08-11 Thread GitBox


TheNeuralBit commented on pull request #12481:
URL: https://github.com/apache/beam/pull/12481#issuecomment-672186752


   Run Java PreCommit







[GitHub] [beam] emilymye commented on pull request #12505: [WIP][BEAM-8106] Add version to java container image name

2020-08-11 Thread GitBox


emilymye commented on pull request #12505:
URL: https://github.com/apache/beam/pull/12505#issuecomment-672204447


   Run Python2_PVR_Flink PreCommit







[GitHub] [beam] robertwb opened a new pull request #12534: [BEAM-9547] Implement some methods for deferred Series.

2020-08-11 Thread GitBox


robertwb opened a new pull request #12534:
URL: https://github.com/apache/beam/pull/12534


   Now less than 50% of the pandas doctests are skipped.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   [Per-language build-status badge rows for Go, Java, and Python: the badge links were line-wrapped and the table is truncated in the archive.]

[GitHub] [beam] aaltay merged pull request #12529: Moving /tmp directory cleanup of CI workers to Inventory Jenkins job

2020-08-11 Thread GitBox


aaltay merged pull request #12529:
URL: https://github.com/apache/beam/pull/12529


   







[GitHub] [beam] Ardagan commented on pull request #12139: [DO NOT REVIEW] Scd py bq test

2020-08-11 Thread GitBox


Ardagan commented on pull request #12139:
URL: https://github.com/apache/beam/pull/12139#issuecomment-672253789


   Run Python 2 PostCommit







[GitHub] [beam] boyuanzz merged pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName

2020-08-11 Thread GitBox


boyuanzz merged pull request #12518:
URL: https://github.com/apache/beam/pull/12518


   







[GitHub] [beam] tvalentyn commented on pull request #12526: [BEAM-10663] Disable python kafka integration tests

2020-08-11 Thread GitBox


tvalentyn commented on pull request #12526:
URL: https://github.com/apache/beam/pull/12526#issuecomment-672258012


   Thanks everyone for reporting and fixing this error.







[GitHub] [beam] TheNeuralBit opened a new pull request #12535: [BEAM-10500] Make KeyedTimerDataCoder encode output timestamp

2020-08-11 Thread GitBox


TheNeuralBit opened a new pull request #12535:
URL: https://github.com/apache/beam/pull/12535


   KeyedTimerDataCoder doesn't actually encode the outputTimestamp separately, 
instead re-using the timestamp when decoding. This leads to occasional flakes 
when the two `new Instant()` instances created in the test are different.
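
   The bug class can be illustrated with a toy round-trip check (not Beam's 
actual coder): every field the decoder reconstructs must come out of the 
encoded bytes, otherwise any input where the two timestamps differ fails 
exactly as described.

   ```python
   import struct

   def encode_timer(fire_timestamp_ms, output_timestamp_ms):
       # Encode BOTH timestamps; the bug was reusing the fire timestamp
       # for the output timestamp on decode instead of reading it back.
       return struct.pack('>qq', fire_timestamp_ms, output_timestamp_ms)

   def decode_timer(data):
       return struct.unpack('>qq', data)

   timer = (1000, 999)  # fire timestamp differs from output timestamp
   assert decode_timer(encode_timer(*timer)) == timer
   ```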
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   [Per-language build-status badge rows for Go, Java, and Python: the badge links were line-wrapped and the table is truncated in the archive.]
 

[GitHub] [beam] amaliujia commented on pull request #12515: Upgrade to ZetaSQL 2020.08.1

2020-08-11 Thread GitBox


amaliujia commented on pull request #12515:
URL: https://github.com/apache/beam/pull/12515#issuecomment-672271568


   Thank you!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] damondouglas commented on pull request #12506: [BEAM-9680] Add Filter with ParDo lesson to Go SDK Katas

2020-08-11 Thread GitBox


damondouglas commented on pull request #12506:
URL: https://github.com/apache/beam/pull/12506#issuecomment-672365280


   @lostluck the [stepik course](https://stepik.org/course/70387) has been 
updated and the `*-remote.yaml` files committed to this PR.  This PR is ready 
to merge.







[GitHub] [beam] pabloem commented on pull request #12489: [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions

2020-08-11 Thread GitBox


pabloem commented on pull request #12489:
URL: https://github.com/apache/beam/pull/12489#issuecomment-672370947


   Run Python 3.8 PostCommit







[GitHub] [beam] robinyqiu merged pull request #12536: [BEAM-10611] Simplification: Use new ZetaSQL API to get/create Values

2020-08-11 Thread GitBox


robinyqiu merged pull request #12536:
URL: https://github.com/apache/beam/pull/12536


   







[GitHub] [beam] robinyqiu opened a new pull request #12539: Fix some typos

2020-08-11 Thread GitBox


robinyqiu opened a new pull request #12539:
URL: https://github.com/apache/beam/pull/12539


   Fix some typos found during code import into the Google codebase.
   
   r: @aaltay @kennknowles 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)

[GitHub] [beam] boyuanzz commented on a change in pull request #12519: [BEAM-10670] Make Read execute as a splittable DoFn by default for the Java DirectRunner.

2020-08-11 Thread GitBox


boyuanzz commented on a change in pull request #12519:
URL: https://github.com/apache/beam/pull/12519#discussion_r468874809



##
File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java
##
@@ -211,10 +211,6 @@ public FinishBundleContext 
finishBundleContext(DoFn doFn) {
 KV> residual =
 processContext.getTakenCheckpoint();
 if (cont.shouldResume()) {
-  checkState(

Review comment:
   Is it because we have `checkDone` now?

##
File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
##
@@ -630,4 +635,38 @@ public void tearDown() {
   invoker = null;
 }
   }
+
+  /**
+   * Throws an {@link IllegalArgumentException} if the pipeline contains any 
primitive read
+   * transforms that have not been expanded to be executed as {@link DoFn 
splittable DoFns}.
+   */
+  public static void validateNoPrimitiveReads(Pipeline pipeline) {
+pipeline.traverseTopologically(new ValidateNoPrimitiveReads());
+  }
+
+  /**
+   * A {@link org.apache.beam.sdk.Pipeline.PipelineVisitor} that ensures that 
the pipeline does not
+   * contain any primitive reads.

Review comment:
   ```suggestion
  * contain any primitive reads when use_deprecated_read is not specified.
   ```
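
The validator under discussion is a straightforward instance of the visitor pattern: traverse the pipeline graph and fail fast on any transform that is still a primitive read. A stdlib-only Python sketch of the idea (the `Node` and `Pipeline` classes here are hypothetical stand-ins, not Beam's actual API):

```python
class Node:
    """A hypothetical pipeline transform node."""
    def __init__(self, transform_name, children=()):
        self.transform_name = transform_name
        self.children = list(children)

class Pipeline:
    """A hypothetical pipeline holding a graph of transform nodes."""
    def __init__(self, root):
        self.root = root

    def traverse_topologically(self, visitor):
        # Depth-first walk, calling the visitor once per transform.
        stack = [self.root]
        while stack:
            node = stack.pop()
            visitor.visit(node)
            stack.extend(node.children)

class ValidateNoPrimitiveReads:
    """Raises ValueError if the pipeline contains an unexpanded Read."""
    def visit(self, node):
        if node.transform_name == "Read":
            raise ValueError(
                "pipeline contains a primitive Read that was not expanded "
                "into a splittable DoFn")

# An expanded pipeline passes validation without raising.
pipeline = Pipeline(Node("Impulse", [Node("ParDo(SplittableDoFn)")]))
pipeline.traverse_topologically(ValidateNoPrimitiveReads())
```

A pipeline containing a `Read` node would raise here, mirroring how the Java `validateNoPrimitiveReads` surfaces an `IllegalArgumentException` during traversal.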

##
File path: 
runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java
##
@@ -179,6 +180,11 @@ public DirectPipelineResult run(Pipeline pipeline) {
 
   DisplayDataValidator.validatePipeline(pipeline);
   DisplayDataValidator.validateOptions(options);
+  // TODO(BEAM-10670): Remove the deprecated Read and make the splittable 
DoFn the only option.
+  if (!(ExperimentalOptions.hasExperiment(options, 
"beam_fn_api_use_deprecated_read")

Review comment:
   Is it possible to merge `beam_fn_api_use_deprecated_read` and 
`use_deprecated_read` into a single `use_deprecated_read`, since they seem to 
be the same?
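
For context, experiments in Beam are plain strings on the pipeline options, so unifying the two aliases amounts to accepting either name in one check; a stdlib sketch (the `has_experiment` helper and dict-based options are hypothetical simplifications, not Beam's `ExperimentalOptions`):

```python
def has_experiment(options, name):
    """Return True if `name` appears in the options' experiment list."""
    return name in options.get("experiments", [])

opts = {"experiments": ["beam_fn_api_use_deprecated_read"]}
# Accept either alias until the two flags are merged into one.
use_deprecated_read = (
    has_experiment(opts, "use_deprecated_read")
    or has_experiment(opts, "beam_fn_api_use_deprecated_read"))
print(use_deprecated_read)  # → True
```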

##
File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java
##
@@ -167,4 +180,55 @@ public void testBoundednessForUnboundedFn() {
 "unbounded to unbounded", makeUnboundedCollection(pipeline), 
unboundedFn)
 .isBounded());
   }
+
+  private static class FakeBoundedSource extends BoundedSource<String> {
+    @Override
+    public List<? extends BoundedSource<String>> split(
+        long desiredBundleSizeBytes, PipelineOptions options) throws Exception {
+      return Collections.singletonList(this);
+    }
+
+    @Override
+    public long getEstimatedSizeBytes(PipelineOptions options) throws Exception {
+      return 0;
+    }
+
+    @Override
+    public BoundedReader<String> createReader(PipelineOptions options) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public Coder<String> getOutputCoder() {
+      return StringUtf8Coder.of();
+    }
+  }
+
+  @Test
+  public void testValidateThatThereAreNoPrimitiveReads() {

Review comment:
   Can we add one block to test using `use_deprecated_read ` ?









[GitHub] [beam] KevinGG commented on pull request #12444: Added a whitespace lint as part of python lint precommit

2020-08-11 Thread GitBox


KevinGG commented on pull request #12444:
URL: https://github.com/apache/beam/pull/12444#issuecomment-672309994


   > Few things regarding this change
   > 
   > 1. Isn't the spotless pre-commit right place to extend with such checks? 
Many developers probably already added that to their pre-commit hooks so the 
introduction of this check will be almost unnoticeable.
   
   It would be best if the check were unnoticeable. This check can also get 
triggered when a commit only changes markdown or build.gradle files in the 
repo. It's not needed when a commit only changes code, since the linters for 
those programming languages already cover whitespace issues in code files.
   
   > 2. Seems like it's using python2.7 virtualenv [1]:
   > 
   > ```
   > 00:00:40.125 Running virtualenv with interpreter /usr/bin/python2.7
   > 00:00:40.425 DEPRECATION: Python 2.7 reached the end of its life on 
January 1st, 2020.
   > ```
   > 
   > If possible, I would suggest avoiding that due to the end of life.
   > 
   
   Changing it to use python3.8.
   
   > 1. The check is really "quiet" in terms of logging, in case of failure 
there are no details regarding the file/line and the exact issue. Check out my 
sample PR #12470 and the build output [2]. Would be helpful if it prints the 
issue similarly to the spottles e.g. [3].
   > 
   > [1] 
https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Commit/1/console
   > [2] 
https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Commit/3/console
   > [3] 
https://ci-beam.apache.org/job/beam_PreCommit_Spotless_Commit/10575/console
   
   Yes, it doesn't print much except `Running whitespacelint...` if it succeeds.
   But if it fails, it does tell you which line failed and what the problem is.
   [2] actually fails because it couldn't find the precommit task: `Task 
'whitespacePreCommit' not found in root project 'beam'.`
   
   I'll modify the script to output more information.
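
For reference, the kind of report being asked for — file, line, and the exact problem, in the style of spotless — can be sketched in a few lines of stdlib Python (a hypothetical checker, not the actual whitespacelint implementation):

```python
import re

def lint_whitespace(text, filename="<stdin>"):
    """Return human-readable findings for trailing whitespace and tab
    characters, one `file:line: message` string per problem found."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if re.search(r'[ \t]+$', line):
            findings.append(f"{filename}:{lineno}: trailing whitespace")
        if '\t' in line:
            findings.append(f"{filename}:{lineno}: tab character found")
    return findings

# Example: two problems in a three-line markdown snippet.
sample = "# Title \nbody\n\tindented with a tab\n"
for finding in lint_whitespace(sample, "README.md"):
    print(finding)
```

Running the example prints one finding per offending line, e.g. `README.md:1: trailing whitespace`, which is the file/line detail missing from the current quiet output.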







[GitHub] [beam] TheNeuralBit commented on a change in pull request #12297: [BEAM-10137] Add KinesisIO for cross-language usage with python wrapper

2020-08-11 Thread GitBox


TheNeuralBit commented on a change in pull request #12297:
URL: https://github.com/apache/beam/pull/12297#discussion_r468907524



##
File path: sdks/python/apache_beam/io/kinesis.py
##
@@ -0,0 +1,317 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Kinesis streaming in Python pipelines.
+
+  These transforms are currently supported by Beam Flink and Spark portable
+  runners.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python 
SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Kinesis transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Kinesis
+  transforms. This option is only available for Beam 2.24.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+and make sure that 'java' command is available.
+
+  In this option, the Python SDK will either download (for released Beam
+  versions) or build (when running from a Beam Git clone) an expansion
+  service jar and use
+  that to expand transforms. Currently Kinesis transforms use the
+  'beam-sdks-java-io-kinesis-expansion-service' jar for this purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you start up your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Start up your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+initiating Kinesis transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import time
+from typing import List
+from typing import NamedTuple
+from typing import Optional
+from typing import Tuple
+
+from past.builtins import unicode
+
+from apache_beam import BeamJarExpansionService
+from apache_beam import ExternalTransform
+from apache_beam import NamedTupleBasedPayloadBuilder
+
+__all__ = [
+'WriteToKinesis',
+'ReadDataFromKinesis',
+'InitialPositionInStream',
+'WatermarkPolicy',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService(
+  'sdks:java:io:kinesis:expansion-service:shadowJar')
+
+
+WriteToKinesisSchema = NamedTuple(
+'WriteToKinesisSchema',
+[
+('stream_name', unicode),
+('aws_access_key', unicode),
+('aws_secret_key', unicode),
+('region', unicode),
+('partition_key', unicode),
+('service_endpoint', Optional[unicode]),
+('verify_certificate', Optional[bool]),
+('producer_properties', Optional[List[Tuple[unicode, unicode]]]),

Review comment:
   #12481 is merged now so this should change to mapping (as well as 
producer_properties, and the corresponding parameters in Java)
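
The point above — preferring a mapping over a list of key/value tuples for free-form properties — can be illustrated with a stdlib `typing.NamedTuple` schema (the field names here are a simplified, hypothetical subset of the real cross-language schema):

```python
from typing import Mapping, NamedTuple, Optional

class WriteSchema(NamedTuple):
    stream_name: str
    region: str
    # A Mapping models free-form producer properties more directly than
    # a List[Tuple[str, str]]: lookups are O(1) and duplicate keys are
    # impossible by construction.
    producer_properties: Optional[Mapping[str, str]] = None

cfg = WriteSchema(
    stream_name="my-stream",
    region="us-west-1",
    producer_properties={"CollectionMaxCount": "1000"},
)
print(cfg.producer_properties["CollectionMaxCount"])  # → 1000
```

This is the motivation for moving such fields away from `List[Tuple[unicode, unicode]]` once mapping support is available in the schema layer.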

##
File path: sdks/python/apache_beam/io/kinesis.py
##
@@ -0,0 +1,317 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright 

[GitHub] [beam] TobKed commented on pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms

2020-08-11 Thread GitBox


TobKed commented on pull request #12452:
URL: https://github.com/apache/beam/pull/12452#issuecomment-672344986


   > Thanks, @TobKed , I don't have additional comments. Once all reviewers, 
whose feedback you expect, give their LGTM, please squash fixup commits, and 
leave only the commits that you want to be added to the commit history.
   
   I found I had missed using `TemporaryDirectory()` in 
`sdks/python/apache_beam/typehints/typecheck_test_py3.py`, so I fixed it.
   I also added an input to the manual trigger so that Dataflow tests have to 
be explicitly enabled (they are disabled by default).
   
   Thanks for the reviews, @tvalentyn and everyone involved in this PR. I will 
create a PR to Apache Infra requesting that the proper secrets be set up.
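
For context, the `TemporaryDirectory()` pattern mentioned above scopes scratch files to a directory that is removed automatically, even if the test body raises; a minimal stdlib sketch:

```python
import os
import tempfile

def write_scratch_file(contents):
    """Write and read back a scratch file inside a self-cleaning
    temporary directory; the directory is removed when the `with`
    block exits."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "scratch.txt")
        with open(path, "w") as f:
            f.write(contents)
        with open(path) as f:
            data = f.read()
    # tmpdir and everything in it is gone at this point.
    assert not os.path.exists(tmpdir)
    return data

print(write_scratch_file("hello"))  # → hello
```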







[GitHub] [beam] TheNeuralBit commented on a change in pull request #12297: [BEAM-10137] Add KinesisIO for cross-language usage with python wrapper

2020-08-11 Thread GitBox


TheNeuralBit commented on a change in pull request #12297:
URL: https://github.com/apache/beam/pull/12297#discussion_r468925737



##
File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisTransformRegistrar.java
##
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.kinesis;
+
+import com.amazonaws.regions.Regions;
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
+import com.google.auto.service.AutoService;
+import java.util.Map;
+import java.util.Properties;
+import javax.annotation.Nullable;

Review comment:
   We should prefer the checker framework Nullable annotation, 
`org.checkerframework.checker.nullness.qual.Nullable`

##
File path: sdks/python/apache_beam/io/external/xlang_kinesisio_it_test.py
##
@@ -0,0 +1,308 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Integration test for Python cross-language pipelines for Java KinesisIO.
+
+If you want to run the tests on localstack then run it just with pipeline
+options.
+
+To test it on a real AWS account you need to pass some additional params, e.g.:
+python setup.py nosetests \
+--tests=apache_beam.io.external.xlang_kinesisio_it_test \
+--test-pipeline-options="
+  --use_real_aws
+  --aws_kinesis_stream=
+  --aws_access_key=
+  --aws_secret_key=
+  --aws_region=
+  --runner=FlinkRunner"
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+import time
+import unittest
+import uuid
+
+import apache_beam as beam
+from apache_beam.io.kinesis import InitialPositionInStream
+from apache_beam.io.kinesis import ReadDataFromKinesis
+from apache_beam.io.kinesis import WriteToKinesis
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import StandardOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
+
+# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
+try:
+  import boto3
+except ImportError:
+  boto3 = None
+
+try:
+  from testcontainers.core.container import DockerContainer
+except ImportError:
+  DockerContainer = None
+# pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports
+
+LOCALSTACK_VERSION = '0.11.3'
+NUM_RECORDS = 10
+NOW = time.time()
+RECORD = b'record' + str(uuid.uuid4()).encode()
+
+
+@unittest.skipUnless(DockerContainer, 'testcontainers is not installed.')
+@unittest.skipUnless(boto3, 'boto3 is not installed.')
+@unittest.skipUnless(
+TestPipeline().get_pipeline_options().view_as(StandardOptions).runner,
+'Do not run this test on precommit suites.')
+class CrossLanguageKinesisIOTest(unittest.TestCase):
+  @unittest.skipUnless(
+  TestPipeline().get_option('aws_kinesis_stream'),
+  'Cannot test on real aws without pipeline options provided')
+  def test_kinesis_io_roundtrip(self):
+# TODO: enable this test for localstack once BEAM-10664 is resolved
+self.run_kinesis_write()
+self.run_kinesis_read()
+
+  @unittest.skipIf(
+  TestPipeline().get_option('aws_kinesis_stream'),
+  'Do not test on localstack when pipeline options were provided')
+  def test_kinesis_write(self):
+# TODO: remove this test once BEAM-10664 is resolved
+self.run_kinesis_write()
+records = 

[GitHub] [beam] chamikaramj commented on a change in pull request #12263: [BEAM-10492] Add missing sideinput handling to DLP transforms

2020-08-11 Thread GitBox


chamikaramj commented on a change in pull request #12263:
URL: https://github.com/apache/beam/pull/12263#discussion_r468930529



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java
##
@@ -177,19 +177,24 @@ public DLPDeidentifyText build() {
   @Override
   public PCollection<KV<String, DeidentifyContentResponse>> expand(
       PCollection<KV<String, String>> input) {
-return input
-.apply(ParDo.of(new MapStringToDlpRow(getColumnDelimiter(
-.apply("Batch Contents", ParDo.of(new 
BatchRequestForDLP(getBatchSizeBytes(
-.apply(
-"DLPDeidentify",
+
+    ParDo.SingleOutput<KV<String, Iterable<Table.Row>>, KV<String, DeidentifyContentResponse>>
+        deidentifyParDo =
 ParDo.of(
 new DeidentifyText(
 getProjectId(),
 getInspectTemplateName(),
 getDeidentifyTemplateName(),
 getInspectConfig(),
 getDeidentifyConfig(),
-getHeaderColumns(;
+getHeaderColumns()));
+if (getHeaderColumns() != null) {

Review comment:
   It was just a suggestion. I'm OK with merging as it is if that's OK with 
the author of the PR.
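
The shape being reviewed — build the ParDo once, then attach the side input only when header columns are present — boils down to a conditional over an optional dependency. A stdlib Python sketch of that control flow (hypothetical names; the real code wires a Beam side input, not a closure):

```python
def build_transform(header_columns=None):
    """Return a per-row function, using the optional header columns only
    when they were provided (mirroring the `if (getHeaderColumns() != null)`
    branch that conditionally adds the side input)."""
    def process(row):
        if header_columns is not None:
            return dict(zip(header_columns, row))
        return list(row)
    return process

with_headers = build_transform(["name", "age"])
no_headers = build_transform()
print(with_headers(("ada", 36)))  # → {'name': 'ada', 'age': 36}
print(no_headers(("ada", 36)))    # → ['ada', 36]
```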








