[GitHub] [beam] lukecwik commented on a change in pull request #12511: Upgrade to Netty 4.1.50.Final and Netty_tcnative 2.0.31.Final
lukecwik commented on a change in pull request #12511: URL: https://github.com/apache/beam/pull/12511#discussion_r468747932 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy ## @@ -27,7 +27,7 @@ class GrpcVendoring_1_26_0 { static def protobuf_version = "3.11.0" static def grpc_version = "1.26.0" static def gson_version = "2.8.6" - static def netty_version = "4.1.42.Final" + static def netty_version = "4.1.50.Final" Review comment: We should update the version of gRPC we are vendoring instead of changing the versions we are using here. ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -581,7 +581,7 @@ class BeamModulePlugin implements Plugin { maven_exec_plugin: "maven-plugins:maven-exec-plugin:1.6.0", maven_jar_plugin : "maven-plugins:maven-jar-plugin:3.0.2", maven_shade_plugin : "maven-plugins:maven-shade-plugin:3.1.0", -maven_surefire_plugin: "maven-plugins:maven-surefire-plugin:2.21.0", +maven_surefire_plugin: "maven-plugins:maven-surefire-plugin:3.0.0-M5", Review comment: why did this need to change? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module
KevinGG commented on a change in pull request #12460: URL: https://github.com/apache/beam/pull/12460#discussion_r468740290

## File path: sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx

## @@ -0,0 +1,119 @@
+// Licensed under the Apache License, Version 2.0 (the 'License'); you may not
+// use this file except in compliance with the License. You may obtain a copy of
+// the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT
+// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+// License for the specific language governing permissions and limitations under
+// the License.
+
+import * as React from 'react';
+
+export interface IHtmlProvider {
+  readonly html: string;
+  readonly script: Array<string>;
+}
+
+interface IHtmlViewProps {
+  htmlProvider: IHtmlProvider;
+}
+
+interface IHtmlViewState {
+  innerHtml: string;
+  script: Array<string>;
+}
+
+/**
+ * A common HTML viewing component that renders given HTML and executes scripts
+ * from the given provider.
+ */
+export class HtmlView extends React.Component<IHtmlViewProps, IHtmlViewState> {
+  constructor(props: IHtmlViewProps) {
+    super(props);
+    this.state = {
+      innerHtml: props.htmlProvider.html,
+      script: []
+    };
+  }
+
+  componentDidMount(): void {
+    this._updateRenderTimerId = setInterval(() => this.updateRender(), 1000);
+  }
+
+  componentWillUnmount(): void {
+    clearInterval(this._updateRenderTimerId);
+  }
+
+  updateRender(): void {
+    const currentHtml = this.state.innerHtml;
+    const htmlToUpdate = this.props.htmlProvider.html;
+    const currentScript = this.state.script;
+    const scriptToUpdate = [...this.props.htmlProvider.script];
+    if (htmlToUpdate !== currentHtml) {
+      this.setState({
+        innerHtml: htmlToUpdate,
+        // As long as the html is updated, clear the script state.
+        script: []
+      });
+    }
+    /* Depending on whether this iteration updates the html, the scripts
+     * are executed differently.
+     * Html updated: all scripts are new, start execution from index 0;
+     * Html not updated: only newly added scripts need to be executed.
+     */
+    const currentScriptLength =
+      htmlToUpdate === currentHtml ? currentScript.length : 0;
+    if (scriptToUpdate.length > currentScriptLength) {
+      this.setState(
+        {
+          script: scriptToUpdate
+        },
+        // Executes scripts once the state is updated.
+        () => {
+          for (let i = currentScriptLength; i < scriptToUpdate.length; ++i) {
+            new Function(scriptToUpdate[i])();
+          }
+        }
+      );
+    }
+  }
+
+  render(): React.ReactNode {
+    return (
+      // This injects raw HTML fetched from kernel into JSX.
+      <div dangerouslySetInnerHTML={{ __html: this.state.innerHtml }} />
+    );
+  }
+
+  private _updateRenderTimerId: number;
+}
+
+/**
+ * Makes the browser support HTML import and import HTML from given hrefs if
+ * any is given.
+ *
+ * Native HTML import has been deprecated by modern browsers. To support
+ * importing reusable HTML templates, webcomponentsjs library is needed.
+ * The given hrefs will be imported once the library is loaded.
+ *
+ * Note everything is appended to head and if there are duplicated HTML
+ * imports, only the first one will take effect.
+ */
+export function importHtml(hrefs: Array<string>): void {

Review comment: This is needed when initializing the lumino `Widget` during activation of the extension. The `importHtml` only needs to be invoked once. The HTML we are targeting is the one used by [PAIR-facets](https://pair-code.github.io/facets/). Once invoked, later in the view, the facets visualizations sent back from the kernel can be rendered correctly.
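The polling logic in `updateRender` above boils down to a small diffing rule: if the HTML changed, every script is new; if not, only scripts appended since the last poll need to run. A plain-Python sketch of that rule, illustrative only and not part of the extension:

```python
def scripts_to_execute(old_html, new_html, old_scripts, new_scripts):
    # HTML changed: restart script execution from index 0.
    # HTML unchanged: execute only the newly appended scripts.
    start = 0 if new_html != old_html else len(old_scripts)
    return new_scripts[start:]
```

For example, an unchanged page with one new script yields only that new script, while any HTML change replays the full list.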
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12526: [BEAM-10663] Disable python kafka integration tests
TheNeuralBit commented on a change in pull request #12526: URL: https://github.com/apache/beam/pull/12526#discussion_r468789977

## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py

## @@ -94,6 +94,7 @@ def run_xlang_kafkaio(self, pipeline):
     pipeline.run(False)

+@unittest.skip('BEAM-10663')

Review comment: If we add BEAM-6868, it should be in a `skipIf` that checks for the Flink runner.
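The `skipIf` pattern suggested in this review could look like the sketch below. This is an illustrative stand-in, not the actual Beam test module; the `running_on_flink` helper and the `TEST_RUNNER` environment variable are hypothetical — a real test would inspect its own pipeline options.

```python
import os
import unittest

def running_on_flink():
    # Hypothetical runner check for illustration; the real test would
    # consult its pipeline/runner configuration, not an env variable.
    return 'flink' in os.environ.get('TEST_RUNNER', '').lower()

@unittest.skipIf(running_on_flink(), 'BEAM-6868: unsupported on the Flink runner')
class CrossLanguageKafkaIOTestSketch(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)
```

Unlike a blanket `@unittest.skip`, this keeps the test running on every runner where the underlying issue does not apply.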
[GitHub] [beam] TheNeuralBit merged pull request #12530: [BEAM-7996] Add Python SqlTransform test that includes a MAP input and output
TheNeuralBit merged pull request #12530: URL: https://github.com/apache/beam/pull/12530
[GitHub] [beam] kkucharc commented on a change in pull request #12435: [BEAM-10616] Added Python Pardo load tests for streaming on Dataflow
kkucharc commented on a change in pull request #12435: URL: https://github.com/apache/beam/pull/12435#discussion_r468807769

## File path: .test-infra/jenkins/job_LoadTests_ParDo_Python.groovy

## @@ -151,3 +151,35 @@ CronJobBuilder.cronJob('beam_LoadTests_Python_ParDo_Dataflow_Batch', 'H 13 * * *
   ]
   batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT)
 }
+
+def streamingLoadTestJob = { scope, triggeringContext ->
+  scope.description('Runs Python ParDo load tests on Dataflow runner in streaming mode')
+  commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 120)
+
+  def datasetName = loadTestsBuilder.getBigQueryDataset('load_test', triggeringContext)
+  for (testConfiguration in loadTestConfigurations("streaming", datasetName)) {
+    // Skipping case 2 in streaming because it times out. To be checked TODO: kkucharc
+    if (testConfiguration.title != "ParDo Python Load test: 2GB 100 byte records 200 times") {

Review comment: I am totally for refactoring those LoadTestJob methods into one — thank you for the suggestion. As for excluding the test, I usually prefer pointing at the test to be excluded by its title rather than by, e.g., an array index, since the index can change when new test cases are added.
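The exclusion-by-title approach defended in the review above can be sketched in plain Python (the Groovy Jenkins job is the real implementation; the dict shape and helper name here are illustrative):

```python
# Excluding a test case by its stable title, not by its list index,
# means adding new cases cannot silently shift which test is skipped.
EXCLUDED_TITLES = {
    'ParDo Python Load test: 2GB 100 byte records 200 times',  # times out in streaming
}

def runnable_configurations(configurations):
    return [c for c in configurations if c['title'] not in EXCLUDED_TITLES]
```

The trade-off is a hard-coded string that must track any title rename, but a rename fails loudly (the excluded case starts running again and timing out), whereas a stale index fails silently.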
[GitHub] [beam] emilymye removed a comment on pull request #12505: [WIP][BEAM-8106] Add version to java container image name
emilymye removed a comment on pull request #12505: URL: https://github.com/apache/beam/pull/12505#issuecomment-671632378 Run Python2_PVR_Flink PreCommit
[GitHub] [beam] emilymye commented on pull request #12505: [WIP][BEAM-8106] Add version to java container image name
emilymye commented on pull request #12505: URL: https://github.com/apache/beam/pull/12505#issuecomment-672207756 retest this please
[GitHub] [beam] TheNeuralBit commented on pull request #12526: [BEAM-10663] Disable python kafka integration tests
TheNeuralBit commented on pull request #12526: URL: https://github.com/apache/beam/pull/12526#issuecomment-672235923 Looks like the pending jobs are actually complete, they just haven't updated the PR. Merging now.
[GitHub] [beam] robertwb commented on a change in pull request #12516: [BEAM-9547] Implement dataframes top, join, merge.
robertwb commented on a change in pull request #12516: URL: https://github.com/apache/beam/pull/12516#discussion_r468862256

## File path: sdks/python/apache_beam/dataframe/frames.py

## @@ -54,6 +54,42 @@ def agg(self, *args, **kwargs):
       'order-sensitive')
   diff = frame_base.wont_implement_method('order-sensitive')

+  @frame_base.args_to_kwargs(pd.Series)
+  def nlargest(self, **kwargs):
+    if 'keep' in kwargs and kwargs['keep'] != 'all':
+      raise frame_base.WontImplementError('order-sensitive')
+    per_partition = expressions.ComputedExpression(
+        'nlargest',

Review comment: Good point, updated.
[GitHub] [beam] TobKed commented on a change in pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms
TobKed commented on a change in pull request #12452: URL: https://github.com/apache/beam/pull/12452#discussion_r468877557

## File path: CI.md

## @@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with regards of the jobs
 Those jobs often have matrix run strategy which runs several different variations of the jobs
 (with different platform type / Python version to run for example)

+### Google Cloud Platform Credentials
+
+Some of the jobs require variables stored as a [GitHub Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)

Review comment: Fixed.
[GitHub] [beam] tvalentyn edited a comment on pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms
tvalentyn edited a comment on pull request #12452: URL: https://github.com/apache/beam/pull/12452#issuecomment-672296133 Thanks, @TobKed , I don't have additional comments. Once all reviewers, whose feedback you expect, give their LGTM, please squash fixup commits, and leave only the commits that you want to be added to the commit history.
[GitHub] [beam] tvalentyn commented on pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms
tvalentyn commented on pull request #12452: URL: https://github.com/apache/beam/pull/12452#issuecomment-672296133 Thanks, @TobKed , I don't have additional comments. Once all reviewers, whose feedback you expect, gave their LGTM, please squash fixup commits, and leave only the commits that you want to be added to the commit history.
[GitHub] [beam] sclukas77 commented on a change in pull request #12498: [BEAM-10654] Implemented ExternalSchemaIOTransformRegistrar for jdbc
sclukas77 commented on a change in pull request #12498: URL: https://github.com/apache/beam/pull/12498#discussion_r468720134

## File path: sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java

## @@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.schemaio.expansion;
+
+import com.google.auto.service.AutoService;
+import java.io.ByteArrayInputStream;
+import java.io.InputStream;
+import java.util.Map;
+import java.util.ServiceLoader;
+import javax.annotation.Nullable;
+import org.apache.beam.model.pipeline.v1.SchemaApi;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.RowCoder;
+import org.apache.beam.sdk.expansion.ExternalTransformRegistrar;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.SchemaTranslation;
+import org.apache.beam.sdk.schemas.io.SchemaIOProvider;
+import org.apache.beam.sdk.transforms.ExternalTransformBuilder;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
+
+@Experimental(Experimental.Kind.PORTABILITY)
+@AutoService(ExternalTransformRegistrar.class)
+public class ExternalSchemaIOTransformRegistrar implements ExternalTransformRegistrar {
+
+  @Override
+  public Map<String, ExternalTransformBuilder<?, ?, ?>> knownBuilderInstances() {
+    ImmutableMap.Builder<String, ExternalTransformBuilder<?, ?, ?>> builder = ImmutableMap.builder();
+    try {
+      for (SchemaIOProvider schemaIOProvider : ServiceLoader.load(SchemaIOProvider.class)) {
+        builder.put(
+            "beam:external:java:" + schemaIOProvider.identifier() + ":read:v1",
+            new ReaderBuilder(schemaIOProvider));
+        builder.put(
+            "beam:external:java:" + schemaIOProvider.identifier() + ":write:v1",
+            new WriterBuilder(schemaIOProvider));
+      }
+    } catch (Exception e) {
+      throw new RuntimeException(e.getMessage());
+    }
+    return builder.build();
+  }
+
+  public static class Configuration {
+    String location = "";
+    byte[] config = new byte[0];
+    @Nullable byte[] dataSchema = null;
+
+    public void setLocation(String location) {
+      this.location = location;
+    }
+
+    public void setConfig(byte[] config) {
+      this.config = config;
+    }
+
+    public void setDataSchema(byte[] dataSchema) {
+      this.dataSchema = dataSchema;
+    }
+  }
+
+  @Nullable
+  private static Schema translateSchema(@Nullable byte[] schemaBytes) throws Exception {
+    if (schemaBytes == null) {
+      return null;
+    }
+    SchemaApi.Schema protoSchema = SchemaApi.Schema.parseFrom(schemaBytes);
+    return SchemaTranslation.schemaFromProto(protoSchema);
+  }
+
+  private static Row translateRow(byte[] rowBytes, Schema configSchema) throws Exception {
+    RowCoder rowCoder = RowCoder.of(configSchema);
+    InputStream stream = new ByteArrayInputStream(rowBytes);
+    return rowCoder.decode(stream);
+  }
+
+  private static class ReaderBuilder
+      implements ExternalTransformBuilder<Configuration, PBegin, PCollection<Row>> {
+    SchemaIOProvider schemaIOProvider;
+
+    ReaderBuilder(SchemaIOProvider schemaIOProvider) {
+      this.schemaIOProvider = schemaIOProvider;
+    }
+
+    @Override
+    public PTransform<PBegin, PCollection<Row>> buildExternal(Configuration configuration) {
+      try {
+        return schemaIOProvider
+            .from(
+                configuration.location,
+                translateRow(configuration.config, schemaIOProvider.configurationSchema()),
+                translateSchema(configuration.dataSchema))
+            .buildReader();
+      } catch (Exception e) {
+        throw new RuntimeException("Could not convert configuration proto to row or schema.");
+      }
+    }
+  }
+
+  private static class WriterBuilder

Review comment: ACK, I will add these in.
[GitHub] [beam] mxm commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload
mxm commented on pull request #12481: URL: https://github.com/apache/beam/pull/12481#issuecomment-672083732 Run XVR_Direct PostCommit
[GitHub] [beam] danielxjd commented on pull request #12223: [Beam-4379] Make ParquetIO read splittable
danielxjd commented on pull request #12223: URL: https://github.com/apache/beam/pull/12223#issuecomment-672092490 Run Java PreCommit
[GitHub] [beam] TheNeuralBit commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload
TheNeuralBit commented on pull request #12481: URL: https://github.com/apache/beam/pull/12481#issuecomment-672106456
[GitHub] [beam] aaltay merged pull request #12528: Extending archiveJunit Jenkins post-commit task with stability history
aaltay merged pull request #12528: URL: https://github.com/apache/beam/pull/12528
[GitHub] [beam] boyuanzz edited a comment on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName
boyuanzz edited a comment on pull request #12518: URL: https://github.com/apache/beam/pull/12518#issuecomment-672158339

> Will it be possible to re-enable the Kafka test on direct runner?

I don't think we have a Kafka test on the direct runner. Here is the only place I can find that runs this test: https://github.com/apache/beam/blob/12dc66995056722b8ea7b165ef872dcd81b1f54a/sdks/python/test-suites/portable/common.gradle#L197-L216
[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module
KevinGG commented on a change in pull request #12460: URL: https://github.com/apache/beam/pull/12460#discussion_r468778647

## File path: sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx

## @@ -0,0 +1,119 @@ (quotes the license header and the start of the `IHtmlProvider` interface; the review targets this line)
+  readonly script: Array<string>;

Review comment: Acked. Changed all occurrences to `string[]`.
[GitHub] [beam] kennknowles commented on a change in pull request #12521: [BEAM-8125] Add verifyDeterministic test to SchemaCoderTest
kennknowles commented on a change in pull request #12521: URL: https://github.com/apache/beam/pull/12521#discussion_r468783212

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaCoderTest.java

## @@ -290,5 +329,31 @@ public void coderConsistentWithEquals() throws Exception {
       }
     }
   }
+
+    @Test
+    public void verifyDeterministic() throws Exception {
+      if (expectDeterministic) {
+        assertDeterministic(coder);
+      } else {
+        assertNonDeterministic(coder);
+      }
+    }
+  }
+
+  private static void assertDeterministic(SchemaCoder coder) {

Review comment: There's also https://github.com/apache/beam/blob/0098eb6f9900849b05bfc2ec2497ed8c7d37148a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/CoderProperties.java#L67 (coder properties are "properties" in the sense of property-driven testing a la QuickCheck)
[GitHub] [beam] TheNeuralBit commented on pull request #12522: Fix format string in PipelineValidator
TheNeuralBit commented on pull request #12522: URL: https://github.com/apache/beam/pull/12522#issuecomment-672195554 Run Java PreCommit
[GitHub] [beam] robertwb commented on pull request #12516: [BEAM-9547] Implement dataframes top, join, merge.
robertwb commented on pull request #12516: URL: https://github.com/apache/beam/pull/12516#issuecomment-672277371

> This is going to take me a while to review since I have very little Python experience and even less Pandas experience. If you want a more thorough, faster review, I suggest adding someone else. Otherwise I'll make my way through the PR this week.

This week would be fine, if you're up to it. I'm trying to spread the load (and knowledge) a bit around the team. Feel free to ask clarifying questions.
[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module
KevinGG commented on a change in pull request #12460: URL: https://github.com/apache/beam/pull/12460#discussion_r468764276

## File path: sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx

## @@ -0,0 +1,119 @@ (quotes the same HtmlView.tsx hunk as the earlier comment on this file; the review targets the `render()` method, which injects the raw HTML fetched from the kernel into JSX)

Review comment: Yeah, it sounds scary :) [dangerouslySetInnerHTML](https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml) is React's replacement for `innerHTML` in the browser DOM, and the warning about using `innerHTML` is [here](https://screenshot.googleplex.com/nJEFJyqWJiG). `iframe` has similar [concerns](https://stackoverflow.com/questions/7289139/why-are-iframes-considered-dangerous-and-a-security-risk#:~:text=If%20you%20control%20the%20content,%2C%20they're%20perfectly%20safe.=The%20IFRAME%20element%20may%20be,an%20IFRAME%20on%20hostile%20site.=In%20addition%2C%20IFRAME%20element%20may,vulnerability%20which%20can%20be%20exploited). All these concerns are about serving HTML from another domain that might contain malicious code. Here I think it's a valid use case because the HTML and scripts are returned from the kernel, generated by the SDK as the user executes code in their notebooks: they are known and trusted. The purpose of this module is to render the generated visualization. Whether using `dangerouslySetInnerHTML` or an `iframe`, to serve that purpose the HTML has to be rendered and the scripts have to be executed; in the end, either way would do the same thing.

The security concern is similar to whether a notebook should execute scripts in an opened ipynb file. By default, Jupyter only renders HTML in the ipynb file (stored as JSON text) and does not execute any script unless the notebook is trusted. A notebook is only trusted after the user executes `jupyter trust notebook.ipynb` or after the user executes the notebook in a kernel.
[GitHub] [beam] damgad commented on pull request #12529: Moving /tmp directory cleanup of CI workers to Inventory Jenkins job
damgad commented on pull request #12529: URL: https://github.com/apache/beam/pull/12529#issuecomment-672145185 R: @aaltay
[GitHub] [beam] TheNeuralBit commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName
TheNeuralBit commented on pull request #12518: URL: https://github.com/apache/beam/pull/12518#issuecomment-672156135 Will it be possible to re-enable the Kafka test on the direct runner?
[GitHub] [beam] boyuanzz commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName
boyuanzz commented on pull request #12518: URL: https://github.com/apache/beam/pull/12518#issuecomment-672201149

> Is bundle finalization not an issue for DirectRunner? If so we should be able to run the Kafka test on DirectRunner.

With https://github.com/apache/beam/pull/12488, it should work. But I'm not sure whether DirectRunner expands the SDF as expected. cc: @lukecwik
[GitHub] [beam] ihji commented on pull request #12533: [BEAM-10679] improving XLang KafkaIO streaming test
ihji commented on pull request #12533: URL: https://github.com/apache/beam/pull/12533#issuecomment-672209854 R: @chamikaramj
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12521: [BEAM-8125] Add verifyDeterministic test to SchemaCoderTest
TheNeuralBit commented on a change in pull request #12521: URL: https://github.com/apache/beam/pull/12521#discussion_r468822570

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/SchemaCoderTest.java

## @@ -290,5 +329,31 @@ (quotes the same `verifyDeterministic`/`assertDeterministic` hunk as the earlier comment on this file)

Review comment: Thanks! Added a commit to use verifyDeterministic.
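The property being verified in this thread is that a deterministic coder maps equal values to byte-identical encodings. A language-agnostic illustration of that check, using plain Python with `pickle` as a stand-in encoder — this is a sketch of the property, not Beam's `CoderProperties` or `SchemaCoder` API:

```python
import pickle

def assert_deterministic(encode, values):
    # A deterministic coder must produce the same bytes every time it
    # encodes the same value.
    for v in values:
        assert encode(v) == encode(v), f'non-deterministic encoding for {v!r}'

# pickle behaves deterministically for these simple values within one run.
assert_deterministic(pickle.dumps, [1, 'a', (1, 2), {'k': [1, 2]}])
```

A property-based test in the QuickCheck sense would feed randomly generated values through the same assertion rather than a fixed list.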
[GitHub] [beam] chamikaramj commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance
chamikaramj commented on a change in pull request #12485: URL: https://github.com/apache/beam/pull/12485#discussion_r468830116 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -304,6 +308,8 @@ def compute_table_name(row): NOTE: This job name template does not have backwards compatibility guarantees. """ BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}" +"""The number of shards per destination when writing via streaming inserts.""" +DEFAULT_SHARDS_PER_DESTINATION = 500 Review comment: https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L58
[GitHub] [beam] chamikaramj commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance
chamikaramj commented on a change in pull request #12485: URL: https://github.com/apache/beam/pull/12485#discussion_r468829856 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -304,6 +308,8 @@ def compute_table_name(row): NOTE: This job name template does not have backwards compatibility guarantees. """ BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}" +"""The number of shards per destination when writing via streaming inserts.""" +DEFAULT_SHARDS_PER_DESTINATION = 500 Review comment: Note that Java also allows users to control this through a pipeline option, and Dataflow support asks users to update this parameter as they see fit.
[GitHub] [beam] chamikaramj commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance
chamikaramj commented on a change in pull request #12485: URL: https://github.com/apache/beam/pull/12485#discussion_r468830450 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -304,6 +308,8 @@ def compute_table_name(row): NOTE: This job name template does not have backwards compatibility guarantees. """ BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}" +"""The number of shards per destination when writing via streaming inserts.""" +DEFAULT_SHARDS_PER_DESTINATION = 500 Review comment: These flags for streaming inserts can be added in a separate PR though.
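[Editor's note] For readers following the shard-count discussion above: the constant controls how many parallel keys the rows for a single destination table are spread across before streaming inserts. A minimal Python sketch of that keying idea (`assign_shard` and its tuple layout are hypothetical illustrations, not the Beam SDK's actual code):

```python
import random

DEFAULT_SHARDS_PER_DESTINATION = 500

def assign_shard(destination, row, num_shards=DEFAULT_SHARDS_PER_DESTINATION):
    # Fan rows for one table out across num_shards keys so that a hot
    # destination does not funnel through a single worker.
    shard = random.randint(0, num_shards - 1)
    return ((destination, shard), row)

key, row = assign_shard('project:dataset.table', {'col': 1}, num_shards=4)
```

A larger shard count increases write parallelism per table at the cost of more small batches, which is why both SDKs expose it as a tunable.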
[GitHub] [beam] pabloem commented on pull request #12489: [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions
pabloem commented on pull request #12489: URL: https://github.com/apache/beam/pull/12489#issuecomment-672280905 Run Python 3.8 PostCommit
[GitHub] [beam] robinyqiu opened a new pull request #12536: [BEAM-10611] Simplification: Use new ZetaSQL API to get/create Values
robinyqiu opened a new pull request #12536: URL: https://github.com/apache/beam/pull/12536 Simplify getting/creating ZetaSQL values using the new API. r: @apilloud @ZijieSong946
[GitHub] [beam] KevinGG commented on a change in pull request #12444: Added a whitespace lint as part of python lint precommit
KevinGG commented on a change in pull request #12444: URL: https://github.com/apache/beam/pull/12444#discussion_r468881675 ## File path: sdks/python/scripts/run_whitespacelint.sh ## @@ -0,0 +1,32 @@ +#!/bin/bash +# +#Licensed to the Apache Software Foundation (ASF) under one or more +#contributor license agreements. See the NOTICE file distributed with +#this work for additional information regarding copyright ownership. +#The ASF licenses this file to You under the Apache License, Version 2.0 +#(the "License"); you may not use this file except in compliance with +#the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +#Unless required by applicable law or agreed to in writing, software +#distributed under the License is distributed on an "AS IS" BASIS, +#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +#See the License for the specific language governing permissions and +#limitations under the License. +# + +set -o errexit + +pushd ../../ + +# Check for trailing spaces in non-code text files. +# Currently include markdown files and gradle build files. +echo "Running whitespacelint..." +FILES=$(find . \( -name '*.md' -o -name 'build.gradle' \)) +for FILE in $FILES +do + whitespacelint $FILE Review comment: Thanks, changing it.
[GitHub] [beam] mxm commented on a change in pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload
mxm commented on a change in pull request #12481: URL: https://github.com/apache/beam/pull/12481#discussion_r467556781 ## File path: sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java ## @@ -1452,12 +1446,12 @@ public void populateDisplayData(DisplayData.Builder builder) { public static class Configuration { // All byte arrays are UTF-8 encoded strings -private Iterable> producerConfig; +private Map producerConfig; private String topic; private String keySerializer; private String valueSerializer; Review comment: Comment should be updated (no byte arrays anymore).
[GitHub] [beam] mxm commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload
mxm commented on pull request #12481: URL: https://github.com/apache/beam/pull/12481#issuecomment-672083483
[GitHub] [beam] boyuanzz commented on a change in pull request #12526: [BEAM-10663] Disable python kafka integration tests
boyuanzz commented on a change in pull request #12526: URL: https://github.com/apache/beam/pull/12526#discussion_r468724341 ## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py ## @@ -94,6 +94,7 @@ def run_xlang_kafkaio(self, pipeline): pipeline.run(False) +@unittest.skip('BEAM-10663') Review comment:
```suggestion
@unittest.skip('BEAM-10663, BEAM-6868')
```
[GitHub] [beam] tysonjh commented on a change in pull request #12435: [BEAM-10616] Added Python Pardo load tests for streaming on Dataflow
tysonjh commented on a change in pull request #12435: URL: https://github.com/apache/beam/pull/12435#discussion_r468749356 ## File path: sdks/python/apache_beam/testing/load_tests/pardo_test.py ## @@ -125,7 +125,9 @@ def process(self, element, state=state_param): state.add(1) yield element -if self.get_option_or_default('streaming', False): +if self.get_option_or_default( +'streaming', +False) and self.pipeline.get_option('runner') == "PortableRunner": Review comment: Right, I remember you mentioned there was some issue with the SyntheticSource that Max ran into. Ideally SyntheticSource would work for all standard uses but I'm not familiar with the details. How big of a change would it be to fix SyntheticSource? I'm OK with moving forward with this change, creating a Jira issue for tracking the SyntheticSource improvement, and noting it here as a TODO.
[GitHub] [beam] lukecwik commented on pull request #12519: [BEAM-10670] Make Read execute as a splittable DoFn by default for the Java DirectRunner.
lukecwik commented on pull request #12519: URL: https://github.com/apache/beam/pull/12519#issuecomment-672146059 Run JavaPortabilityApiJava11 PreCommit
[GitHub] [beam] ihji opened a new pull request #12533: [BEAM-10679] improving XLang KafkaIO streaming test
ihji opened a new pull request #12533: URL: https://github.com/apache/beam/pull/12533 Using state instead of early triggering and combine per key. This would reduce the flakiness in the streaming test.
[GitHub] [beam] ihji commented on pull request #12533: [BEAM-10679] improving XLang KafkaIO streaming test
ihji commented on pull request #12533: URL: https://github.com/apache/beam/pull/12533#issuecomment-672209441 Run Python 3.5 PostCommit
[GitHub] [beam] TheNeuralBit merged pull request #12526: [BEAM-10663] Disable python kafka integration tests
TheNeuralBit merged pull request #12526: URL: https://github.com/apache/beam/pull/12526
[GitHub] [beam] Ardagan commented on pull request #12139: [DO NOT REVIEW] Scd py bq test
Ardagan commented on pull request #12139: URL: https://github.com/apache/beam/pull/12139#issuecomment-672240516 run python postcommit
[GitHub] [beam] pabloem commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance
pabloem commented on a change in pull request #12485: URL: https://github.com/apache/beam/pull/12485#discussion_r468842890 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -304,6 +308,8 @@ def compute_table_name(row): NOTE: This job name template does not have backwards compatibility guarantees. """ BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}" +"""The number of shards per destination when writing via streaming inserts.""" +DEFAULT_SHARDS_PER_DESTINATION = 500 Review comment: Filed https://issues.apache.org/jira/browse/BEAM-10680 - thanks Cham!
[GitHub] [beam] pabloem commented on pull request #12492: [BEAM-6807] Implement an Azure blobstore filesystem for Python SDK
pabloem commented on pull request #12492: URL: https://github.com/apache/beam/pull/12492#issuecomment-672273518 Run Python2_PVR_Flink PreCommit
[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module
KevinGG commented on a change in pull request #12460: URL: https://github.com/apache/beam/pull/12460#discussion_r468764276 ## File path: sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx ## @@ -0,0 +1,119 @@ +// Licensed under the Apache License, Version 2.0 (the 'License'); you may not +// use this file except in compliance with the License. You may obtain a copy of +// the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT +// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +// License for the specific language governing permissions and limitations under +// the License. + +import * as React from 'react'; + +export interface IHtmlProvider { + readonly html: string; + readonly script: Array; +} + +interface IHtmlViewProps { + htmlProvider: IHtmlProvider; +} + +interface IHtmlViewState { + innerHtml: string; + script: Array; +} + +/** + * A common HTML viewing component that renders given HTML and executes scripts + * from the given provider. + */ +export class HtmlView extends React.Component { + constructor(props: IHtmlViewProps) { +super(props); +this.state = { + innerHtml: props.htmlProvider.html, + script: [] +}; + } + + componentDidMount(): void { +this._updateRenderTimerId = setInterval(() => this.updateRender(), 1000); + } + + componentWillUnmount(): void { +clearInterval(this._updateRenderTimerId); + } + + updateRender(): void { +const currentHtml = this.state.innerHtml; +const htmlToUpdate = this.props.htmlProvider.html; +const currentScript = this.state.script; +const scriptToUpdate = [...this.props.htmlProvider.script]; +if (htmlToUpdate !== currentHtml) { + this.setState({ +innerHtml: htmlToUpdate, +// As long as the html is updated, clear the script state. 
+script: [] + }); +} +/* Depending on whether this iteration updates the html, the scripts + * are executed differently. + * Html updated: all scripts are new, start execution from index 0; + * Html not updated: only newly added scripts need to be executed. + */ +const currentScriptLength = + htmlToUpdate === currentHtml ? currentScript.length : 0; +if (scriptToUpdate.length > currentScriptLength) { + this.setState( +{ + script: scriptToUpdate +}, +// Executes scripts once the state is updated. +() => { + for (let i = currentScriptLength; i < scriptToUpdate.length; ++i) { +new Function(scriptToUpdate[i])(); + } +} + ); +} + } + + render(): React.ReactNode { +return ( + // This injects raw HTML fetched from kernel into JSX. + Review comment: Yeah, it sounds scary :) [dangerouslySetInnerHTML](https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml) is React's replacement for `innerHTML` in the browser DOM. And the warning about using `innerHTML` is [here](https://screenshot.googleplex.com/nJEFJyqWJiG). `iframe` has similar [concerns](https://stackoverflow.com/questions/7289139/why-are-iframes-considered-dangerous-and-a-security-risk#:~:text=If%20you%20control%20the%20content,%2C%20they're%20perfectly%20safe.=The%20IFRAME%20element%20may%20be,an%20IFRAME%20on%20hostile%20site.=In%20addition%2C%20IFRAME%20element%20may,vulnerability%20which%20can%20be%20exploited). All these concerns are about serving HTML from another domain that might contain malicious code. Here I think it's a valid use case because the HTML and scripts are returned from the kernel generated by the SDK as the user executes codes in their notebooks. They are known and trusted. The purpose for this model is to render the generated visualization. Whether using `dangerouslySetInnerHTML` or `iframe`, to serve the purpose, the HTML has to be rendered and the scripts have to be executed. In the end, either way would do the same thing. 
The security concern is similar to whether a notebook should execute scripts in an opened ipynb file. By default, Jupyter only renders HTML in the ipynb file (stored as JSON text) and does not execute any script unless the notebook is trusted. A notebook is only trusted after the user executes `jupyter trust notebook.ipynb` or after the user executes the notebook in a kernel.
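[Editor's note] The script-execution rule discussed in this thread (re-run every script when the HTML changed, otherwise run only the newly appended ones) can be paraphrased in a few lines of Python; `scripts_to_run` is an illustrative helper, not part of the extension's code:

```python
def scripts_to_run(current_scripts, incoming_scripts, html_changed):
    # When the HTML was replaced, every incoming script is considered new;
    # otherwise only the tail beyond what has already executed needs running.
    start = 0 if html_changed else len(current_scripts)
    return incoming_scripts[start:]

# Same HTML: only the newly appended script 'b' runs.
incremental = scripts_to_run(['a'], ['a', 'b'], html_changed=False)
# HTML replaced: everything runs from index 0.
full = scripts_to_run(['a'], ['a', 'b'], html_changed=True)
```

This mirrors the `currentScriptLength` computation in `updateRender`, where the start index collapses to 0 whenever `htmlToUpdate !== currentHtml`.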
[GitHub] [beam] aaltay commented on pull request #12528: Extending archiveJunit Jenkins post-commit task with stability history
aaltay commented on pull request #12528: URL: https://github.com/apache/beam/pull/12528#issuecomment-672142551 /cc @tysonjh
[GitHub] [beam] TheNeuralBit commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName
TheNeuralBit commented on pull request #12518: URL: https://github.com/apache/beam/pull/12518#issuecomment-672181034 Ah ok. Maybe we should just change that suite to use the DirectRunner? I'm not sure why it's using Flink. CC: @chamikaramj, @ihji in case they know.
[GitHub] [beam] chamikaramj commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName
chamikaramj commented on pull request #12518: URL: https://github.com/apache/beam/pull/12518#issuecomment-672191125 Is bundle finalization not an issue for DirectRunner? If so we should be able to run the Kafka test on DirectRunner.
[GitHub] [beam] pabloem commented on a change in pull request #12489: [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions
pabloem commented on a change in pull request #12489: URL: https://github.com/apache/beam/pull/12489#discussion_r468825940 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -304,6 +308,8 @@ def compute_table_name(row): NOTE: This job name template does not have backwards compatibility guarantees. """ BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}" +"""The number of shards per destination when writing via streaming inserts.""" +DEFAULT_SHARDS_PER_DESTINATION = 500 Review comment: yeah, this needs to fix conflicts with them.
[GitHub] [beam] pabloem commented on a change in pull request #12489: [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions
pabloem commented on a change in pull request #12489: URL: https://github.com/apache/beam/pull/12489#discussion_r468843122 ## File path: sdks/python/apache_beam/io/gcp/bigquery.py ## @@ -1048,6 +1055,11 @@ def __init__( to be passed when creating a BigQuery table. These are passed when triggering a load job for FILE_LOADS, and when creating a new table for STREAMING_INSERTS. + with_insert_ids: When using the STREAMING_INSERTS method to write data to Review comment: I've renamed the option to match the name in the Java SDK.
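[Editor's note] As background for the renamed option: BigQuery streaming inserts accept a per-row insertId that the service uses for best-effort de-duplication, and skipping the ids trades that protection for higher throughput. A hedged sketch of the idea (`rows_with_insert_ids` is a hypothetical helper, not the SDK's implementation):

```python
import uuid

def rows_with_insert_ids(rows, ignore_insert_ids=False):
    # Attach a random insertId to each row so BigQuery can drop duplicates
    # on retry; pass None instead when the caller opts out for speed.
    return [
        (None if ignore_insert_ids else str(uuid.uuid4()), row)
        for row in rows
    ]

with_ids = rows_with_insert_ids([{'a': 1}, {'a': 2}])
without_ids = rows_with_insert_ids([{'a': 1}], ignore_insert_ids=True)
```

Keeping the option name aligned across the Java and Python SDKs, as done in this PR, avoids users learning two names for the same trade-off.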
[GitHub] [beam] tysonjh commented on a change in pull request #12516: [BEAM-9547] Implement dataframes top, join, merge.
tysonjh commented on a change in pull request #12516: URL: https://github.com/apache/beam/pull/12516#discussion_r468707336 ## File path: sdks/python/apache_beam/dataframe/frames.py ## @@ -54,6 +54,42 @@ def agg(self, *args, **kwargs): 'order-sensitive') diff = frame_base.wont_implement_method('order-sensitive') + @frame_base.args_to_kwargs(pd.Series) + def nlargest(self, **kwargs): +if 'keep' in kwargs and kwargs['keep'] != 'all': + raise frame_base.WontImplementError('order-sensitive') +per_partition = expressions.ComputedExpression( +'nlargest', Review comment: Would using 'nlargest' for the name of both the per-partition and global computed expression cause confusion for debugging or UX?
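[Editor's note] The per-partition/global split being discussed is the standard two-stage top-n pattern: each partition keeps its own top n candidates, then the candidates are combined and reduced once more. A plain-pandas sketch, under the assumption that the logical Series is split into partitions:

```python
import pandas as pd

# Three hypothetical partitions of one logical Series.
partitions = [pd.Series([5, 1, 9]), pd.Series([7, 3]), pd.Series([8, 2])]
n = 2

# Stage 1 ("nlargest-per-partition"): top n within each partition.
per_partition = [s.nlargest(n) for s in partitions]

# Stage 2 ("nlargest-global"): combine candidates, take the global top n.
top = pd.concat(per_partition).nlargest(n)
```

Giving the two ComputedExpressions distinct names (as the reviewer suggests) would make the stages distinguishable when the pipeline graph is inspected or debugged.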
[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module
KevinGG commented on a change in pull request #12460: URL: https://github.com/apache/beam/pull/12460#discussion_r468740290

## File path: sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx

@@ -0,0 +1,119 @@
+// Licensed under the Apache License, Version 2.0 (the 'License'); you may not
+// use this file except in compliance with the License. You may obtain a copy of
+// the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT
+// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+// License for the specific language governing permissions and limitations under
+// the License.
+
+import * as React from 'react';
+
+export interface IHtmlProvider {
+  readonly html: string;
+  readonly script: Array<string>;
+}
+
+interface IHtmlViewProps {
+  htmlProvider: IHtmlProvider;
+}
+
+interface IHtmlViewState {
+  innerHtml: string;
+  script: Array<string>;
+}
+
+/**
+ * A common HTML viewing component that renders given HTML and executes scripts
+ * from the given provider.
+ */
+export class HtmlView extends React.Component<IHtmlViewProps, IHtmlViewState> {
+  constructor(props: IHtmlViewProps) {
+    super(props);
+    this.state = {
+      innerHtml: props.htmlProvider.html,
+      script: []
+    };
+  }
+
+  componentDidMount(): void {
+    this._updateRenderTimerId = setInterval(() => this.updateRender(), 1000);
+  }
+
+  componentWillUnmount(): void {
+    clearInterval(this._updateRenderTimerId);
+  }
+
+  updateRender(): void {
+    const currentHtml = this.state.innerHtml;
+    const htmlToUpdate = this.props.htmlProvider.html;
+    const currentScript = this.state.script;
+    const scriptToUpdate = [...this.props.htmlProvider.script];
+    if (htmlToUpdate !== currentHtml) {
+      this.setState({
+        innerHtml: htmlToUpdate,
+        // As long as the html is updated, clear the script state.
+        script: []
+      });
+    }
+    /* Depending on whether this iteration updates the html, the scripts
+     * are executed differently.
+     * Html updated: all scripts are new, start execution from index 0;
+     * Html not updated: only newly added scripts need to be executed.
+     */
+    const currentScriptLength =
+      htmlToUpdate === currentHtml ? currentScript.length : 0;
+    if (scriptToUpdate.length > currentScriptLength) {
+      this.setState(
+        {
+          script: scriptToUpdate
+        },
+        // Executes scripts once the state is updated.
+        () => {
+          for (let i = currentScriptLength; i < scriptToUpdate.length; ++i) {
+            new Function(scriptToUpdate[i])();
+          }
+        }
+      );
+    }
+  }
+
+  render(): React.ReactNode {
+    return (
+      // This injects raw HTML fetched from kernel into JSX.
+      <div
+        dangerouslySetInnerHTML={{ __html: this.state.innerHtml }}
+      />
+    );
+  }
+
+  private _updateRenderTimerId: number;
+}
+
+/**
+ * Makes the browser support HTML import and import HTML from given hrefs if
+ * any is given.
+ *
+ * Native HTML import has been deprecated by modern browsers. To support
+ * importing reusable HTML templates, webcomponentsjs library is needed.
+ * The given hrefs will be imported once the library is loaded.
+ *
+ * Note everything is appended to head and if there are duplicated HTML
+ * imports, only the first one will take effect.
+ */
+export function importHtml(hrefs: Array<string>): void {

Review comment: This is needed when initializing the `Widget` during activation of the extension. The `importHtml` only needs to be invoked once. The HTML we are targeting is the one used by [PAIR-facets](https://pair-code.github.io/facets/). Once it has been invoked, the facets visualizations sent back from the kernel can later be rendered correctly in the view.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
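The polling logic in `updateRender` above boils down to a single rule for choosing which scripts to run on each tick. A minimal sketch of that rule (written in Python for illustration; the actual implementation is the TypeScript above):

```python
def scripts_to_execute(current_html, new_html, current_scripts, new_scripts):
    """Mirror of HtmlView.updateRender's script-selection rule.

    If the html changed, every script is considered new and all of them run;
    otherwise only scripts appended since the last tick need to run.
    """
    start = len(current_scripts) if new_html == current_html else 0
    return new_scripts[start:]

# Html unchanged: only the newly appended script runs.
print(scripts_to_execute('<b>x</b>', '<b>x</b>', ['s1'], ['s1', 's2']))  # ['s2']
# Html changed: execution restarts from index 0.
print(scripts_to_execute('<b>x</b>', '<b>y</b>', ['s1'], ['t1', 't2']))  # ['t1', 't2']
```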
[GitHub] [beam] KevinGG commented on a change in pull request #12460: [BEAM-10545] HtmlView module
KevinGG commented on a change in pull request #12460: URL: https://github.com/apache/beam/pull/12460#discussion_r468764276

## File path: sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/src/common/HtmlView.tsx

+  render(): React.ReactNode {
+    return (
+      // This injects raw HTML fetched from kernel into JSX.
+      <div
+        dangerouslySetInnerHTML={{ __html: this.state.innerHtml }}
+      />
+    );
+  }

Review comment: Yeah, it sounds scary :) [dangerouslySetInnerHTML](https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml) is React's replacement for `innerHTML` in the browser DOM, and the warning about using `innerHTML` is [here](https://screenshot.googleplex.com/nJEFJyqWJiG). `iframe` has similar [concerns](https://stackoverflow.com/questions/7289139/why-are-iframes-considered-dangerous-and-a-security-risk#:~:text=If%20you%20control%20the%20content,%2C%20they're%20perfectly%20safe.=The%20IFRAME%20element%20may%20be,an%20IFRAME%20on%20hostile%20site.=In%20addition%2C%20IFRAME%20element%20may,vulnerability%20which%20can%20be%20exploited). All these concerns are about serving HTML from another domain that might contain malicious code. Here I think it's a valid use case because the HTML and scripts are generated by the SDK and returned from the kernel as the user executes code in their notebooks, so they are known and trusted. The purpose of this module is to render the generated visualization. Whether using `dangerouslySetInnerHTML` or `iframe`, to serve that purpose the HTML has to be rendered and the scripts have to be executed; in the end, either way would do the same thing.

The security concern is similar to whether a notebook should execute scripts in an opened ipynb file. By default, Jupyter only renders HTML in the ipynb file (stored as JSON text, identical to this module getting those JSON fields through kernel messaging) and does not execute any script unless the notebook is trusted. A notebook is only trusted after the user executes `jupyter trust notebook.ipynb` or after the user executes the notebook in a kernel (identical to the extension's usage).
[GitHub] [beam] kennknowles merged pull request #12366: [BEAM-10572] Eliminate nullability errors from :sdks:java:extensions:sql:datacatalog
kennknowles merged pull request #12366: URL: https://github.com/apache/beam/pull/12366
[GitHub] [beam] pabloem commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance
pabloem commented on a change in pull request #12485: URL: https://github.com/apache/beam/pull/12485#discussion_r468825099

## File path: sdks/python/apache_beam/io/gcp/bigquery.py

@@ -1419,7 +1448,18 @@ def __init__(
     Default is to retry always. This means that whenever there are rows
     that fail to be inserted to BigQuery, they will be retried indefinitely.
     Other retry strategy settings will produce a deadletter PCollection
-    as output.
+    as output. Appropriate values are:
+
+    * `RetryStrategy.RETRY_ALWAYS`: retry all rows if
+      there are any kind of errors. Note that this will hold your pipeline
+      back if there are errors until you cancel or update it.

Review comment: that's correct.
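The documented semantics of the retry strategies can be illustrated with a small self-contained sketch. This is illustrative pseudologic, not Beam's actual implementation; the helper function and its shape are hypothetical, only the three strategy names come from the docstring under review:

```python
# Hypothetical helper mirroring the documented RetryStrategy semantics.
RETRY_ALWAYS = 'RETRY_ALWAYS'
RETRY_NEVER = 'RETRY_NEVER'
RETRY_ON_TRANSIENT_ERROR = 'RETRY_ON_TRANSIENT_ERROR'

def partition_failed_rows(failed_rows, strategy):
    """Split failed inserts into rows to retry and rows for the deadletter output.

    failed_rows is a list of (row, is_transient_error) pairs.
    """
    to_retry, dead_letter = [], []
    for row, transient in failed_rows:
        if strategy == RETRY_ALWAYS:
            # Retried indefinitely; persistent errors hold the pipeline back.
            to_retry.append(row)
        elif strategy == RETRY_ON_TRANSIENT_ERROR and transient:
            to_retry.append(row)
        else:
            # Emitted as the deadletter PCollection.
            dead_letter.append(row)
    return to_retry, dead_letter

rows = [({'id': 1}, True), ({'id': 2}, False)]
print(partition_failed_rows(rows, RETRY_ALWAYS))  # retries both rows
print(partition_failed_rows(rows, RETRY_NEVER))   # dead-letters both rows
```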
[GitHub] [beam] pabloem commented on a change in pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance
pabloem commented on a change in pull request #12485: URL: https://github.com/apache/beam/pull/12485#discussion_r468825677

## File path: sdks/python/apache_beam/io/gcp/bigquery.py

@@ -304,6 +308,8 @@ def compute_table_name(row):
 NOTE: This job name template does not have backwards compatibility guarantees.
 """
 BQ_JOB_NAME_TEMPLATE = "beam_bq_job_{job_type}_{job_id}_{step_id}{random}"
+"""The number of shards per destination when writing via streaming inserts."""
+DEFAULT_SHARDS_PER_DESTINATION = 500

Review comment: we've been able to reach ~1k EPS per worker in Python. If we have 50 shards, we'll only reach ~50k maximum. I'd like to have a larger default, so we don't automatically cap EPS at a very low rate.
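The arithmetic behind the larger default: each shard acts as a unit of parallelism, so the shard count caps total events per second. A back-of-the-envelope check, assuming the ~1k EPS per worker figure from the comment (the helper name is illustrative):

```python
def streaming_insert_eps_cap(num_shards, eps_per_shard=1000):
    # Total throughput is bounded by shards * per-shard EPS, since each
    # shard's rows are inserted by at most one worker at a time.
    return num_shards * eps_per_shard

print(streaming_insert_eps_cap(50))   # 50000: the ~50k cap mentioned above
print(streaming_insert_eps_cap(500))  # 500000: headroom with the new default
```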
[GitHub] [beam] TheNeuralBit commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload
TheNeuralBit commented on pull request #12481: URL: https://github.com/apache/beam/pull/12481#issuecomment-672256723 Run Java PreCommit
[GitHub] [beam] TheNeuralBit commented on pull request #12522: Fix format string in PipelineValidator
TheNeuralBit commented on pull request #12522: URL: https://github.com/apache/beam/pull/12522#issuecomment-672263017 Run Java PreCommit
[GitHub] [beam] pabloem merged pull request #12485: [BEAM-6064] Improvements to BQ streaming insert performance
pabloem merged pull request #12485: URL: https://github.com/apache/beam/pull/12485
[GitHub] [beam] TheNeuralBit merged pull request #12521: [BEAM-8125] Add verifyDeterministic test to SchemaCoderTest
TheNeuralBit merged pull request #12521: URL: https://github.com/apache/beam/pull/12521
[GitHub] [beam] ihji removed a comment on pull request #12533: [BEAM-10679] improving XLang KafkaIO streaming test
ihji removed a comment on pull request #12533: URL: https://github.com/apache/beam/pull/12533#issuecomment-672209441 Run Python 3.5 PostCommit
[GitHub] [beam] TobKed commented on a change in pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms
TobKed commented on a change in pull request #12452: URL: https://github.com/apache/beam/pull/12452#discussion_r468876965 ## File path: CI.md ## @@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with regards of the jobs Those jobs often have matrix run strategy which runs several different variations of the jobs (with different platform type / Python version to run for example) +### Google Cloud Platform Credentials + +Some of the jobs require variables stored as a [GitHub Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets) +to perform operations on Google Cloud Platform. Currently these jobs are limited to Apache repository only. +These variables are: + * `GCP_PROJECT_ID` - ID of the Google Cloud project. Apache/Beam repository has it set to `apache-beam-testing`. + * `GCP_REGION` - Region of the bucket and dataflow jobs. Apache/Beam repository has it set to `us-central1`. + * `GCP_TESTING_BUCKET` - Name of the bucket where temporary files for Dataflow tests will be stored. Apache/Beam repository has it set to `beam-github-actions-tests`. + * `GCP_SA_EMAIL` - Service account email address. This is usually of the format `@.iam.gserviceaccount.com`. + * `GCP_SA_KEY` - Service account key. This key should be created and encoded as a Base64 string (eg. `cat my-key.json | base64` on macOS). + +Service Account shall have following permissions: + * Storage Admin (roles/storage.admin) + * Dataflow Admin (roles/dataflow.admin) + +### Workflows + + Build python source distribution and wheels - [build_wheels.yml](.github/workflows/build_wheels.yml) + | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |-||--|---|---|--| +| Check are GCP variables set | Checks are GCP variables are set. Jobs which required them depends on the output of this job. 
| Yes | Yes | Yes | Yes/No |

Review comment: I used `Check GCP variables` and also changed it in the workflow itself. Looks much cleaner and simpler.

## File path: CI.md

@@ -85,16 +105,15 @@ Those jobs often have matrix run strategy which runs several different variation
 | List files on Google Cloud Storage Bucket | Lists files on GCS for verification purpose. | - | Yes | Yes | Yes |
 | Tag repo nightly| Tag repo with `nightly-master` tag if build python source distribution and python wheels finished successfully. | -| - | Yes | - |
-### Google Cloud Platform Credentials
-
-Some of the jobs require variables stored as a [GitHub Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)
-to perform operations on Google Cloud Platform. Currently these jobs are limited to Apache repository only.
-These variables are:
- * `GCP_SA_EMAIL` - Service account email address. This is usually of the format `@.iam.gserviceaccount.com`.
- * `GCP_SA_KEY` - Service account key. This key should be created and encoded as a Base64 string (eg. `cat my-key.json | base64` on macOS).
+ Python tests - [python_tests.yml](.github/workflows/python_tests.yml)
-Service Account shall have following permissions:
- * Storage Object Admin (roles/storage.objectAdmin)
+| Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run |
[GitHub] [beam] TobKed commented on a change in pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms
TobKed commented on a change in pull request #12452: URL: https://github.com/apache/beam/pull/12452#discussion_r468877500

## File path: CI.md

@@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with regards of the jobs
 Those jobs often have matrix run strategy which runs several different variations of the jobs (with different platform type / Python version to run for example)
+### Google Cloud Platform Credentials
+
+Some of the jobs require variables stored as a [GitHub Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets)
+to perform operations on Google Cloud Platform. Currently these jobs are limited to Apache repository only.
+These variables are:
+ * `GCP_PROJECT_ID` - ID of the Google Cloud project.

Review comment: I agree. I changed it to refer to the values as examples.
[GitHub] [beam] pabloem commented on a change in pull request #12492: [BEAM-6807] Implement an Azure blobstore filesystem for Python SDK
pabloem commented on a change in pull request #12492: URL: https://github.com/apache/beam/pull/12492#discussion_r468877314

## File path: sdks/python/apache_beam/io/azure/blobstoragefilesystem_test.py

@@ -0,0 +1,315 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Unit tests for Azure Blob Storage File System."""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+import unittest
+
+# patches unittest.TestCase to be python3 compatible.
+import future.tests.base  # pylint: disable=unused-import
+import mock
+
+from apache_beam.io.azure import blobstorageio

Review comment: the import error occurs here, so you should move this import to line 40
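The fix being asked for is the usual pattern for optional dependencies in test modules: guard the import so the module still loads when the dependency is absent, then skip the suite. A minimal sketch under that assumption (the class and test names here are placeholders, not the PR's actual tests):

```python
import unittest

# Guarded import: if the Azure dependencies are absent, the module still
# imports cleanly and the tests below are skipped instead of erroring out
# at collection time.
try:
    from apache_beam.io.azure import blobstorageio  # pylint: disable=unused-import
except ImportError:
    blobstorageio = None

@unittest.skipIf(blobstorageio is None, 'Azure dependencies are not installed.')
class BlobStorageFileSystemTest(unittest.TestCase):
    def test_module_loaded(self):
        # Placeholder assertion; real tests would exercise the filesystem API.
        self.assertTrue(hasattr(blobstorageio, '__name__'))
```

Running the suite on a machine without the Azure SDK then reports the test as skipped rather than as an import error.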
[GitHub] [beam] mxm commented on pull request #12385: [BEAM-10527] Migrate Flink and Spark tests to pytest.
mxm commented on pull request #12385: URL: https://github.com/apache/beam/pull/12385#issuecomment-672090815

> > > I have already spent a long time trying to fix quotes, so I can't help but wonder: why do we need flinkCompatibilityMatrixPROCESS in the first place, when it is not being run anywhere? If it's important, shouldn't we add it to some postcommit?
> >
> > Another solution I had in mind was reworking the `--environment_config` option. JSON blobs are unwieldy, and overloading the `--environment_config` option is confusing to the user. We could make each field in the PROCESS `--environment_config` blob its own argument, and then reject these arguments when `environment_type != PROCESS`.
>
> @mxm what do you think about https://issues.apache.org/jira/browse/BEAM-10671?

I think it makes sense, especially for error reporting by having dedicated argument parsers for all environment parameters.
[GitHub] [beam] mxm edited a comment on pull request #12385: [BEAM-10527] Migrate Flink and Spark tests to pytest.
mxm edited a comment on pull request #12385: URL: https://github.com/apache/beam/pull/12385#issuecomment-672090815

> > Another solution I had in mind was reworking the `--environment_config` option. JSON blobs are unwieldy, and overloading the `--environment_config` option is confusing to the user. We could make each field in the PROCESS `--environment_config` blob its own argument, and then reject these arguments when `environment_type != PROCESS`.
>
> @mxm what do you think about https://issues.apache.org/jira/browse/BEAM-10671?

I think it makes sense, especially for error reporting by having dedicated argument parsers for all environment parameters.
[GitHub] [beam] amaliujia commented on pull request #12532: [Beam-9543] support MATCH_RECOGNIZE with NFA
amaliujia commented on pull request #12532: URL: https://github.com/apache/beam/pull/12532#issuecomment-672126518 cc @aaltay
[GitHub] [beam] tvalentyn commented on a change in pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms
tvalentyn commented on a change in pull request #12452: URL: https://github.com/apache/beam/pull/12452#discussion_r468768265 ## File path: CI.md ## @@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with regards of the jobs Those jobs often have matrix run strategy which runs several different variations of the jobs (with different platform type / Python version to run for example) +### Google Cloud Platform Credentials + +Some of the jobs require variables stored as a [GitHub Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets) +to perform operations on Google Cloud Platform. Currently these jobs are limited to Apache repository only. +These variables are: + * `GCP_PROJECT_ID` - ID of the Google Cloud project. Apache/Beam repository has it set to `apache-beam-testing`. Review comment: For my education, which repository we refer to here? Also, I would not hard code existing values in documentation, since the source-of-truth may be updated, and documentation may not. You could say: `For example apache-beam-testing` instead. ## File path: CI.md ## @@ -75,8 +75,28 @@ run categories. Here is a summary of the run categories with regards of the jobs Those jobs often have matrix run strategy which runs several different variations of the jobs (with different platform type / Python version to run for example) +### Google Cloud Platform Credentials + +Some of the jobs require variables stored as a [GitHub Secrets](https://docs.github.com/en/actions/configuring-and-managing-workflows/creating-and-storing-encrypted-secrets) +to perform operations on Google Cloud Platform. Currently these jobs are limited to Apache repository only. +These variables are: + * `GCP_PROJECT_ID` - ID of the Google Cloud project. Apache/Beam repository has it set to `apache-beam-testing`. + * `GCP_REGION` - Region of the bucket and dataflow jobs. Apache/Beam repository has it set to `us-central1`. 
+ * `GCP_TESTING_BUCKET` - Name of the bucket where temporary files for Dataflow tests will be stored. Apache/Beam repository has it set to `beam-github-actions-tests`. + * `GCP_SA_EMAIL` - Service account email address. This is usually of the format `@.iam.gserviceaccount.com`. + * `GCP_SA_KEY` - Service account key. This key should be created and encoded as a Base64 string (eg. `cat my-key.json | base64` on macOS). + +Service Account shall have following permissions: + * Storage Admin (roles/storage.admin) + * Dataflow Admin (roles/dataflow.admin) + +### Workflows + + Build python source distribution and wheels - [build_wheels.yml](.github/workflows/build_wheels.yml) + | Job | Description | Pull Request Run | Direct Push/Merge Run | Scheduled Run | Requires GCP Credentials | |-||--|---|---|--| +| Check are GCP variables set | Checks are GCP variables are set. Jobs which required them depends on the output of this job. | Yes | Yes | Yes | Yes/No | Review comment: Suggestion: Consider using `Checks that GCP variables are set` or `Check GCP variables` throughout. ## File path: CI.md ## @@ -85,16 +105,15 @@ Those jobs often have matrix run strategy which runs several different variation | List files on Google Cloud Storage Bucket | Lists files on GCS for verification purpose. | - | Yes | Yes | Yes | | Tag repo nightly| Tag repo with `nightly-master` tag if build python source distribution and python wheels finished successfully. | -| - | Yes
[GitHub] [beam] boyuanzz commented on pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName
boyuanzz commented on pull request #12518: URL: https://github.com/apache/beam/pull/12518#issuecomment-672158339

> Will it be possible to re-enable the Kafka test on direct runner?

I don't think we have a Kafka test on direct runner.
[GitHub] [beam] danielxjd removed a comment on pull request #12223: [Beam-4379] Make ParquetIO read splittable
danielxjd removed a comment on pull request #12223: URL: https://github.com/apache/beam/pull/12223#issuecomment-672092490 Run Java PreCommit
[GitHub] [beam] boyuanzz commented on a change in pull request #12531: [BEAM-10676] Use the fire timestamp as the output timestamp for timers
boyuanzz commented on a change in pull request #12531: URL: https://github.com/apache/beam/pull/12531#discussion_r468822213

## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py

@@ -647,7 +647,7 @@ def set(self, ts):
     windows=(self._window, ),
     clear_bit=False,
     fire_timestamp=ts,
-    hold_timestamp=self._input_timestamp,
+    hold_timestamp=ts,

Review comment: Can we only use fire_timestamp as hold_timestamp when in event time domain?
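The reviewer's suggestion amounts to making the hold timestamp conditional on the timer's time domain. A hypothetical illustration of that branch (names and structure are illustrative, not the actual `bundle_processor` code):

```python
EVENT_TIME = 'EVENT_TIME'
PROCESSING_TIME = 'PROCESSING_TIME'

def choose_hold_timestamp(time_domain, fire_timestamp, input_timestamp):
    # For an event-time timer, the fire timestamp is itself an event time, so
    # it can serve as the output watermark hold. A processing-time timer fires
    # on wall-clock time, which is unrelated to the watermark, so keep holding
    # at the input timestamp instead.
    if time_domain == EVENT_TIME:
        return fire_timestamp
    return input_timestamp

print(choose_hold_timestamp(EVENT_TIME, 100, 40))       # 100
print(choose_hold_timestamp(PROCESSING_TIME, 100, 40))  # 40
```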
[GitHub] [beam] robertwb commented on pull request #12534: [BEAM-9547] Implement some methods for deferred Series.
robertwb commented on pull request #12534: URL: https://github.com/apache/beam/pull/12534#issuecomment-672230522 R: @ibzib
[GitHub] [beam] robinyqiu merged pull request #12515: Upgrade to ZetaSQL 2020.08.1
robinyqiu merged pull request #12515: URL: https://github.com/apache/beam/pull/12515
[GitHub] [beam] pabloem commented on a change in pull request #12492: [BEAM-6807] Implement an Azure blobstore filesystem for Python SDK
pabloem commented on a change in pull request #12492: URL: https://github.com/apache/beam/pull/12492#discussion_r468878930 ## File path: sdks/python/apache_beam/io/azure/blobstorageio_test.py ## @@ -0,0 +1,86 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Tests for Azure Blob Storage client. +""" +# pytype: skip-file

+from __future__ import absolute_import + +import logging +import unittest + +from apache_beam.io.azure import blobstorageio Review comment: there's also an import error happening here. you need to catch it and skip the test
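The guarded-import-and-skip pattern the reviewer is asking for looks roughly like this (a sketch; the actual test class names in the PR may differ):

```python
import unittest

# If the optional Azure dependency is missing, remember that fact and
# skip the dependent tests instead of failing at import time.
try:
    from apache_beam.io.azure import blobstorageio  # noqa: F401
except ImportError:
    blobstorageio = None


@unittest.skipIf(blobstorageio is None, 'Azure dependencies are not installed')
class BlobStorageIOTest(unittest.TestCase):
    def test_module_imported(self):
        self.assertIsNotNone(blobstorageio)
```

With this guard, collecting the test module never raises; the tests simply report as skipped on environments without the Azure extras installed.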
[GitHub] [beam] saavannanavati commented on pull request #12352: [BEAM-10549] Improve runtime type checking performance for the Python SDK
saavannanavati commented on pull request #12352: URL: https://github.com/apache/beam/pull/12352#issuecomment-672292191 The microbenchmark results! It looks like `performance_runtime_type_check` is ~10% slower than no type check, but significantly faster than `runtime_type_check`, whose time complexity is linear with respect to the input size.

```
Elements  Types   No Type Check  Runtime Type Check  Performance Runtime Type Check
1         Simple  9.42 sec       8.01 sec            7.70 sec
1         Nested  9.10 sec       9.79 sec            7.59 sec
1001      Simple  11.78 sec      65.04 sec           20.00 sec
1001      Nested  20.26 sec      74.79 sec           25.84 sec
2001      Simple  23.40 sec      118.87 sec          25.75 sec
2001      Nested  32.52 sec      135.38 sec          36.19 sec
3001      Simple  29.79 sec      174.64 sec          24.85 sec
3001      Nested  42.14 sec      193.98 sec          45.33 sec
4001      Simple  34.66 sec      225.57 sec          37.28 sec
4001      Nested  51.87 sec      260.55 sec          56.28 sec
5001      Simple  41.14 sec      277.93 sec          31.16 sec
5001      Nested  63.14 sec      296.25 sec          41.02 sec
6001      Simple  30.41 sec      292.06 sec          50.29 sec
6001      Nested  43.29 sec      378.11 sec          75.43 sec
7001      Simple  42.78 sec      288.78 sec          39.87 sec
7001      Nested  54.35 sec      318.59 sec          56.68 sec
8001      Simple  41.48 sec      362.18 sec          62.33 sec
8001      Nested  60.65 sec      501.25 sec          96.24 sec
9001      Simple  64.16 sec      370.09 sec          62.71 sec
9001      Nested  83.78 sec      557.60 sec          107.29 sec
```
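A quick back-of-the-envelope check of those claims, using only the simple-type timings reported above (nothing here beyond arithmetic on the reported numbers):

```python
# Relative overhead of each mode versus no type check, from the
# simple-type timings above (seconds).
no_check = {1001: 11.78, 9001: 64.16}
runtime = {1001: 65.04, 9001: 370.09}
performance = {1001: 20.00, 9001: 62.71}

ratios = {
    n: (runtime[n] / no_check[n], performance[n] / no_check[n])
    for n in no_check
}
for n, (r, p) in sorted(ratios.items()):
    print(f'{n} elements: runtime_type_check {r:.2f}x, '
          f'performance_runtime_type_check {p:.2f}x')
# runtime_type_check lands around 5-6x slower at both sizes, while
# performance_runtime_type_check stays near or below 2x.
```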
[GitHub] [beam] kkucharc commented on a change in pull request #12527: [BEAM-10672] Added Python Combine load tests for streaming on Dataflow
kkucharc commented on a change in pull request #12527: URL: https://github.com/apache/beam/pull/12527#discussion_r468725264 ## File path: .test-infra/jenkins/job_LoadTests_Combine_Python.groovy ## @@ -93,15 +93,23 @@ def loadTestConfigurations = { datasetName -> top_count: 20, ] ], - ].each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } + ] + .each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) } + .each{ test -> (jobType!='streaming') ?: addStreamingOptions(test) } Review comment: Sure, thanks :) ## File path: .test-infra/jenkins/job_LoadTests_Combine_Python.groovy ## @@ -113,13 +121,31 @@ PhraseTriggeringPostCommitBuilder.postCommitJob( this ) { additionalPipelineArgs = [:] - batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.PR) + loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, "batch") } CronJobBuilder.cronJob('beam_LoadTests_Python_Combine_Dataflow_Batch', 'H 15 * * *', this) { additionalPipelineArgs = [ influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName, influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostname, ] - batchLoadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT) + loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, "batch") } + +PhraseTriggeringPostCommitBuilder.postCommitJob( +'beam_LoadTests_Python_Combine_Dataflow_Streaming', +'Run Python Load Tests Combine Dataflow Streaming', +'Load Tests Python Combine Dataflow Streaming suite', +this +) { + additionalPipelineArgs = [:] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.PR, "streaming") +} + +CronJobBuilder.cronJob('beam_LoadTests_Python_Combine_Dataflow_Streaming', 'H 15 * * *', this) { + additionalPipelineArgs = [ +influx_db_name: InfluxDBCredentialsHelper.InfluxDBDatabaseName, +influx_hostname: InfluxDBCredentialsHelper.InfluxDBHostname, + ] + loadTestJob(delegate, CommonTestProperties.TriggeringContext.POST_COMMIT, "streaming") +} Review comment: Sure, 
thanks a lot :)
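The chained configuration in the Groovy diff above — apply the common pipeline args to every test, then add streaming options only when the job type is streaming — can be sketched in Python (helper and key names are illustrative, not the actual Jenkins DSL):

```python
def add_streaming_options(test):
    # Stand-in for the Jenkins helper: mark the test as streaming.
    test['pipelineOptions']['streaming'] = True

def build_load_test_configs(tests, additional_args, job_type):
    for test in tests:
        test['pipelineOptions'].update(additional_args)
        if job_type == 'streaming':
            add_streaming_options(test)
    return tests

configs = build_load_test_configs(
    [{'pipelineOptions': {'top_count': 20}}],
    {'influx_db_name': 'beam_load_tests'},
    'streaming',
)
```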
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12498: [BEAM-10654] Implemented ExternalSchemaIOTransformRegistrar for jdbc
TheNeuralBit commented on a change in pull request #12498: URL: https://github.com/apache/beam/pull/12498#discussion_r468722620 ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.jdbc; + +import com.google.auto.service.AutoService; +import java.io.Serializable; +import java.util.List; +import java.util.stream.Collectors; +import java.util.stream.StreamSupport; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.schemas.io.SchemaIO; +import org.apache.beam.sdk.schemas.io.SchemaIOProvider; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.sdk.values.Row; + +/** + * An implementation of {@link SchemaIOProvider} for reading and writing JSON payloads with {@link + * JdbcIO}. 
+ */ +@Internal +@AutoService(SchemaIOProvider.class) +public class JdbcSchemaIOProvider implements SchemaIOProvider { + + /** Returns an id that uniquely represents this IO. */ + @Override + public String identifier() { +return "jdbc"; + } + + /** + * Returns the expected schema of the configuration object. Note this is distinct from the schema + * of the data source itself. + */ + @Override + public Schema configurationSchema() { +return Schema.builder() +.addStringField("driverClassName") +.addStringField("jdbcUrl") +.addStringField("username") +.addStringField("password") +.addNullableField("connectionProperties", FieldType.STRING) +.addNullableField("connectionInitSqls", FieldType.iterable(FieldType.STRING)) +.addNullableField("readQuery", FieldType.STRING) +.addNullableField("writeStatement", FieldType.STRING) +.addNullableField("fetchSize", FieldType.INT16) +.addNullableField("outputParallelization", FieldType.BOOLEAN) +.build(); + } + + /** + * Produce a SchemaIO given a String representing the data's location, the schema of the data that + * resides there, and some IO-specific configuration object. + */ + @Override + public JdbcSchemaIO from(String location, Row configuration, Schema dataSchema) { +return new JdbcSchemaIO(location, configuration); + } + + @Override + public boolean requiresDataSchema() { +return false; + } + + @Override + public PCollection.IsBounded isBounded() { +return PCollection.IsBounded.BOUNDED; + } + + /** An abstraction to create schema aware IOs. 
*/ + static class JdbcSchemaIO implements SchemaIO, Serializable { +protected final Row config; +protected final String location; + +JdbcSchemaIO(String location, Row config) { + this.config = config; + this.location = location; +} + +@Override +public Schema schema() { + return null; +} + +@Override +public PTransform> buildReader() { + String readQuery; + if (config.getString("readQuery") != null) { +readQuery = config.getString("readQuery"); + } else { +readQuery = String.format("SELECT f_int FROM %s", location); Review comment: Similarly this shouldn't mention specific field names. In this case I think we could just do `SELECT * FROM %s` ## File path: sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcSchemaIOProvider.java ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by
[GitHub] [beam] danielxjd removed a comment on pull request #12223: [Beam-4379] Make ParquetIO read splittable
danielxjd removed a comment on pull request #12223: URL: https://github.com/apache/beam/pull/12223#issuecomment-670715872 Run Java PreCommit
[GitHub] [beam] tysonjh commented on pull request #12499: [BEAM-10602] Fix load test metrics in Grafana dashboard
tysonjh commented on pull request #12499: URL: https://github.com/apache/beam/pull/12499#issuecomment-672113428 > @tysonjh You should be able to run this locally with the backup data which is automatically retrieved from the GCS bucket when you run `docker-compose up`. Basically, the changes here will restore the old behavior + add a latency/checkpoint panel + combine the Flink ParDo results with the ones from Dataflow/Spark in one panel. Got it, thanks. I had to add some extra steps to the wiki for setting up the InfluxDB Data Source and now have graphs showing up. Please let me know when the comments are resolved so I can take another look.
[GitHub] [beam] TheNeuralBit commented on pull request #12481: [BEAM-10571] Use schemas in ExternalConfigurationPayload
TheNeuralBit commented on pull request #12481: URL: https://github.com/apache/beam/pull/12481#issuecomment-672186752 Run Java PreCommit
[GitHub] [beam] emilymye commented on pull request #12505: [WIP][BEAM-8106] Add version to java container image name
emilymye commented on pull request #12505: URL: https://github.com/apache/beam/pull/12505#issuecomment-672204447 Run Python2_PVR_Flink PreCommit
[GitHub] [beam] robertwb opened a new pull request #12534: [BEAM-9547] Implement some methods for deferred Series.
robertwb opened a new pull request #12534: URL: https://github.com/apache/beam/pull/12534 Now less than 50% of the pandas doctests are skipped.
[GitHub] [beam] aaltay merged pull request #12529: Moving /tmp directory cleanup of CI workers to Inventory Jenkins job
aaltay merged pull request #12529: URL: https://github.com/apache/beam/pull/12529
[GitHub] [beam] Ardagan commented on pull request #12139: [DO NOT REVIEW] Scd py bq test
Ardagan commented on pull request #12139: URL: https://github.com/apache/beam/pull/12139#issuecomment-672253789 Run Python 2 PostCommit
[GitHub] [beam] boyuanzz merged pull request #12518: [BEAM-10663] Workaround of AutoValueSchema doesn't work with SchemaFieldName
boyuanzz merged pull request #12518: URL: https://github.com/apache/beam/pull/12518
[GitHub] [beam] tvalentyn commented on pull request #12526: [BEAM-10663] Disable python kafka integration tests
tvalentyn commented on pull request #12526: URL: https://github.com/apache/beam/pull/12526#issuecomment-672258012 Thanks everyone for reporting and fixing this error.
[GitHub] [beam] TheNeuralBit opened a new pull request #12535: [BEAM-10500] Make KeyedTimerDataCoder encode output timestamp
TheNeuralBit opened a new pull request #12535: URL: https://github.com/apache/beam/pull/12535 KeyedTimerDataCoder doesn't actually encode the outputTimestamp separately, instead re-using the timestamp when decoding. This leads to occasional flakes when the two `new Instant()` instances created in the test are different.
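The failure mode described in #12535 reduces to a few lines. This is a deliberately simplified Python model of the bug, not the actual `KeyedTimerDataCoder`: if encoding drops the output timestamp and decoding substitutes the fire timestamp, the round trip is lossless only when the two happen to be equal.

```python
class Timer:
    def __init__(self, timestamp, output_timestamp):
        self.timestamp = timestamp
        self.output_timestamp = output_timestamp

def encode(timer):
    # The modeled bug: only the fire timestamp is written.
    return (timer.timestamp,)

def decode(data):
    # On decode, the fire timestamp is reused as the output timestamp.
    (ts,) = data
    return Timer(ts, ts)

# Two separate `new Instant()` calls usually agree to the millisecond,
# in which case the bug is invisible; when they differ, a coder
# round-trip test flakes.
timer = Timer(timestamp=1000, output_timestamp=1001)
round_tripped = decode(encode(timer))
assert round_tripped.output_timestamp != timer.output_timestamp
```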
[GitHub] [beam] amaliujia commented on pull request #12515: Upgrade to ZetaSQL 2020.08.1
amaliujia commented on pull request #12515: URL: https://github.com/apache/beam/pull/12515#issuecomment-672271568 Thank you!
[GitHub] [beam] damondouglas commented on pull request #12506: [BEAM-9680] Add Filter with ParDo lesson to Go SDK Katas
damondouglas commented on pull request #12506: URL: https://github.com/apache/beam/pull/12506#issuecomment-672365280 @lostluck the [stepik course](https://stepik.org/course/70387) has been updated and the `*-remote.yaml` files committed to this PR. This PR is ready to merge.
[GitHub] [beam] pabloem commented on pull request #12489: [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions
pabloem commented on pull request #12489: URL: https://github.com/apache/beam/pull/12489#issuecomment-672370947 Run Python 3.8 PostCommit
[GitHub] [beam] robinyqiu merged pull request #12536: [BEAM-10611] Simplification: Use new ZetaSQL API to get/create Values
robinyqiu merged pull request #12536: URL: https://github.com/apache/beam/pull/12536
[GitHub] [beam] robinyqiu opened a new pull request #12539: Fix some typos
robinyqiu opened a new pull request #12539: URL: https://github.com/apache/beam/pull/12539 Fix some typos found during code import to Google codebase. r: @aaltay @kennknowles
[GitHub] [beam] boyuanzz commented on a change in pull request #12519: [BEAM-10670] Make Read execute as a splittable DoFn by default for the Java DirectRunner.
boyuanzz commented on a change in pull request #12519: URL: https://github.com/apache/beam/pull/12519#discussion_r468874809 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java ## @@ -211,10 +211,6 @@ public FinishBundleContext finishBundleContext(DoFn doFn) { KV> residual = processContext.getTakenCheckpoint(); if (cont.shouldResume()) { - checkState( Review comment: Is it because we have `checkDone` now? ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java ## @@ -630,4 +635,38 @@ public void tearDown() { invoker = null; } } + + /** + * Throws an {@link IllegalArgumentException} if the pipeline contains any primitive read + * transforms that have not been expanded to be executed as {@link DoFn splittable DoFns}. + */ + public static void validateNoPrimitiveReads(Pipeline pipeline) { +pipeline.traverseTopologically(new ValidateNoPrimitiveReads()); + } + + /** + * A {@link org.apache.beam.sdk.Pipeline.PipelineVisitor} that ensures that the pipeline does not + * contain any primitive reads. Review comment: ```suggestion * contain any primitive reads when use_deprecated_read is not specified. ``` ## File path: runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java ## @@ -179,6 +180,11 @@ public DirectPipelineResult run(Pipeline pipeline) { DisplayDataValidator.validatePipeline(pipeline); DisplayDataValidator.validateOptions(options); + // TODO(BEAM-10670): Remove the deprecated Read and make the splittable DoFn the only option. + if (!(ExperimentalOptions.hasExperiment(options, "beam_fn_api_use_deprecated_read") Review comment: Is it possible to make `beam_fn_api_use_deprecated_read` and `use_deprecated_read` into one `use_deprecated_read`, since they seem to be the same?
## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/SplittableParDoTest.java ## @@ -167,4 +180,55 @@ public void testBoundednessForUnboundedFn() { "unbounded to unbounded", makeUnboundedCollection(pipeline), unboundedFn) .isBounded()); } + + private static class FakeBoundedSource extends BoundedSource<String> { +@Override +public List<? extends BoundedSource<String>> split( +long desiredBundleSizeBytes, PipelineOptions options) throws Exception { + return Collections.singletonList(this); +} +@Override +public long getEstimatedSizeBytes(PipelineOptions options) throws Exception { + return 0; +} +@Override +public BoundedReader<String> createReader(PipelineOptions options) throws IOException { + throw new UnsupportedOperationException(); +} +@Override +public Coder<String> getOutputCoder() { + return StringUtf8Coder.of(); +} + } + + @Test + public void testValidateThatThereAreNoPrimitiveReads() { Review comment: Can we add one block to test using `use_deprecated_read`?
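The `validateNoPrimitiveReads` visitor in the diff above reduces to a simple fail-fast tree walk over the pipeline's transform hierarchy. A minimal Python sketch of that pattern — `Node` and `PrimitiveRead` are illustrative stand-ins, not Beam's actual `Pipeline`/`PipelineVisitor` API:

```python
class PrimitiveRead:
    """Stand-in for an unexpanded primitive Read transform."""


class Node:
    """Stand-in for a node in a pipeline's transform hierarchy."""
    def __init__(self, transform, children=()):
        self.transform = transform
        self.children = list(children)


def validate_no_primitive_reads(root):
    """Walk the transform tree and fail fast if any primitive read
    survived expansion into a splittable DoFn."""
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node.transform, PrimitiveRead):
            raise ValueError('pipeline contains an unexpanded primitive read')
        stack.extend(node.children)
```

Validation passes silently on a tree of ordinary transforms and raises on the first `PrimitiveRead` it encounters, mirroring the traverse-and-throw shape of the Java visitor.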
[GitHub] [beam] KevinGG commented on pull request #12444: Added a whitespace lint as part of python lint precommit
KevinGG commented on pull request #12444: URL: https://github.com/apache/beam/pull/12444#issuecomment-672309994 > Few things regarding this change > > 1. Isn't the spotless pre-commit the right place to extend with such checks? Many developers probably already added that to their pre-commit hooks so the introduction of this check will be almost unnoticeable. It would be best if the check is unnoticeable. This check can also get triggered when a commit only changes some markdown or build.gradle files in the repo. It's not needed if the commit just changes code since the different linters in those programming languages already cover the whitespace issues in code files. > 2. Seems like it's using python2.7 virtualenv [1]: > > ``` > 00:00:40.125 Running virtualenv with interpreter /usr/bin/python2.7 > 00:00:40.425 DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. > ``` > > If possible, I would suggest avoiding that due to the end of life. > Changing it to use python3.8. > 1. The check is really "quiet" in terms of logging, in case of failure there are no details regarding the file/line and the exact issue. Check out my sample PR #12470 and the build output [2]. Would be helpful if it prints the issue similarly to the spotless e.g. [3]. > > [1] https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Commit/1/console [2] https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Commit/3/console [3] https://ci-beam.apache.org/job/beam_PreCommit_Spotless_Commit/10575/console Yes, it doesn't print much except `Running whitespacelint...` if it succeeds. But if it fails, it does tell you which line and what the problem is. The [2] actually fails because it couldn't find the precommit task: ` Task 'whitespacePreCommit' not found in root project 'beam'.` I'll modify the script to output more information.
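For illustration, the kind of per-file, per-line reporting discussed above fits in a few lines of Python; `find_whitespace_issues` is a hypothetical helper, not the actual whitespacelint implementation:

```python
import re


def find_whitespace_issues(text):
    """Return (line_number, description) pairs for common whitespace
    problems in a file's contents."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Trailing spaces or tabs at end of line.
        if re.search(r'[ \t]+$', line):
            issues.append((lineno, 'trailing whitespace'))
        # Tabs used for indentation.
        if line.startswith('\t'):
            issues.append((lineno, 'tab used for indentation'))
    return issues
```

A wrapper would run this over each changed file and print `path:line: description`, so a failing pre-commit points straight at the offending line rather than just reporting failure.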
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12297: [BEAM-10137] Add KinesisIO for cross-language usage with python wrapper
TheNeuralBit commented on a change in pull request #12297: URL: https://github.com/apache/beam/pull/12297#discussion_r468907524 ## File path: sdks/python/apache_beam/io/kinesis.py ## @@ -0,0 +1,317 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""PTransforms for supporting Kinesis streaming in Python pipelines. + + These transforms are currently supported by the Beam Flink and Spark portable + runners. + + **Setup** + + Transforms provided in this module are cross-language transforms + implemented in the Beam Java SDK. During pipeline construction, the Python SDK + will connect to a Java expansion service to expand these transforms. + To facilitate this, a small amount of setup is needed before using these + transforms in a Beam Python pipeline. + + There are several ways to set up cross-language Kinesis transforms. + + * Option 1: use the default expansion service + * Option 2: specify a custom expansion service + + See below for details regarding each of these options. + + *Option 1: Use the default expansion service* + + This is the recommended and easiest setup option for using Python Kinesis + transforms. This option is only available for Beam 2.24.0 and later.
+ + This option requires the following prerequisites before running the Beam + pipeline. + + * Install a Java runtime on the computer from which the pipeline is constructed +and make sure that the 'java' command is available. + + In this option, the Python SDK will either download (for released Beam versions) or + build (when running from a Beam Git clone) an expansion service jar and use + that to expand transforms. Currently Kinesis transforms use the + 'beam-sdks-java-io-kinesis-expansion-service' jar for this purpose. + + *Option 2: specify a custom expansion service* + + In this option, you start up your own expansion service and provide that as + a parameter when using the transforms provided in this module. + + This option requires the following prerequisites before running the Beam + pipeline. + + * Start up your own expansion service. + * Update your pipeline to provide the expansion service address when +initiating Kinesis transforms provided in this module. + + Flink users can use the built-in Expansion Service of the Flink Runner's + Job Server. If you start Flink's Job Server, the expansion service will be + started on port 8097. For a different address, please set the + expansion_service parameter.
+ + **More information** + + For more information regarding cross-language transforms see: + - https://beam.apache.org/roadmap/portability/ + + For more information specific to Flink runner see: + - https://beam.apache.org/documentation/runners/flink/ +""" + +# pytype: skip-file + +from __future__ import absolute_import + +import time +from typing import List +from typing import NamedTuple +from typing import Optional +from typing import Tuple + +from past.builtins import unicode + +from apache_beam import BeamJarExpansionService +from apache_beam import ExternalTransform +from apache_beam import NamedTupleBasedPayloadBuilder + +__all__ = [ +'WriteToKinesis', +'ReadDataFromKinesis', +'InitialPositionInStream', +'WatermarkPolicy', +] + + +def default_io_expansion_service(): + return BeamJarExpansionService( + 'sdks:java:io:kinesis:expansion-service:shadowJar') + + +WriteToKinesisSchema = NamedTuple( +'WriteToKinesisSchema', +[ +('stream_name', unicode), +('aws_access_key', unicode), +('aws_secret_key', unicode), +('region', unicode), +('partition_key', unicode), +('service_endpoint', Optional[unicode]), +('verify_certificate', Optional[bool]), +('producer_properties', Optional[List[Tuple[unicode, unicode]]]), Review comment: #12481 is merged now so this should change to mapping (as well as producer_properties, and the corresponding parameters in Java) ## File path: sdks/python/apache_beam/io/kinesis.py ## @@ -0,0 +1,317 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright
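The quoted `WriteToKinesisSchema` mixes required fields with `Optional` ones. The same payload pattern can be exercised with a plain `typing.NamedTuple`; `WriteConfig` and `to_payload_dict` below are illustrative stand-ins, not Beam's actual payload builder:

```python
from typing import NamedTuple, Optional


class WriteConfig(NamedTuple):
    """Illustrative stand-in for a cross-language transform payload schema."""
    stream_name: str
    region: str
    service_endpoint: Optional[str] = None
    verify_certificate: Optional[bool] = None


def to_payload_dict(cfg):
    # Serialize only the parameters the caller actually set, the way a
    # payload builder would omit unset optional fields.
    return {k: v for k, v in cfg._asdict().items() if v is not None}
```

Required fields must be supplied at construction time, while unset optional fields simply drop out of the serialized payload.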
[GitHub] [beam] TobKed commented on pull request #12452: [BEAM-10623] Add workflow to run Beam python tests on Linux/Windows/Mac platforms
TobKed commented on pull request #12452: URL: https://github.com/apache/beam/pull/12452#issuecomment-672344986 > Thanks, @TobKed , I don't have additional comments. Once all reviewers, whose feedback you expect, give their LGTM, please squash fixup commits, and leave only the commits that you want to be added to the commit history. I found I had missed using `TemporaryDirectory()` in `sdks/python/apache_beam/typehints/typecheck_test_py3.py`, so I fixed it. I also added an input to the manual trigger so that Dataflow tests have to be explicitly enabled (they are now disabled by default). Thanks for the reviews @tvalentyn and all people involved in this PR. I will create a PR to Apache Infra with a request for setting up the proper secrets.
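As background on the `TemporaryDirectory()` fix mentioned above: the context manager removes the directory and everything in it on exit, so tests cannot leak scratch files into the working tree. A minimal sketch:

```python
import os
import tempfile


def write_scratch_file(contents):
    """Write and read back a scratch file inside a self-cleaning directory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'scratch.txt')
        with open(path, 'w') as f:
            f.write(contents)
        with open(path) as f:
            data = f.read()
    # The directory (and the file in it) no longer exist here.
    return data, os.path.exists(tmpdir)
```

Compared to `tempfile.mkdtemp()`, no explicit cleanup call is needed, which is why it is the preferred pattern in tests.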
[GitHub] [beam] TheNeuralBit commented on a change in pull request #12297: [BEAM-10137] Add KinesisIO for cross-language usage with python wrapper
TheNeuralBit commented on a change in pull request #12297: URL: https://github.com/apache/beam/pull/12297#discussion_r468925737 ## File path: sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisTransformRegistrar.java ## @@ -0,0 +1,268 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.kinesis; + +import com.amazonaws.regions.Regions; +import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream; +import com.google.auto.service.AutoService; +import java.util.Map; +import java.util.Properties; +import javax.annotation.Nullable; Review comment: We should prefer the checker framework Nullable annotation, `org.checkerframework.checker.nullness.qual.Nullable` ## File path: sdks/python/apache_beam/io/external/xlang_kinesisio_it_test.py ## @@ -0,0 +1,308 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +Integration test for Python cross-language pipelines for Java KinesisIO. + +If you want to run the tests on localstack then run it just with pipeline +options. + +To test it on a real AWS account you need to pass some additional params, e.g.: +python setup.py nosetests \ +--tests=apache_beam.io.external.xlang_kinesisio_it_test \ +--test-pipeline-options=" + --use_real_aws + --aws_kinesis_stream= + --aws_access_key= + --aws_secret_key= + --aws_region= + --runner=FlinkRunner" +""" + +# pytype: skip-file + +from __future__ import absolute_import + +import argparse +import logging +import time +import unittest +import uuid + +import apache_beam as beam +from apache_beam.io.kinesis import InitialPositionInStream +from apache_beam.io.kinesis import ReadDataFromKinesis +from apache_beam.io.kinesis import WriteToKinesis +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.options.pipeline_options import StandardOptions +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to + +# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports +try: + import boto3 +except ImportError: + boto3 = None + +try: + from testcontainers.core.container import DockerContainer +except ImportError: + DockerContainer = None +# pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports + +LOCALSTACK_VERSION = '0.11.3' +NUM_RECORDS = 10 +NOW = time.time() +RECORD = b'record' + str(uuid.uuid4()).encode() + + 
+@unittest.skipUnless(DockerContainer, 'testcontainers is not installed.') +@unittest.skipUnless(boto3, 'boto3 is not installed.') +@unittest.skipUnless( +TestPipeline().get_pipeline_options().view_as(StandardOptions).runner, +'Do not run this test on precommit suites.') +class CrossLanguageKinesisIOTest(unittest.TestCase): + @unittest.skipUnless( + TestPipeline().get_option('aws_kinesis_stream'), + 'Cannot test on real aws without pipeline options provided') + def test_kinesis_io_roundtrip(self): +# TODO: enable this test for localstack once BEAM-10664 is resolved +self.run_kinesis_write() +self.run_kinesis_read() + + @unittest.skipIf( + TestPipeline().get_option('aws_kinesis_stream'), + 'Do not test on localstack when pipeline options were provided') + def test_kinesis_write(self): +# TODO: remove this test once BEAM-10664 is resolved +self.run_kinesis_write() +records =
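The `skipUnless` guards in the test above let the suite degrade gracefully when optional dependencies such as `boto3` or `testcontainers` are absent. A self-contained sketch of the pattern, using a deliberately nonexistent module so the skip path is exercised:

```python
import unittest

# Optional-dependency import guard: fall back to None when unavailable.
try:
    import nonexistent_optional_dep  # stand-in for boto3 / testcontainers
except ImportError:
    nonexistent_optional_dep = None


@unittest.skipUnless(nonexistent_optional_dep, 'optional dependency not installed.')
class OptionalDepTest(unittest.TestCase):
    def test_roundtrip(self):
        # Only runs when the optional dependency imports cleanly.
        self.assertTrue(True)
```

Because the decorator is applied at class level, every test in the class is reported as skipped (not failed) when the import falls back to `None`.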
[GitHub] [beam] chamikaramj commented on a change in pull request #12263: [BEAM-10492] Add missing sideinput handling to DLP transforms
chamikaramj commented on a change in pull request #12263: URL: https://github.com/apache/beam/pull/12263#discussion_r468930529 ## File path: sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/DLPDeidentifyText.java ## @@ -177,19 +177,24 @@ public DLPDeidentifyText build() { @Override public PCollection<KV<String, String>> expand( PCollection<KV<String, String>> input) { -return input -.apply(ParDo.of(new MapStringToDlpRow(getColumnDelimiter()))) -.apply("Batch Contents", ParDo.of(new BatchRequestForDLP(getBatchSizeBytes()))) -.apply( -"DLPDeidentify", + +ParDo.SingleOutput<KV<String, String>, KV<String, String>> +deidentifyParDo = ParDo.of( new DeidentifyText( getProjectId(), getInspectTemplateName(), getDeidentifyTemplateName(), getInspectConfig(), getDeidentifyConfig(), -getHeaderColumns()))); +getHeaderColumns())); +if (getHeaderColumns() != null) { Review comment: It was just a suggestion. I'm OK with merging as it is if that's OK with the author of the PR.
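The refactor above builds the `ParDo` once and attaches the side input only when header columns are present. That conditional-configuration shape, reduced to a hedged Python sketch (`make_deidentify_step` and its keys are hypothetical, not Beam's Java API):

```python
def make_deidentify_step(header_columns=None):
    """Build a step description, attaching the side input only when the
    caller actually supplied header columns."""
    step = {'transform': 'DeidentifyText'}
    # Mirrors the `if (getHeaderColumns() != null)` branch: no side input
    # is wired in unless there is something to feed it.
    if header_columns is not None:
        step['side_inputs'] = {'header_columns': list(header_columns)}
    return step
```

Building the base step unconditionally and branching only on the optional piece keeps the two code paths from duplicating the long constructor argument list, which was the point of the refactor.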