[GitHub] [beam] codecov[bot] edited a comment on pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12645:
URL: https://github.com/apache/beam/pull/12645#issuecomment-688630083


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@2b2b8e7`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12645/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master   #12645   +/-   ##
   =
 Coverage  ?   82.33%   
   =
 Files ?  453   
 Lines ?54054   
 Branches  ?0   
   =
 Hits  ?44507   
 Misses? 9547   
 Partials  ?0   
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/version.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdmVyc2lvbi5weQ==)
 | `100.00% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/transforms/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9fX2luaXRfXy5weQ==)
 | `100.00% <0.00%> (ø)` | |
   | 
[...s/python/apache\_beam/examples/wordcount\_minimal.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X21pbmltYWwucHk=)
 | `93.33% <0.00%> (ø)` | |
   | 
[...on/apache\_beam/portability/api/metrics\_pb2\_grpc.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL21ldHJpY3NfcGIyX2dycGMucHk=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...ache\_beam/examples/cookbook/datastore\_wordcount.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvY29va2Jvb2svZGF0YXN0b3JlX3dvcmRjb3VudC5weQ==)
 | `32.72% <0.00%> (ø)` | |
   | 
[...dks/python/apache\_beam/transforms/external\_java.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9leHRlcm5hbF9qYXZhLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[...pache\_beam/typehints/trivial\_inference\_test\_py3.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3RyaXZpYWxfaW5mZXJlbmNlX3Rlc3RfcHkzLnB5)
 | `92.85% <0.00%> (ø)` | |
   | 
[...on/apache\_beam/runners/direct/watermark\_manager.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvd2F0ZXJtYXJrX21hbmFnZXIucHk=)
 | `97.17% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/internal/pickler.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvcGlja2xlci5weQ==)
 | `85.82% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/internal/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvX19pbml0X18ucHk=)
 | `100.00% <0.00%> (ø)` | |
   | ... and [443 
more](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=footer). Last 
update 
[2b2b8e7...85c579f](https://codecov.io/gh/apache/beam/pull/12645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12645:
URL: https://github.com/apache/beam/pull/12645#issuecomment-688630083


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@2b2b8e7`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12645/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master   #12645   +/-   ##
   =
 Coverage  ?   82.33%   
   =
 Files ?  453   
 Lines ?54054   
 Branches  ?0   
   =
 Hits  ?44507   
 Misses? 9547   
 Partials  ?0   
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/version.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdmVyc2lvbi5weQ==)
 | `100.00% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/transforms/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9fX2luaXRfXy5weQ==)
 | `100.00% <0.00%> (ø)` | |
   | 
[...s/python/apache\_beam/examples/wordcount\_minimal.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X21pbmltYWwucHk=)
 | `93.33% <0.00%> (ø)` | |
   | 
[...on/apache\_beam/portability/api/metrics\_pb2\_grpc.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL21ldHJpY3NfcGIyX2dycGMucHk=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...ache\_beam/examples/cookbook/datastore\_wordcount.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvY29va2Jvb2svZGF0YXN0b3JlX3dvcmRjb3VudC5weQ==)
 | `32.72% <0.00%> (ø)` | |
   | 
[...dks/python/apache\_beam/transforms/external\_java.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9leHRlcm5hbF9qYXZhLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[...pache\_beam/typehints/trivial\_inference\_test\_py3.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3RyaXZpYWxfaW5mZXJlbmNlX3Rlc3RfcHkzLnB5)
 | `92.85% <0.00%> (ø)` | |
   | 
[...on/apache\_beam/runners/direct/watermark\_manager.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3Qvd2F0ZXJtYXJrX21hbmFnZXIucHk=)
 | `97.17% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/internal/pickler.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvcGlja2xlci5weQ==)
 | `85.82% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/internal/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvX19pbml0X18ucHk=)
 | `100.00% <0.00%> (ø)` | |
   | ... and [443 
more](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=footer). Last 
update 
[2b2b8e7...85c579f](https://codecov.io/gh/apache/beam/pull/12645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] abhiy13 commented on a change in pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


abhiy13 commented on a change in pull request #12645:
URL: https://github.com/apache/beam/pull/12645#discussion_r492475183



##
File path: 
sdks/java/io/contextual-text-io/src/main/java/org/apache/beam/sdk/io/contextualtextio/RecordWithMetadata.java
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.contextualtextio;
+
+import com.google.auto.value.AutoValue;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.AutoValueSchema;
+import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
+
+/**
+ * Helper Class based on {@link AutoValueSchema}, it provides Metadata 
associated with each Record
+ * when reading from file(s) using {@link ContextualTextIO}.
+ *
+ * Fields:
+ *
+ * 
+ *   recordOffset: The offset of a record (the byte at which the record 
begins) in a file. This
+ *   information can be useful if you wish to reconstruct the file. {@link
+ *   RecordWithMetadata#getRecordOffset()}
+ *   recordNum: The ordinal number of the record in its file. {@link
+ *   RecordWithMetadata#getRecordNum()}
+ *   recordValue: The value / contents of the record {@link 
RecordWithMetadata#getRecordValue()}
+ *   rangeOffset: The starting offset of the range (split), which 
contained the record, when the
+ *   record was read. {@link RecordWithMetadata#getRangeOffset()}
+ *   recordNumInOffset: The record number relative to the Range. (line 
number within the range)
+ *   {@link RecordWithMetadata#getRecordNumInOffset()}
+ *   fileName: Name of the file to which the record belongs (this is the 
full filename,
+ *   eg:path/to/file.txt) {@link RecordWithMetadata#getFileName()}
+ * 
+ */
+@Experimental(Experimental.Kind.SCHEMAS)
+@DefaultSchema(AutoValueSchema.class)
+@AutoValue
+public abstract class RecordWithMetadata {
+  public abstract Long getRecordOffset();

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12884: [BEAM-7746] Add type checking to coders

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12884:
URL: https://github.com/apache/beam/pull/12884#issuecomment-696510544


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12884?src=pr=h1) Report
   > Merging 
[#12884](https://codecov.io/gh/apache/beam/pull/12884?src=pr=desc) into 
[master](https://codecov.io/gh/apache/beam/commit/067cba8229694e7fb9693313f51ca686746b620a?el=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `83.33%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12884/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12884?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master   #12884  +/-   ##
   ==
   - Coverage   82.34%   82.33%   -0.02% 
   ==
 Files 452  453   +1 
 Lines   5401654058  +42 
   ==
   + Hits4448144509  +28 
   - Misses   9535 9549  +14 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12884?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=)
 | `95.13% <75.00%> (-0.12%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/coders/coders.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVycy5weQ==)
 | `85.41% <100.00%> (+0.02%)` | :arrow_up: |
   | 
[.../python/apache\_beam/transforms/periodicsequence.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9wZXJpb2RpY3NlcXVlbmNlLnB5)
 | `96.49% <0.00%> (-1.76%)` | :arrow_down: |
   | 
[...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5)
 | `92.80% <0.00%> (-1.01%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5)
 | `90.90% <0.00%> (-0.76%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=)
 | `88.75% <0.00%> (-0.45%)` | :arrow_down: |
   | 
[...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==)
 | `88.98% <0.00%> (-0.36%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/io/iobase.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vaW9iYXNlLnB5)
 | `84.04% <0.00%> (-0.29%)` | :arrow_down: |
   | 
[...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==)
 | `94.45% <0.00%> (-0.14%)` | :arrow_down: |
   | 
[...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==)
 | `91.66% <0.00%> (ø)` | |
   | ... and [1 
more](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12884?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12884?src=pr=footer). Last 
update 
[067cba8...e2952c5](https://codecov.io/gh/apache/beam/pull/12884?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn merged pull request #12895: [BEAM-9372][BEAM-9980] Switches Flink VR suite to Py36 and makes the version configurable.

2020-09-21 Thread GitBox


tvalentyn merged pull request #12895:
URL: https://github.com/apache/beam/pull/12895


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] abhiy13 commented on a change in pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


abhiy13 commented on a change in pull request #12645:
URL: https://github.com/apache/beam/pull/12645#discussion_r492480914



##
File path: 
sdks/java/io/contextual-text-io/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOSource.java
##
@@ -0,0 +1,364 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.contextualtextio;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.ReadableByteChannel;
+import java.nio.channels.SeekableByteChannel;
+import java.util.NoSuchElementException;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileBasedSource;
+import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
+import org.apache.beam.sdk.io.fs.MatchResult;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.schemas.NoSuchSchemaException;
+import org.apache.beam.sdk.schemas.SchemaCoder;
+import org.apache.beam.sdk.schemas.SchemaRegistry;
+import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import org.checkerframework.checker.nullness.qual.Nullable;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Implementation detail of {@link ContextualTextIO.Read}.
+ *
+ * A {@link FileBasedSource} which can decode records delimited by newline 
characters.
+ *
+ * This source splits the data into records using {@code UTF-8} {@code \n}, 
{@code \r}, or {@code
+ * \r\n} as the delimiter. This source is not strict and supports decoding the 
last record even if
+ * it is not delimited. Finally, no records are decoded if the stream is empty.
+ *
+ * This source supports reading from any arbitrary byte position within the 
stream. If the
+ * starting position is not {@code 0}, then bytes are skipped until the first 
delimiter is found
+ * representing the beginning of the first record to be decoded.
+ */
+@VisibleForTesting
+class ContextualTextIOSource extends FileBasedSource {
+  byte[] delimiter;
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(ContextualTextIOSource.class);
+
+  // Used to Override isSplittable
+  private boolean hasMultilineCSVRecords;
+
+  @Override
+  protected boolean isSplittable() throws Exception {
+if (hasMultilineCSVRecords) {
+  // When Having Multiline CSV Records,
+  // Splitting the file may cause a split to be within a record,
+  // Disabling split prevents this from happening
+  return false;
+}
+return super.isSplittable();
+  }
+
+  ContextualTextIOSource(
+  ValueProvider fileSpec,
+  EmptyMatchTreatment emptyMatchTreatment,
+  byte[] delimiter,
+  boolean hasMultilineCSVRecords) {
+super(fileSpec, emptyMatchTreatment, 1L);
+this.delimiter = delimiter;
+this.hasMultilineCSVRecords = hasMultilineCSVRecords;
+  }
+
+  private ContextualTextIOSource(
+  MatchResult.Metadata metadata,
+  long start,
+  long end,
+  byte[] delimiter,
+  boolean hasMultilineCSVRecords) {
+super(metadata, 1L, start, end);
+this.delimiter = delimiter;
+this.hasMultilineCSVRecords = hasMultilineCSVRecords;
+  }
+
+  @Override
+  protected FileBasedSource createForSubrangeOfFile(
+  MatchResult.Metadata metadata, long start, long end) {
+return new ContextualTextIOSource(metadata, start, end, delimiter, 
hasMultilineCSVRecords);
+  }
+
+  @Override
+  protected FileBasedReader 
createSingleFileReader(PipelineOptions options) {
+return new MultiLineTextBasedReader(this, delimiter, 
hasMultilineCSVRecords);
+  }
+
+  @Override
+  public Coder getOutputCoder() {

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] abhiy13 commented on a change in pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


abhiy13 commented on a change in pull request #12645:
URL: https://github.com/apache/beam/pull/12645#discussion_r492480890



##
File path: 
sdks/java/io/contextual-text-io/src/main/java/org/apache/beam/sdk/io/contextualtextio/RecordWithMetadata.java
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.contextualtextio;
+
+import com.google.auto.value.AutoValue;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.schemas.AutoValueSchema;
+import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
+
+/**
+ * Helper Class based on {@link AutoValueSchema}, it provides Metadata 
associated with each Record
+ * when reading from file(s) using {@link ContextualTextIO}.
+ *
+ * Fields:
+ *
+ * 
+ *   recordOffset: The offset of a record (the byte at which the record 
begins) in a file. This
+ *   information can be useful if you wish to reconstruct the file. {@link
+ *   RecordWithMetadata#getRecordOffset()}
+ *   recordNum: The ordinal number of the record in its file. {@link
+ *   RecordWithMetadata#getRecordNum()}
+ *   recordValue: The value / contents of the record {@link 
RecordWithMetadata#getValue()}
+ *   rangeOffset: The starting offset of the range (split), which 
contained the record, when the
+ *   record was read. {@link RecordWithMetadata#getRangeOffset()}
+ *   recordNumInOffset: The record number relative to the Range. (line 
number within the range)
+ *   {@link RecordWithMetadata#getRecordNumInOffset()}
+ *   fileName: Name of the file to which the record belongs (this is the 
full filename,
+ *   eg:path/to/file.txt) {@link RecordWithMetadata#getFileName()}
+ * 
+ */
+@Experimental(Experimental.Kind.SCHEMAS)
+@DefaultSchema(AutoValueSchema.class)
+@AutoValue
+public abstract class RecordWithMetadata {
+  public abstract long getRecordOffset();

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12645:
URL: https://github.com/apache/beam/pull/12645#issuecomment-688630083


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@2b2b8e7`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12645/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master   #12645   +/-   ##
   =
 Coverage  ?   82.33%   
   =
 Files ?  453   
 Lines ?54054   
 Branches  ?0   
   =
 Hits  ?44506   
 Misses? 9548   
 Partials  ?0   
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/io/jdbc.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vamRiYy5weQ==)
 | `86.36% <0.00%> (ø)` | |
   | 
[...ache\_beam/portability/api/beam\_artifact\_api\_pb2.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2JlYW1fYXJ0aWZhY3RfYXBpX3BiMi5weQ==)
 | `100.00% <0.00%> (ø)` | |
   | 
[...ython/apache\_beam/io/gcp/datastore/v1new/helper.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2RhdGFzdG9yZS92MW5ldy9oZWxwZXIucHk=)
 | `86.20% <0.00%> (ø)` | |
   | 
[.../python/apache\_beam/testing/benchmarks/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL19faW5pdF9fLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/examples/avro\_bitcoin.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvYXZyb19iaXRjb2luLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[...beam/testing/benchmarks/nexmark/models/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbW9kZWxzL19faW5pdF9fLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[...hon/apache\_beam/examples/wordcount\_with\_metrics.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X3dpdGhfbWV0cmljcy5weQ==)
 | `28.12% <0.00%> (ø)` | |
   | 
[...ks/python/apache\_beam/io/gcp/pubsub\_it\_pipeline.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL3B1YnN1Yl9pdF9waXBlbGluZS5weQ==)
 | `20.58% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/io/gcp/dicomio.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2RpY29taW8ucHk=)
 | `94.92% <0.00%> (ø)` | |
   | 
[...s/python/apache\_beam/portability/api/schema\_pb2.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL3NjaGVtYV9wYjIucHk=)
 | `100.00% <0.00%> (ø)` | |
   | ... and [443 
more](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=footer). Last 
update 
[2b2b8e7...85c579f](https://codecov.io/gh/apache/beam/pull/12645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12645:
URL: https://github.com/apache/beam/pull/12645#issuecomment-688630083


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@2b2b8e7`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12645/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master   #12645   +/-   ##
   =
 Coverage  ?   82.33%   
   =
 Files ?  453   
 Lines ?54054   
 Branches  ?0   
   =
 Hits  ?44506   
 Misses? 9548   
 Partials  ?0   
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[...hon/apache\_beam/examples/wordcount\_with\_metrics.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X3dpdGhfbWV0cmljcy5weQ==)
 | `28.12% <0.00%> (ø)` | |
   | 
[...dks/python/apache\_beam/transforms/create\_source.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jcmVhdGVfc291cmNlLnB5)
 | `98.18% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/internal/pickler.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvcGlja2xlci5weQ==)
 | `85.82% <0.00%> (ø)` | |
   | 
[.../apache\_beam/examples/cookbook/group\_with\_coder.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvY29va2Jvb2svZ3JvdXBfd2l0aF9jb2Rlci5weQ==)
 | `84.44% <0.00%> (ø)` | |
   | 
[...s/python/apache\_beam/testing/pipeline\_verifiers.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9waXBlbGluZV92ZXJpZmllcnMucHk=)
 | `91.80% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/examples/snippets/util.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdXRpbC5weQ==)
 | `100.00% <0.00%> (ø)` | |
   | 
[...ython/apache\_beam/runners/direct/direct\_metrics.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvZGlyZWN0X21ldHJpY3MucHk=)
 | `98.38% <0.00%> (ø)` | |
   | 
[...ks/python/apache\_beam/io/gcp/datastore/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2RhdGFzdG9yZS9fX2luaXRfXy5weQ==)
 | `100.00% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/io/external/kafka.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZXh0ZXJuYWwva2Fma2EucHk=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...apache\_beam/examples/cookbook/custom\_ptransform.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvY29va2Jvb2svY3VzdG9tX3B0cmFuc2Zvcm0ucHk=)
 | `53.48% <0.00%> (ø)` | |
   | ... and [443 
more](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=footer). Last 
update 
[2b2b8e7...85c579f](https://codecov.io/gh/apache/beam/pull/12645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12645:
URL: https://github.com/apache/beam/pull/12645#issuecomment-688630083


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@2b2b8e7`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12645/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master   #12645   +/-   ##
   =
 Coverage  ?   82.33%   
   =
 Files ?  453   
 Lines ?54054   
 Branches  ?0   
   =
 Hits  ?44506   
 Misses? 9548   
 Partials  ?0   
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[...s/python/apache\_beam/testing/pipeline\_verifiers.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9waXBlbGluZV92ZXJpZmllcnMucHk=)
 | `91.80% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/io/gcp/tests/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL3Rlc3RzL19faW5pdF9fLnB5)
 | `100.00% <0.00%> (ø)` | |
   | 
[...dks/python/apache\_beam/io/gcp/gce\_metadata\_util.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2djZV9tZXRhZGF0YV91dGlsLnB5)
 | `83.33% <0.00%> (ø)` | |
   | 
[...hon/apache\_beam/examples/wordcount\_with\_metrics.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X3dpdGhfbWV0cmljcy5weQ==)
 | `28.12% <0.00%> (ø)` | |
   | 
[...ache\_beam/runners/portability/local\_job\_service.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9sb2NhbF9qb2Jfc2VydmljZS5weQ==)
 | `81.16% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/examples/flink/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvZmxpbmsvX19pbml0X18ucHk=)
 | `0.00% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/internal/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvX19pbml0X18ucHk=)
 | `100.00% <0.00%> (ø)` | |
   | 
[...thon/apache\_beam/io/azure/blobstoragefilesystem.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vYXp1cmUvYmxvYnN0b3JhZ2VmaWxlc3lzdGVtLnB5)
 | `77.31% <0.00%> (ø)` | |
   | 
[...\_beam/testing/benchmarks/nexmark/queries/query6.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvcXVlcmllcy9xdWVyeTYucHk=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...s/snippets/transforms/aggregation/combineperkey.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9hZ2dyZWdhdGlvbi9jb21iaW5lcGVya2V5LnB5)
 | `98.91% <0.00%> (ø)` | |
   | ... and [443 
more](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=footer). Last 
update 
[2b2b8e7...3bed6b7](https://codecov.io/gh/apache/beam/pull/12645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12645:
URL: https://github.com/apache/beam/pull/12645#issuecomment-688630083


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@2b2b8e7`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12645/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master   #12645   +/-   ##
   =
 Coverage  ?   82.33%   
   =
 Files ?  453   
 Lines ?54054   
 Branches  ?0   
   =
 Hits  ?44506   
 Misses? 9548   
 Partials  ?0   
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/coders/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL19faW5pdF9fLnB5)
 | `100.00% <0.00%> (ø)` | |
   | 
[...m/runners/portability/spark\_uber\_jar\_job\_server.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9zcGFya191YmVyX2phcl9qb2Jfc2VydmVyLnB5)
 | `85.60% <0.00%> (ø)` | |
   | 
[...am/examples/snippets/transforms/aggregation/sum.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9hZ2dyZWdhdGlvbi9zdW0ucHk=)
 | `100.00% <0.00%> (ø)` | |
   | 
[...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5)
 | `79.78% <0.00%> (ø)` | |
   | 
[...ache\_beam/io/gcp/datastore/v1new/query\_splitter.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2RhdGFzdG9yZS92MW5ldy9xdWVyeV9zcGxpdHRlci5weQ==)
 | `94.11% <0.00%> (ø)` | |
   | 
[...eam/portability/api/beam\_expansion\_api\_pb2\_grpc.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2JlYW1fZXhwYW5zaW9uX2FwaV9wYjJfZ3JwYy5weQ==)
 | `61.90% <0.00%> (ø)` | |
   | 
[...n/apache\_beam/typehints/typed\_pipeline\_test\_py3.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL3R5cGVkX3BpcGVsaW5lX3Rlc3RfcHkzLnB5)
 | `90.30% <0.00%> (ø)` | |
   | 
[...s/snippets/transforms/aggregation/combinevalues.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9hZ2dyZWdhdGlvbi9jb21iaW5ldmFsdWVzLnB5)
 | `94.73% <0.00%> (ø)` | |
   | 
[...examples/snippets/transforms/elementwise/values.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9lbGVtZW50d2lzZS92YWx1ZXMucHk=)
 | `100.00% <0.00%> (ø)` | |
   | ... and [443 
more](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=footer). Last 
update 
[2b2b8e7...3bed6b7](https://codecov.io/gh/apache/beam/pull/12645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb merged pull request #12885: [BEAM-7746] Add type checking to runners.pipeline_context

2020-09-21 Thread GitBox


robertwb merged pull request #12885:
URL: https://github.com/apache/beam/pull/12885


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12898: [BEAM-7372][BEAM-9980] Cleans up Flink precommit VR suite definition and makes Python version configurable.

2020-09-21 Thread GitBox


tvalentyn commented on pull request #12898:
URL: https://github.com/apache/beam/pull/12898#issuecomment-69655


   Run Seed Job



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on a change in pull request #12898: [BEAM-7372][BEAM-9980] Cleans up Flink precommit VR suite definition and makes Python version configurable.

2020-09-21 Thread GitBox


tvalentyn commented on a change in pull request #12898:
URL: https://github.com/apache/beam/pull/12898#discussion_r492476742



##
File path: sdks/python/test-suites/gradle.properties
##
@@ -27,3 +27,7 @@ dataflow_chicago_taxi_example_task_py_versions=3.7
 
 # direct test-suites
 direct_mongodbio_it_task_py_versions=3.5
+
+# portable test-suites
+portable_flink_validates_runner_py_versions=3.6

Review comment:
   Thanks, done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn merged pull request #12898: [BEAM-7372][BEAM-9980] Cleans up Flink precommit VR suite definition and makes Python version configurable.

2020-09-21 Thread GitBox


tvalentyn merged pull request #12898:
URL: https://github.com/apache/beam/pull/12898


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12645:
URL: https://github.com/apache/beam/pull/12645#issuecomment-688630083


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@2b2b8e7`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12645/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree)
   
   ```diff
   @@Coverage Diff@@
   ## master   #12645   +/-   ##
   =
 Coverage  ?   34.47%   
   =
 Files ?  684   
 Lines ?81483   
 Branches  ? 9180   
   =
 Hits  ?28090   
 Misses?52972   
 Partials  ?  421   
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12645?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[io/textio\_test.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-aW8vdGV4dGlvX3Rlc3QucHk=)
 | `16.96% <0.00%> (ø)` | |
   | 
[io/gcp/bigquery\_io\_read\_it\_test.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-aW8vZ2NwL2JpZ3F1ZXJ5X2lvX3JlYWRfaXRfdGVzdC5weQ==)
 | `75.00% <0.00%> (ø)` | |
   | 
[...nners/portability/fn\_api\_runner/worker\_handlers.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-cnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL3dvcmtlcl9oYW5kbGVycy5weQ==)
 | `34.90% <0.00%> (ø)` | |
   | 
[examples/avro\_bitcoin.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-ZXhhbXBsZXMvYXZyb19iaXRjb2luLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[.../snippets/transforms/elementwise/partition\_test.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-ZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9lbGVtZW50d2lzZS9wYXJ0aXRpb25fdGVzdC5weQ==)
 | `46.66% <0.00%> (ø)` | |
   | 
[io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-aW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=)
 | `23.36% <0.00%> (ø)` | |
   | 
[ml/gcp/visionml.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-bWwvZ2NwL3Zpc2lvbm1sLnB5)
 | `47.61% <0.00%> (ø)` | |
   | 
[runners/interactive/pipeline\_analyzer.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-cnVubmVycy9pbnRlcmFjdGl2ZS9waXBlbGluZV9hbmFseXplci5weQ==)
 | `20.00% <0.00%> (ø)` | |
   | 
[transforms/trigger.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-dHJhbnNmb3Jtcy90cmlnZ2VyLnB5)
 | `37.84% <0.00%> (ø)` | |
   | 
[runners/common\_test.py](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree#diff-cnVubmVycy9jb21tb25fdGVzdC5weQ==)
 | `24.29% <0.00%> (ø)` | |
   | ... and [674 
more](https://codecov.io/gh/apache/beam/pull/12645/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12645?src=pr=footer). Last 
update 
[2b2b8e7...3bed6b7](https://codecov.io/gh/apache/beam/pull/12645?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #12881: [BEAM-7746] Get mypy passing on runners.worker

2020-09-21 Thread GitBox


robertwb commented on a change in pull request #12881:
URL: https://github.com/apache/beam/pull/12881#discussion_r492466948



##
File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
##
@@ -119,10 +125,10 @@ class RunnerIOOperation(operations.Operation):
 
   def __init__(self,
name_context,  # type: Union[str, common.NameContext]
-   step_name,
+   step_name,  # type: Any

Review comment:
   Is this not str or Optional[str]?

##
File path: sdks/python/apache_beam/runners/worker/data_plane.py
##
@@ -331,6 +369,7 @@ def add_to_inverse_output(timer):
 is_last=False))
 
 def close_stream(timer):
+  # type: (bytes) -> None

Review comment:
   I wonder if we should call this encoded_timer[s]? 

##
File path: sdks/python/apache_beam/coders/coder_impl.py
##
@@ -725,7 +726,7 @@ def __init__(self, key_coder_impl, window_coder_impl):
 self._tag_coder_impl = StrUtf8Coder().get_impl()
 
   def encode_to_stream(self, value, out, nested):
-# type: (dict, create_OutputStream, bool) -> None
+# type: (userstate.Timer, create_OutputStream, bool) -> None

Review comment:
   I think it used to be correct back when timers were being implemented. 
This code changed a couple of months ago too. 

##
File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
##
@@ -1070,23 +1078,24 @@ def delayed_bundle_application(self,
 return beam_fn_api_pb2.DelayedBundleApplication(
 requested_time_delay=proto_deferred_watermark,
 application=self.construct_bundle_application(
-op, current_watermark, element_and_restriction))
+op.input_info, current_watermark, element_and_restriction))
 
   def bundle_application(self,
  op,  # type: operations.DoOperation
  primary  # type: SplitResultPrimary
 ):
 # type: (...) -> beam_fn_api_pb2.BundleApplication
-return self.construct_bundle_application(op, None, primary.primary_value)
+assert op.input_info is not None
+return self.construct_bundle_application(
+op.input_info, None, primary.primary_value)
 
   def construct_bundle_application(self,
-   op,  # type: operations.DoOperation
+   op_input_info,  # type: 
operations.OpInputInfo

Review comment:
   Sounds good to me.

##
File path: sdks/python/apache_beam/runners/worker/data_plane.py
##
@@ -81,7 +85,11 @@
 
 class ClosableOutputStream(OutputStream):
   """A Outputstream for use with CoderImpls that has a close() method."""
-  def __init__(self, close_callback=None):
+  def __init__(
+  self,
+  close_callback=None  # type: Optional[Optional[Callable[[bytes], None]]]

Review comment:
   Why the double Optional (here and elsewhere below)?

##
File path: sdks/python/apache_beam/runners/worker/data_plane.py
##
@@ -218,7 +249,7 @@ class DataChannel(with_metaclass(abc.ABCMeta, object)):  # 
type: ignore[misc]
   @abc.abstractmethod
   def input_elements(self,
  instruction_id,  # type: str
- expected_inputs,  # type: Collection[str]
+ expected_inputs,  # type: Sized

Review comment:
   Don't we call `__contains__` as well? I'd rather keep it more fully 
typed. 

##
File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
##
@@ -947,11 +955,11 @@ def process_bundle(self, instruction_id):
   # (transform_id, timer_family_id).
   data_channels = collections.defaultdict(
   list
-  )  # type: DefaultDict[data_plane.GrpcClientDataChannel, List[str]]
+  )  # type: DefaultDict[data_plane.GrpcClientDataChannel, List[Union[str, 
Tuple[str, str

Review comment:
   Yep, this changed a couple of months ago. It'll be good to finally have 
these type annotations checked. 

##
File path: sdks/python/apache_beam/runners/worker/data_plane.py
##
@@ -243,7 +274,7 @@ def output_stream(
   instruction_id,  # type: str
   transform_id  # type: str
   ):
-# type: (...) -> ClosableOutputStream
+# type: (...) -> SizeBasedBufferingClosableOutputStream

Review comment:
   We're also thinking about adding a time-based one. 
   
   Let's add a no-op maybe_flush method to the baseclass and keep 
ClosableOutputStream everywhere. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] commented on pull request #12884: [BEAM-7746] Add type checking to coders

2020-09-21 Thread GitBox


codecov[bot] commented on pull request #12884:
URL: https://github.com/apache/beam/pull/12884#issuecomment-696510544


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12884?src=pr=h1) Report
   > Merging 
[#12884](https://codecov.io/gh/apache/beam/pull/12884?src=pr=desc) into 
[master](https://codecov.io/gh/apache/beam/commit/067cba8229694e7fb9693313f51ca686746b620a?el=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `83.33%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12884/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12884?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master   #12884  +/-   ##
   ==
   - Coverage   82.34%   82.33%   -0.02% 
   ==
 Files 452  453   +1 
 Lines   5401654058  +42 
   ==
   + Hits4448144509  +28 
   - Misses   9535 9549  +14 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12884?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/coders/coder\_impl.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVyX2ltcGwucHk=)
 | `95.13% <75.00%> (-0.12%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/coders/coders.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVycy5weQ==)
 | `85.41% <100.00%> (+0.02%)` | :arrow_up: |
   | 
[.../python/apache\_beam/transforms/periodicsequence.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9wZXJpb2RpY3NlcXVlbmNlLnB5)
 | `96.49% <0.00%> (-1.76%)` | :arrow_down: |
   | 
[...dks/python/apache\_beam/runners/pipeline\_context.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9waXBlbGluZV9jb250ZXh0LnB5)
 | `92.80% <0.00%> (-1.01%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5)
 | `90.90% <0.00%> (-0.76%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=)
 | `88.75% <0.00%> (-0.45%)` | :arrow_down: |
   | 
[...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==)
 | `88.98% <0.00%> (-0.36%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/io/iobase.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vaW9iYXNlLnB5)
 | `84.04% <0.00%> (-0.29%)` | :arrow_down: |
   | 
[...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==)
 | `94.45% <0.00%> (-0.14%)` | :arrow_down: |
   | 
[...python/apache\_beam/examples/wordcount\_dataframe.py](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvd29yZGNvdW50X2RhdGFmcmFtZS5weQ==)
 | `91.66% <0.00%> (ø)` | |
   | ... and [1 
more](https://codecov.io/gh/apache/beam/pull/12884/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12884?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12884?src=pr=footer). Last 
update 
[067cba8...e2952c5](https://codecov.io/gh/apache/beam/pull/12884?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] abhiy13 commented on a change in pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


abhiy13 commented on a change in pull request #12645:
URL: https://github.com/apache/beam/pull/12645#discussion_r492474752



##
File path: 
sdks/java/io/contextual-text-io/src/main/java/org/apache/beam/sdk/io/contextualtextio/RecordWithMetadata.java
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.contextualtextio;
+
+import com.google.auto.value.AutoValue;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.sdk.schemas.AutoValueSchema;
+import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
+
+/**
+ * Helper Class based on {@link AutoValueSchema}, it provides Metadata 
associated with each Record
+ * when reading from file(s) using {@link ContextualTextIO}.
+ *
+ * Fields:
+ *
+ * 
+ *   recordOffset: The offset of a record (the byte at which the record 
begins) in a file. This
+ *   information can be useful if you wish to reconstruct the file. {@link
+ *   RecordWithMetadata#getRecordOffset()}
+ *   recordNum: The ordinal number of the record in its file. {@link
+ *   RecordWithMetadata#getRecordNum()}
+ *   recordValue: The value / contents of the record {@link 
RecordWithMetadata#getValue()}
+ *   rangeOffset: The starting offset of the range (split), which 
contained the record, when the
+ *   record was read. {@link RecordWithMetadata#getRangeOffset()}
+ *   recordNumInOffset: The record number relative to the Range. (line 
number within the range)
+ *   {@link RecordWithMetadata#getRecordNumInOffset()}
+ *   fileName: Name of the file to which the record belongs (this is the 
full filename,
+ *   eg:path/to/file.txt) {@link RecordWithMetadata#getFileName()}
+ * 
+ */
+@Experimental(Experimental.Kind.SCHEMAS)
+@DefaultSchema(AutoValueSchema.class)
+@AutoValue
+public abstract class RecordWithMetadata {
+  public abstract long getRecordOffset();
+
+  public abstract long getRecordNum();
+
+  public abstract String getValue();
+
+  public abstract long getRangeOffset();
+
+  public abstract long getRecordNumInOffset();
+
+  public abstract Builder toBuilder();
+
+  public abstract String getFileName();
+
+  public static Builder newBuilder() {
+return new AutoValue_RecordWithMetadata.Builder();
+  }
+
+  @AutoValue.Builder
+  public abstract static class Builder {
+public abstract Builder setRecordNum(long lineNum);
+
+public abstract Builder setRecordOffset(long recordOffset);
+
+public abstract Builder setValue(String Value);

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] abhiy13 commented on a change in pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


abhiy13 commented on a change in pull request #12645:
URL: https://github.com/apache/beam/pull/12645#discussion_r492474814



##
File path: 
sdks/java/io/contextual-text-io/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIO.java
##
@@ -0,0 +1,631 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.contextualtextio;
+
+import static org.apache.beam.sdk.io.FileIO.ReadMatches.DirectoryTreatment;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.CompressedSource;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileBasedSource;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.FileIO.MatchConfiguration;
+import org.apache.beam.sdk.io.ReadAllViaFileBasedSource;
+import org.apache.beam.sdk.io.TextIO;
+import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.schemas.NoSuchSchemaException;
+import org.apache.beam.sdk.schemas.SchemaCoder;
+import org.apache.beam.sdk.transforms.Count;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+import org.checkerframework.checker.nullness.qual.Nullable;
+import org.joda.time.Duration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s that read text files and collect contextual information 
of the elements in
+ * the input.
+ *
+ * Use {@link TextIO} when not reading file with Multiline Records or 
additional metadata is not
+ * required.
+ *
+ * Reading from text files
+ *
+ * To read a {@link PCollection} from one or more text files, use {@code
+ * ContextualTextIO.read()}. To instantiate a transform use {@link
+ * ContextualTextIO.Read#from(String)} and specify the path of the file(s) to 
be read.
+ * Alternatively, if the filenames to be read are themselves in a {@link 
PCollection} you can use
+ * {@link FileIO} to match them and {@link ContextualTextIO#readFiles()} to 
read them.
+ *
+ * {@link #read} returns a {@link PCollection} of {@link RecordWithMetadata 
RecordWithMetadata},
+ * each corresponding to one line of an input UTF-8 text file (split into 
lines delimited by '\n',
+ * '\r', '\r\n', or specified delimiter see {@link 
ContextualTextIO.Read#withDelimiter})
+ *
+ * Filepattern expansion and watching
+ *
+ * By default, the filepatterns are expanded only once. The combination of 
{@link
+ * FileIO.Match#continuously(Duration, TerminationCondition)} and {@link 
#readFiles()} allow
+ * streaming of new files matching the filepattern(s).
+ *
+ * By default, {@link #read} prohibits filepatterns that match no files, 
and {@link #readFiles()}
+ * allows them in case the filepattern contains a glob wildcard character. Use 
{@link
+ * ContextualTextIO.Read#withEmptyMatchTreatment} or {@link
+ * FileIO.Match#withEmptyMatchTreatment(EmptyMatchTreatment)} plus {@link 
#readFiles()} to configure
+ * this behavior.
+ *
+ * Example 1: reading a file or filepattern.
+ *
+ * {@code
+ * Pipeline p = ...;

[GitHub] [beam] abhiy13 commented on a change in pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


abhiy13 commented on a change in pull request #12645:
URL: https://github.com/apache/beam/pull/12645#discussion_r492475231



##
File path: sdks/java/io/contextual-text-io/build.gradle
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+plugins { id 'org.apache.beam.module' }
+applyJavaNature(
+automaticModuleName: 'org.apache.beam.sdk.io.contextual-text-io',
+enableChecker: false,
+ignoreRawtypeErrors: true)
+
+description = "Apache Beam :: SDKs :: Java :: Contextual-Text-IO"
+ext.summary = "Context-aware Text IO."
+
+dependencies {
+
+compile library.java.vendored_guava_26_0_jre
+compile library.java.protobuf_java
+compile project(path: ":sdks:java:core", configuration: "shadow")
+testCompile project(path: ":sdks:java:core", configuration: "shadowTest")
+
+testCompile library.java.guava_testlib
+testCompile library.java.junit
+testCompile library.java.hamcrest_core
+testRuntimeOnly library.java.slf4j_jdk14
+testCompile project(path: ":runners:direct-java", configuration: "shadow")
+
+}

Review comment:
   Done

##
File path: sdks/java/io/contextual-text-io/build.gradle
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+plugins { id 'org.apache.beam.module' }
+applyJavaNature(
+automaticModuleName: 'org.apache.beam.sdk.io.contextual-text-io',
+enableChecker: false,
+ignoreRawtypeErrors: true)
+
+description = "Apache Beam :: SDKs :: Java :: Contextual-Text-IO"
+ext.summary = "Context-aware Text IO."
+
+dependencies {
+

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on pull request #12884: [BEAM-7746] Add type checking to coders

2020-09-21 Thread GitBox


robertwb commented on pull request #12884:
URL: https://github.com/apache/beam/pull/12884#issuecomment-696504555


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] abhiy13 commented on a change in pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


abhiy13 commented on a change in pull request #12645:
URL: https://github.com/apache/beam/pull/12645#discussion_r492474704



##
File path: 
sdks/java/io/contextual-text-io/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIOSource.java
##
@@ -0,0 +1,364 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.contextualtextio;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.ReadableByteChannel;
+import java.nio.channels.SeekableByteChannel;
+import java.util.NoSuchElementException;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileBasedSource;
+import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
+import org.apache.beam.sdk.io.fs.MatchResult;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.schemas.NoSuchSchemaException;
+import org.apache.beam.sdk.schemas.SchemaCoder;
+import org.apache.beam.sdk.schemas.SchemaRegistry;
+import org.apache.beam.vendor.grpc.v1p26p0.com.google.protobuf.ByteString;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import org.checkerframework.checker.nullness.qual.Nullable;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Implementation detail of {@link ContextualTextIO.Read}.
+ *
+ * A {@link FileBasedSource} which can decode records delimited by newline 
characters.
+ *
+ * This source splits the data into records using {@code UTF-8} {@code \n}, 
{@code \r}, or {@code
+ * \r\n} as the delimiter. This source is not strict and supports decoding the 
last record even if
+ * it is not delimited. Finally, no records are decoded if the stream is empty.
+ *
+ * This source supports reading from any arbitrary byte position within the 
stream. If the
+ * starting position is not {@code 0}, then bytes are skipped until the first 
delimiter is found
+ * representing the beginning of the first record to be decoded.
+ */
+@VisibleForTesting
+class ContextualTextIOSource extends FileBasedSource {
+  byte[] delimiter;
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(ContextualTextIOSource.class);
+
+  // Used to Override isSplittable
+  private boolean hasMultilineCSVRecords;
+
+  @Override
+  protected boolean isSplittable() throws Exception {
+if (hasMultilineCSVRecords) {
+  // When Having Multiline CSV Records,
+  // Splitting the file may cause a split to be within a record,
+  // Disabling split prevents this from happening
+  return false;
+}
+return super.isSplittable();
+  }
+
+  ContextualTextIOSource(
+  ValueProvider fileSpec,
+  EmptyMatchTreatment emptyMatchTreatment,
+  byte[] delimiter,
+  boolean hasMultilineCSVRecords) {
+super(fileSpec, emptyMatchTreatment, 1L);
+this.delimiter = delimiter;
+this.hasMultilineCSVRecords = hasMultilineCSVRecords;
+  }
+
+  private ContextualTextIOSource(
+  MatchResult.Metadata metadata,
+  long start,
+  long end,
+  byte[] delimiter,
+  boolean hasMultilineCSVRecords) {
+super(metadata, 1L, start, end);
+this.delimiter = delimiter;
+this.hasMultilineCSVRecords = hasMultilineCSVRecords;
+  }
+
+  @Override
+  protected FileBasedSource createForSubrangeOfFile(
+  MatchResult.Metadata metadata, long start, long end) {
+return new ContextualTextIOSource(metadata, start, end, delimiter, 
hasMultilineCSVRecords);
+  }
+
+  @Override
+  protected FileBasedReader 
createSingleFileReader(PipelineOptions options) {
+return new MultiLineTextBasedReader(this, delimiter, 
hasMultilineCSVRecords);
+  }
+
+  @Override
+  public Coder getOutputCoder() {
+SchemaCoder coder = null;
+try {
+  coder = 
SchemaRegistry.createDefault().getSchemaCoder(RecordWithMetadata.class);
+} catch (NoSuchSchemaException e) {
+  LOG.error("No Coder Found for RecordWithMetadata");
+}
+return coder;
+  }
+
+  /**
+   * A {@link FileBasedReader FileBasedReader} which can decode records 
delimited by delimiter
+   * 

[GitHub] [beam] abhiy13 commented on a change in pull request #12645: [BEAM-10124] Add ContextualTextIO

2020-09-21 Thread GitBox


abhiy13 commented on a change in pull request #12645:
URL: https://github.com/apache/beam/pull/12645#discussion_r492474778



##
File path: 
sdks/java/io/contextual-text-io/src/main/java/org/apache/beam/sdk/io/contextualtextio/ContextualTextIO.java
##
@@ -0,0 +1,631 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.contextualtextio;
+
+import static org.apache.beam.sdk.io.FileIO.ReadMatches.DirectoryTreatment;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.CompressedSource;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileBasedSource;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.FileIO.MatchConfiguration;
+import org.apache.beam.sdk.io.ReadAllViaFileBasedSource;
+import org.apache.beam.sdk.io.TextIO;
+import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.schemas.NoSuchSchemaException;
+import org.apache.beam.sdk.schemas.SchemaCoder;
+import org.apache.beam.sdk.transforms.Count;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.View;
+import org.apache.beam.sdk.transforms.Watch.Growth.TerminationCondition;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+import org.checkerframework.checker.nullness.qual.Nullable;
+import org.joda.time.Duration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s that read text files and collect contextual information 
of the elements in
+ * the input.
+ *
+ * Use {@link TextIO} when not reading file with Multiline Records or 
additional metadata is not
+ * required.
+ *
+ * Reading from text files
+ *
+ * To read a {@link PCollection} from one or more text files, use {@code
+ * ContextualTextIO.read()}. To instantiate a transform use {@link
+ * ContextualTextIO.Read#from(String)} and specify the path of the file(s) to 
be read.
+ * Alternatively, if the filenames to be read are themselves in a {@link 
PCollection} you can use
+ * {@link FileIO} to match them and {@link ContextualTextIO#readFiles()} to 
read them.
+ *
+ * {@link #read} returns a {@link PCollection} of {@link RecordWithMetadata 
RecordWithMetadata},
+ * each corresponding to one line of an input UTF-8 text file (split into 
lines delimited by '\n',
+ * '\r', '\r\n', or specified delimiter see {@link 
ContextualTextIO.Read#withDelimiter})
+ *
+ * Filepattern expansion and watching
+ *
+ * By default, the filepatterns are expanded only once. The combination of 
{@link
+ * FileIO.Match#continuously(Duration, TerminationCondition)} and {@link 
#readFiles()} allow
+ * streaming of new files matching the filepattern(s).
+ *
+ * By default, {@link #read} prohibits filepatterns that match no files, 
and {@link #readFiles()}
+ * allows them in case the filepattern contains a glob wildcard character. Use 
{@link
+ * ContextualTextIO.Read#withEmptyMatchTreatment} or {@link
+ * FileIO.Match#withEmptyMatchTreatment(EmptyMatchTreatment)} plus {@link 
#readFiles()} to configure
+ * this behavior.
+ *
+ * Example 1: reading a file or filepattern.
+ *
+ * {@code
+ * Pipeline p = ...;

[GitHub] [beam] robertwb commented on a change in pull request #12884: [BEAM-7746] Add type checking to coders

2020-09-21 Thread GitBox


robertwb commented on a change in pull request #12884:
URL: https://github.com/apache/beam/pull/12884#discussion_r492465887



##
File path: sdks/python/apache_beam/coders/coder_impl.py
##
@@ -725,7 +728,7 @@ def __init__(self, key_coder_impl, window_coder_impl):
 self._tag_coder_impl = StrUtf8Coder().get_impl()
 
   def encode_to_stream(self, value, out, nested):
-# type: (dict, create_OutputStream, bool) -> None
+# type: (userstate.Timer, create_OutputStream, bool) -> None

Review comment:
   Hmm... these probably were right at the time. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #12888: [BEAM-10861]Moves PubSub Runner API encoding to Read/Write transforms

2020-09-21 Thread GitBox


chamikaramj commented on pull request #12888:
URL: https://github.com/apache/beam/pull/12888#issuecomment-696151452


   R: @boyuanzz @lukecwik 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #11459: [BEAM-2546] Add InfluxDbIO

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #11459:
URL: https://github.com/apache/beam/pull/11459#issuecomment-684976918


   # [Codecov](https://codecov.io/gh/apache/beam/pull/11459?src=pr=h1) Report
   > Merging 
[#11459](https://codecov.io/gh/apache/beam/pull/11459?src=pr=desc) into 
[master](https://codecov.io/gh/apache/beam/commit/d48a4d6294a0a5db5ceae3ba79376abe412d0103?el=desc)
 will **increase** coverage by `41.72%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/11459/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/11459?src=pr=tree)
   
   ```diff
   @@ Coverage Diff @@
   ##   master   #11459   +/-   ##
   ===
   + Coverage   40.23%   81.95%   +41.72% 
   ===
 Files 455  457+2 
 Lines   5373154169  +438 
   ===
   + Hits2161944395+22776 
   + Misses  32112 9774-22338 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/11459?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/io/snowflake.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vc25vd2ZsYWtlLnB5)
 | `64.15% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=)
 | `100.00% <0.00%> (ø)` | |
   | 
[...apache\_beam/portability/api/beam\_runner\_api\_pb2.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2JlYW1fcnVubmVyX2FwaV9wYjIucHk=)
 | `100.00% <0.00%> (ø)` | |
   | 
[...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[...e\_beam/portability/api/beam\_runner\_api\_pb2\_urns.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2JlYW1fcnVubmVyX2FwaV9wYjJfdXJucy5weQ==)
 | `100.00% <0.00%> (ø)` | |
   | 
[...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...ache\_beam/runners/interactive/pipeline\_analyzer.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9waXBlbGluZV9hbmFseXplci5weQ==)
 | | |
   | 
[...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[...beam/testing/benchmarks/nexmark/queries/query10.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvcXVlcmllcy9xdWVyeTEwLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5)
 | `94.28% <0.00%> (ø)` | |
   | ... and [277 
more](https://codecov.io/gh/apache/beam/pull/11459/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/11459?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/11459?src=pr=footer). Last 
update 
[d48a4d6...fbd981c](https://codecov.io/gh/apache/beam/pull/11459?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on a change in pull request #12611: [BEAM-10139][BEAM-10140] Add cross-language support for Java SpannerIO with python wrapper

2020-09-21 Thread GitBox


chamikaramj commented on a change in pull request #12611:
URL: https://github.com/apache/beam/pull/12611#discussion_r492364337



##
File path: sdks/python/apache_beam/io/gcp/spanner.py
##
@@ -0,0 +1,504 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""PTransforms for supporting Spanner in Python pipelines.
+
+  These transforms are currently supported by Beam portable
+  Flink and Spark runners.
+
+  **Setup**
+
+  Transforms provided in this module are cross-language transforms
+  implemented in the Beam Java SDK. During the pipeline construction, Python 
SDK
+  will connect to a Java expansion service to expand these transforms.
+  To facilitate this, a small amount of setup is needed before using these
+  transforms in a Beam Python pipeline.
+
+  There are several ways to setup cross-language Spanner transforms.
+
+  * Option 1: use the default expansion service
+  * Option 2: specify a custom expansion service
+
+  See below for details regarding each of these options.
+
+  *Option 1: Use the default expansion service*
+
+  This is the recommended and easiest setup option for using Python Spanner
+  transforms. This option is only available for Beam 2.25.0 and later.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Install Java runtime in the computer from where the pipeline is constructed
+and make sure that 'java' command is available.
+
+  In this option, Python SDK will either download (for released Beam version) 
or
+  build (when running from a Beam Git clone) a expansion service jar and use
+  that to expand transforms. Currently Spanner transforms use the
+  'beam-sdks-java-io-google-cloud-platform-expansion-service' jar for this
+  purpose.
+
+  *Option 2: specify a custom expansion service*
+
+  In this option, you startup your own expansion service and provide that as
+  a parameter when using the transforms provided in this module.
+
+  This option requires following pre-requisites before running the Beam
+  pipeline.
+
+  * Startup your own expansion service.
+  * Update your pipeline to provide the expansion service address when
+initiating Spanner transforms provided in this module.
+
+  Flink Users can use the built-in Expansion Service of the Flink Runner's
+  Job Server. If you start Flink's Job Server, the expansion service will be
+  started on port 8097. For a different address, please set the
+  expansion_service parameter.
+
+  **More information**
+
+  For more information regarding cross-language transforms see:
+  - https://beam.apache.org/roadmap/portability/
+
+  For more information specific to Flink runner see:
+  - https://beam.apache.org/documentation/runners/flink/
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import typing
+import uuid
+from typing import List
+from typing import NamedTuple
+from typing import Optional
+
+from past.builtins import unicode
+
+from apache_beam import coders
+from apache_beam.transforms.external import BeamJarExpansionService
+from apache_beam.transforms.external import ExternalTransform
+from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = [
+'WriteToSpanner',
+'ReadFromSpanner',
+'MutationCreator',
+'TimestampBoundMode',
+'TimeUnit',
+]
+
+
+def default_io_expansion_service():
+  return BeamJarExpansionService(
+  'sdks:java:io:google-cloud-platform:expansion-service:shadowJar')
+
+
+WriteToSpannerSchema = typing.NamedTuple(
+'WriteToSpannerSchema',
+[
+('instance_id', unicode),
+('database_id', unicode),
+('project_id', Optional[unicode]),
+('batch_size_bytes', Optional[int]),
+('max_num_mutations', Optional[int]),
+('max_num_rows', Optional[int]),
+('grouping_factor', Optional[int]),
+('host', Optional[unicode]),
+('emulator_host', Optional[unicode]),
+('commit_deadline', Optional[int]),
+('max_cumulative_backoff', Optional[int]),
+],
+)
+
+
+class WriteToSpanner(ExternalTransform):

Review comment:
   Probably it makes sense to converge into one implementation. I'd 

[GitHub] [beam] lukecwik commented on pull request #12794: [BEAM-10865] Support for Kafka deserialization API with headers (since Kafka API 2.1.0)

2020-09-21 Thread GitBox


lukecwik commented on pull request #12794:
URL: https://github.com/apache/beam/pull/12794#issuecomment-696393557


   > Thanks for the feedback and the merge Luke. I'll address the updates in a 
PR.
   > 
   > I agree all tests makes more sense as one can never have enough coverage. 
That said, I'm been thinking about the `ConsumerSpEL` and the fact the in the 
test suite it until now only executed against Kafka clients API 1.0.0 
(`library.java.kafka_clients`) for each run. It's possible to also have the 
tasks and configurations dynamically created for a range of Kafka API versions 
too now that a pattern emerged, but that also would add quite a lot of time to 
test runs.
   > 
   > Thoughts?
   
   I think that is a good idea to enumerate the most popular kafka client 
versions. I don't think the unit tests are that slow and the community can 
always break-up the set that run to use more jenkins executors in parallel if 
it becomes a large enough issue. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rohdesamuel commented on pull request #12800: [BEAM-10603] Add the ib.recordings API

2020-09-21 Thread GitBox


rohdesamuel commented on pull request #12800:
URL: https://github.com/apache/beam/pull/12800#issuecomment-696374567







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #12576: [BEAM-10671] Add environment configuration fields as first-class pipeline options.

2020-09-21 Thread GitBox


ibzib commented on pull request #12576:
URL: https://github.com/apache/beam/pull/12576#issuecomment-696283965


   More flakes: BEAM-10866



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] yukihira1992 commented on pull request #12887: [BEAM-10935] Replace @abstractproperty with @abstractmethod and @prop…

2020-09-21 Thread GitBox


yukihira1992 commented on pull request #12887:
URL: https://github.com/apache/beam/pull/12887#issuecomment-696125767


   R: @robertwb



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] CraigChambersG commented on pull request #12880: [BEAM-10933] Adjust GBK and Flatten types before creating the pipeline proto

2020-09-21 Thread GitBox


CraigChambersG commented on pull request #12880:
URL: https://github.com/apache/beam/pull/12880#issuecomment-696219024


   Yes, that should be the case, on v2.  Dataflow v2 can handle
   heterogeneously typed Flattens.
   
   On Mon, Sep 21, 2020 at 9:10 AM Lukasz Cwik 
   wrote:
   
   > *@lukecwik* commented on this pull request.
   > --
   >
   > In sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
   > :
   >
   > > @@ -430,6 +430,15 @@ def _check_for_unsupported_fnapi_features(self, 
pipeline_proto):
   >  
components.coders[windowing_strategy.window_coder_id].spec.urn,
   >  windowing_strategy.window_fn.urn))
   >
   > +  def _adjust_types_for_dataflow(self, pipeline):
   > +# Dataflow runner requires a KV type for GBK inputs, hence we enforce 
that
   > +# here.
   > +pipeline.visit(self.group_by_key_input_visitor())
   > +
   > +# Dataflow runner requires output type of the Flatten to be the same 
as the
   >
   > @CraigChambersG  I was under the
   > impression that this wasn't true and that Dataflow v2 handled this.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > ,
   > or unsubscribe
   > 

   > .
   >
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kamilwu commented on pull request #11856: [BEAM-7505] SideInput Python Load Test job

2020-09-21 Thread GitBox


kamilwu commented on pull request #11856:
URL: https://github.com/apache/beam/pull/11856#issuecomment-696212607


   Thanks @boyuanzz, it makes sense now why it didn't work. Unfortunately, I 
can't use V2 runner, because I want to run these tests in batch mode as well 
(V2 runner supports streaming only)
   
   @tysonjh Can we move forward with the first six tests (with one, global 
window) now and skip those with 1000 windows? We could wait until batch runner 
supports SDF fully or until we come up with other idea. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #12882: [BEAM-10814] DataframeTransform outputs elements

2020-09-21 Thread GitBox


TheNeuralBit commented on a change in pull request #12882:
URL: https://github.com/apache/beam/pull/12882#discussion_r492219317



##
File path: sdks/python/apache_beam/dataframe/schemas.py
##
@@ -55,17 +159,149 @@ def expand(self, pcoll):
 lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-return pd.Series(name=name, dtype=typ)
-  except TypeError:
-raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-  name: _make_empty_series(name, typ)
-  for name,
-  typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+  {name: _make_proxy_series(name, typehint)
+   for name, typehint in fields},
+  columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
   I thought the MultiIndex or named case was important since otherwise 
we'll drop the grouped column(s) when unbatching the result of a grouped 
aggregation.
   
   It raise some tricky issues though:
   - Index names are not required to be unique.
   - It looks like my assumption that all MultiIndexes are named is wrong. It's 
possible to create a `MultiIndex` with `names=[None, None, 'foo']`, which would 
break this badly.
   - Type information is not necessarily preserved in indexes. e.g. Int64Index 
doesn't support nulls like Series with Int64Dtype does. if one is added it's 
converted to a Float64Index with nans.
   
   Maybe including the index shouldn't be the default until we have a better 
handle on these edge cases.

##
File path: sdks/python/apache_beam/dataframe/schemas.py
##
@@ -55,17 +159,149 @@ def expand(self, pcoll):
 lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-return pd.Series(name=name, dtype=typ)
-  except TypeError:
-raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 
 
 def generate_proxy(element_type):
   # type: (type) -> pd.DataFrame
-  return pd.DataFrame({
-  name: _make_empty_series(name, typ)
-  for name,
-  typ in named_fields_from_element_type(element_type)
-  })
+
+  """ Generate a proxy pandas object for the given PCollection element_type.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  fields = named_fields_from_element_type(element_type)
+  return pd.DataFrame(
+  {name: _make_proxy_series(name, typehint)
+   for name, typehint in fields},
+  columns=[name for name, _ in fields])
+
+
+def element_type_from_proxy(proxy):
+  # type: (pd.DataFrame) -> type
+
+  """ Generate an element_type for an element-wise PCollection from a proxy
+  pandas object. Currently only supports converting the element_type for
+  a schema-aware PCollection to a proxy DataFrame.
+
+  Currently only supports generating a DataFrame proxy from a schema-aware
+  PCollection."""
+  indices = [] if proxy.index.names == (None, ) else [

Review comment:
   We could log a warning if there's a named index in the result and 
`include_indexes` is `False`

##
File path: sdks/python/apache_beam/dataframe/schemas.py
##
@@ -55,17 +159,149 @@ def expand(self, pcoll):
 lambda batch: pd.DataFrame.from_records(batch, columns=columns))
 
 
-def _make_empty_series(name, typ):
-  try:
-return pd.Series(name=name, dtype=typ)
-  except TypeError:
-raise TypeError("Unable to convert type '%s' for field '%s'" % (name, typ))
+def _make_proxy_series(name, typehint):
+  # Default to np.object. This is lossy, we won't be able to recover the type
+  # at the output.
+  dtype = BEAM_TO_PANDAS.get(typehint, np.object)
+
+  return pd.Series(name=name, dtype=dtype)
 

[GitHub] [beam] lukecwik commented on pull request #12367: [BEAM-10564] Support more Avro field name formats when mapping to Jav…

2020-09-21 Thread GitBox


lukecwik commented on pull request #12367:
URL: https://github.com/apache/beam/pull/12367#issuecomment-696223362


   We have a merge on green policy so I have been rerunning the tests to get 
past a known issue.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aromanenko-dev commented on pull request #11459: [BEAM-2546] Add InfluxDbIO

2020-09-21 Thread GitBox


aromanenko-dev commented on pull request #11459:
URL: https://github.com/apache/beam/pull/11459#issuecomment-696273839


   The only minor thing that is missing -  update of `CHANGES.md` about this 
new IO.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #9907: [BEAM-4091] Pass type hints in ptransform_fn

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #9907:
URL: https://github.com/apache/beam/pull/9907#issuecomment-682206351







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem merged pull request #12721: [BEAM-10871] Add deidentify for FhirIO connector

2020-09-21 Thread GitBox


pabloem merged pull request #12721:
URL: https://github.com/apache/beam/pull/12721


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] trucleduc commented on pull request #12902: [BEAM-10871] Fix FhirLROIT tests

2020-09-21 Thread GitBox


trucleduc commented on pull request #12902:
URL: https://github.com/apache/beam/pull/12902#issuecomment-696417081







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12853: [BEAM-3083] Sets sdk_harness_container_images property for all Dataflow Runner V2 jobs

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12853:
URL: https://github.com/apache/beam/pull/12853#issuecomment-693256374







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] timclemons removed a comment on pull request #12850: [BEAM-10481] Ensure registration of the accumulator occurs.

2020-09-21 Thread GitBox


timclemons removed a comment on pull request #12850:
URL: https://github.com/apache/beam/pull/12850#issuecomment-696380555


   R: @aromanenko-dev 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12880: [BEAM-10933] Adjust GBK and Flatten types before creating the pipeline proto

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12880:
URL: https://github.com/apache/beam/pull/12880#issuecomment-695346831







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chadrik commented on pull request #12883: [BEAM-7746] Add more typing to metrics

2020-09-21 Thread GitBox


chadrik commented on pull request #12883:
URL: https://github.com/apache/beam/pull/12883#issuecomment-695814821


   R: @robertwb 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn merged pull request #12891: [BEAM-7372][BEAM-9372] Removes Python 2 and Python 3.5 Postcommit jobs.

2020-09-21 Thread GitBox


tvalentyn merged pull request #12891:
URL: https://github.com/apache/beam/pull/12891


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on a change in pull request #12806: [BEAM-10869] Make WriteToPubsub output serialized PubsubMessage proto bytes when using runner v2

2020-09-21 Thread GitBox


chamikaramj commented on a change in pull request #12806:
URL: https://github.com/apache/beam/pull/12806#discussion_r492091684



##
File path: sdks/python/apache_beam/io/gcp/pubsub.py
##
@@ -290,21 +290,26 @@ def __init__(self,
 topic, id_label, with_attributes, timestamp_attribute)
 
   @staticmethod
-  def to_proto_str(element):
+  def message_to_proto_str(element):
 # type: (PubsubMessage) -> bytes
 if not isinstance(element, PubsubMessage):
   raise TypeError(
   'Unexpected element. Type: %s (expected: PubsubMessage), '
   'value: %r' % (type(element), element))
 return element._to_proto_str()
 
+  @staticmethod
+  def bytes_to_proto_str(element):
+# type: (bytes) -> bytes
+msg = pubsub.types.pubsub_pb2.PubsubMessage()
+msg.data = element
+return msg.SerializeToString()
+
   def expand(self, pcoll):
 if self.with_attributes:

Review comment:
   Sent https://github.com/apache/beam/pull/12888





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn merged pull request #12886: [BEAM-9154] Disable Chicago Taxi Example on Jenkins

2020-09-21 Thread GitBox


tvalentyn merged pull request #12886:
URL: https://github.com/apache/beam/pull/12886


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #12727: [BEAM-10844] Add experiment option prebuild_sdk_container to prebuild python sdk container with dependencies.

2020-09-21 Thread GitBox


robertwb commented on a change in pull request #12727:
URL: https://github.com/apache/beam/pull/12727#discussion_r492191803



##
File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
##
@@ -473,10 +474,19 @@ def run_pipeline(self, pipeline, options):
 
 use_fnapi = apiclient._use_fnapi(options)
 from apache_beam.transforms import environments
-self._default_environment = (
-environments.DockerEnvironment.from_container_image(
-apiclient.get_container_image_from_options(options),
-artifacts=environments.python_sdk_dependencies(options)))
+if options.view_as(DebugOptions).lookup_experiment(
+'prebuild_sdk_container'):

Review comment:
   +1 to being consistent. I would lean towards both being top-level 
options. 

##
File path: sdks/python/apache_beam/runners/portability/stager.py
##
@@ -119,6 +119,7 @@ def create_job_resources(options,  # type: PipelineOptions
temp_dir,  # type: str
build_setup_args=None,  # type: Optional[List[str]]
populate_requirements_cache=None,  # type: 
Optional[str]
+   skip_boot_dependencies=False, # type: Optional[bool]

Review comment:
   Why are "boot" dependencies handled any differently than non-"boot" 
dependencies? 

##
File path: sdks/python/apache_beam/runners/portability/sdk_container_builder.py
##
@@ -0,0 +1,275 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""SdkContainerBuilder builds the portable SDK container with dependencies.
+
+It copies the right boot dependencies, namely: apache beam sdk, python packages
+from requirements.txt, python packages from extra_packages.txt, workflow
+tarball, into the latest public python sdk container image, and run the
+dependencies installation in advance with the boot program in setup only mode
+to build the new image.
+"""
+
+from __future__ import absolute_import
+
+import json
+import logging
+import os
+import shutil
+import subprocess
+import sys
+import tarfile
+import tempfile
+import time
+import uuid
+
+from google.protobuf.duration_pb2 import Duration
+from google.protobuf.json_format import MessageToJson
+
+from apache_beam.internal.gcp.auth import get_service_credentials
+from apache_beam.internal.http_client import get_new_http
+from apache_beam.io.gcp.internal.clients import storage
+from apache_beam.options.pipeline_options import DebugOptions
+from apache_beam.options.pipeline_options import GoogleCloudOptions
+from apache_beam.options.pipeline_options import PipelineOptions  # pylint: 
disable=unused-import
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.runners.portability.stager import Stager
+
+ARTIFACTS_CONTAINER_DIR = '/opt/apache/beam/artifacts'
+ARTIFACTS_MANIFEST_FILE = 'artifacts_info.json'
+SDK_CONTAINER_ENTRYPOINT = '/opt/apache/beam/boot'
+DOCKERFILE_TEMPLATE = (
+"""FROM apache/beam_python{major}.{minor}_sdk:latest
+RUN mkdir -p {workdir}
+COPY ./* {workdir}/
+RUN {entrypoint} --setup_only --artifacts {workdir}/{manifest_file}
+""")
+
+SOURCE_FOLDER = 'source'
+_LOGGER = logging.getLogger(__name__)
+
+
+class SdkContainerBuilder(object):
+  def __init__(self, options):
+self._options = options
+self._temp_src_dir = tempfile.mkdtemp()
+self._docker_registry_push_url = self._options.view_as(
+DebugOptions).lookup_experiment('docker_registry_push_url')
+
+  def build(self):
+container_id = str(uuid.uuid4())
+container_tag = os.path.join(
+self._docker_registry_push_url or '',
+'beam_python_prebuilt_sdk:%s' % container_id)
+self.prepare_dependencies()
+self.invoke_docker_build_and_push(container_id, container_tag)
+
+return container_tag
+
+  def prepare_dependencies(self):
+tmp = tempfile.mkdtemp()
+resources = Stager.create_job_resources(self._options, tmp)
+# make a copy of the staged artifacts into the temp source folder.
+for path, name in resources:
+  shutil.copyfile(path, os.path.join(self._temp_src_dir, name))
+with open(os.path.join(self._temp_src_dir, 

[GitHub] [beam] robinyqiu commented on a change in pull request #12843: [BEAM-10895] Support UNNEST an (possibly nested) array field of an struct column

2020-09-21 Thread GitBox


robinyqiu commented on a change in pull request #12843:
URL: https://github.com/apache/beam/pull/12843#discussion_r492322185



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnnestRel.java
##
@@ -111,31 +112,41 @@ public RelWriter explainTerms(RelWriter pw) {
   Schema joinedSchema = CalciteUtils.toSchema(getRowType());
 
   return outer
-  .apply(ParDo.of(new UnnestFn(joinedSchema, unnestIndex)))
+  .apply(ParDo.of(new UnnestFn(joinedSchema, unnestIndices)))
   .setRowSchema(joinedSchema);
 }
   }
 
   private static class UnnestFn extends DoFn {
 
 private final Schema outputSchema;
-private final int unnestIndex;
+private final List unnestIndices;
 
-private UnnestFn(Schema outputSchema, int unnestIndex) {
+private UnnestFn(Schema outputSchema, List unnestIndices) {
   this.outputSchema = outputSchema;
-  this.unnestIndex = unnestIndex;
+  this.unnestIndices = unnestIndices;
 }
 
 @ProcessElement
 public void process(@Element Row row, OutputReceiver out) {
-
-  @Nullable Collection rawValues = row.getArray(unnestIndex);
+  Row rowWithArrayField = row;
+  Schema schemaWithArrayField = outputSchema;
+  for (int i = unnestIndices.size() - 1; i > 0; i--) {
+rowWithArrayField = rowWithArrayField.getRow(unnestIndices.get(i));

Review comment:
   I believe the updated code in `BeamUnnestRule.java` will guarantee this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on a change in pull request #12794: [BEAM-10865] Support for Kafka deserialization API with headers (since Kafka API 2.1.0)

2020-09-21 Thread GitBox


lukecwik commented on a change in pull request #12794:
URL: https://github.com/apache/beam/pull/12794#discussion_r492248577



##
File path: sdks/java/io/kafka/build.gradle
##
@@ -64,4 +69,24 @@ dependencies {
   testCompile library.java.powermock_mockito
   testRuntimeOnly library.java.slf4j_jdk14
   testRuntimeOnly project(path: ":runners:direct-java", configuration: 
"shadow")
+  kafkaVersion210 "org.apache.kafka:kafka-clients:2.1.0"
+}
+
+configurations.kafkaVersion210 {
+  resolutionStrategy {
+force "org.apache.kafka:kafka-clients:2.1.0"
+  }
+}
+
+task kafkaVersion210Test(type: Test) {
+  group = "Verification"
+  description = 'Runs KafkaIO tests with Kafka clients API 2.1.0'
+  outputs.upToDateWhen { false }
+  testClassesDirs = sourceSets.test.output.classesDirs
+  classpath =  configurations.kafkaVersion210 + 
sourceSets.test.runtimeClasspath
+  include '**/KafkaIOTest.class'

Review comment:
   Can we run all the unit tests?
   ```suggestion
 include '**/KafkaIOTest.class'
   ```

##
File path: sdks/java/io/kafka/build.gradle
##
@@ -64,4 +69,24 @@ dependencies {
   testCompile library.java.powermock_mockito
   testRuntimeOnly library.java.slf4j_jdk14
   testRuntimeOnly project(path: ":runners:direct-java", configuration: 
"shadow")
+  kafkaVersion210 "org.apache.kafka:kafka-clients:2.1.0"
+}
+
+configurations.kafkaVersion210 {
+  resolutionStrategy {
+force "org.apache.kafka:kafka-clients:2.1.0"
+  }
+}
+
+task kafkaVersion210Test(type: Test) {
+  group = "Verification"
+  description = 'Runs KafkaIO tests with Kafka clients API 2.1.0'
+  outputs.upToDateWhen { false }

Review comment:
   Instead of doing `outputs.upToDateWhen { false }` to have this run every 
time, please include the set of inputs/outputs that makes sense in a follow-up 
PR so that an incremental build can be supported.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on pull request #12880: [BEAM-10933] Adjust GBK and Flatten types before creating the pipeline proto

2020-09-21 Thread GitBox


lukecwik commented on pull request #12880:
URL: https://github.com/apache/beam/pull/12880#issuecomment-696241764


   Yeah, the GBK one makes sense since it needs to say its a pair coder.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik removed a comment on pull request #12367: [BEAM-10564] Support more Avro field name formats when mapping to Jav…

2020-09-21 Thread GitBox


lukecwik removed a comment on pull request #12367:
URL: https://github.com/apache/beam/pull/12367#issuecomment-696223362


   We have a merge on green policy so I have been rerunning the tests to get 
past a known issue.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kennknowles commented on pull request #12870: [BEAM-2146] Continuously test Dataflow ValidatesRunner forcing streaming

2020-09-21 Thread GitBox


kennknowles commented on pull request #12870:
URL: https://github.com/apache/beam/pull/12870#issuecomment-696343825


   run dataflow validatesrunner



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on a change in pull request #12762: Ensuring that BigQuery jobs are tagged with the Dataflow step that launches them

2020-09-21 Thread GitBox


pabloem commented on a change in pull request #12762:
URL: https://github.com/apache/beam/pull/12762#discussion_r492332731



##
File path: sdks/python/apache_beam/io/gcp/bigquery_io_metadata.py
##
@@ -28,6 +28,11 @@
 _VALID_CLOUD_LABEL_PATTERN = re.compile(r'^[a-z0-9\_\-]{1,63}$')
 
 
+def _sanitize_value(value):
+  """Sanitizes a value into a valid BigQuery label value."""
+  return re.sub('[^\w-]+', '', value.lower().replace('/', '-'))[0:63]

Review comment:
   This is in bigquery_io_metadata. Thanks!

##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -1875,14 +1895,22 @@ def process(self, unused_element, signal):
 temp_location = pcoll.pipeline.options.view_as(
 GoogleCloudOptions).temp_location
 gcs_location = self._get_destination_uri(temp_location)
+job_name = pcoll.pipeline.options.view_as(GoogleCloudOptions).job_name
 
+try:
+  step_name = self.label

Review comment:
   This is set by the Python SDK as it expands transforms. It's definitely 
not orthodox to rely on it, and a little unusual.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] alanmyrvold merged pull request #12879: [BEAM-9136] Add CSV for python dependency license names, URLs and types

2020-09-21 Thread GitBox


alanmyrvold merged pull request #12879:
URL: https://github.com/apache/beam/pull/12879


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] commented on pull request #12562: [BEAM-10200] Respect profile_memory option and add memory profiler to…

2020-09-21 Thread GitBox


codecov[bot] commented on pull request #12562:
URL: https://github.com/apache/beam/pull/12562#issuecomment-696271535


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12562?src=pr=h1) Report
   > :exclamation: No coverage uploaded for pull request head 
(`BEAM-10200-2@30fae7a`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-head-commit).
   > The diff coverage is `n/a`.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #12721: [BEAM-10871] Add deidentify for FhirIO connector

2020-09-21 Thread GitBox


TheNeuralBit commented on pull request #12721:
URL: https://github.com/apache/beam/pull/12721#issuecomment-696407136


   I think this broke Fhir integration tests in PostCommit: 
https://ci-beam.apache.org/job/beam_PostCommit_Java/6624/



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] udim commented on pull request #9907: [BEAM-4091] Pass type hints in ptransform_fn

2020-09-21 Thread GitBox


udim commented on pull request #9907:
URL: https://github.com/apache/beam/pull/9907#issuecomment-696465531


   PTAL, changes are in the last commit.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] timclemons commented on pull request #12850: [BEAM-10481] Ensure registration of the accumulator occurs.

2020-09-21 Thread GitBox


timclemons commented on pull request #12850:
URL: https://github.com/apache/beam/pull/12850#issuecomment-696380555


   R: @aromanenko-dev 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #12873: Remove experimental declarations from fileio.

2020-09-21 Thread GitBox


pabloem commented on pull request #12873:
URL: https://github.com/apache/beam/pull/12873#issuecomment-696322183


   Can we keep WriteToFiles as experimental? I would like to spend some time 
figuring out a couple questions about how it handles sharding before fully 
marking it as stable. I filed https://issues.apache.org/jira/browse/BEAM-10939 
to track this.
   
   Except for that comment, everything else LGTM. Thanks @robertwb 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] boyuanzz commented on pull request #12888: [BEAM-10861]Moves PubSub Runner API encoding to Read/Write transforms

2020-09-21 Thread GitBox


boyuanzz commented on pull request #12888:
URL: https://github.com/apache/beam/pull/12888#issuecomment-696410670


   Discussed with Luke and Cham separately, we can get ride of 
`with_attributes` and `serialized_attribute_fn ` from both Read and Write. 
There are corresponding changes for Write: 
https://github.com/apache/beam/pull/12806. We should also similar changes to 
Read.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rohdesamuel commented on pull request #12799: [BEAM-10603] Add record_pipeline, clear to RM and fix duration limiter

2020-09-21 Thread GitBox


rohdesamuel commented on pull request #12799:
URL: https://github.com/apache/beam/pull/12799#issuecomment-696247296







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12896: Update indexing skips for pandas 1.x

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12896:
URL: https://github.com/apache/beam/pull/12896#issuecomment-696398122







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] angoenka commented on a change in pull request #12562: [BEAM-10200] Respect profile_memory option and add memory profiler to…

2020-09-21 Thread GitBox


angoenka commented on a change in pull request #12562:
URL: https://github.com/apache/beam/pull/12562#discussion_r491759030



##
File path: sdks/python/apache_beam/utils/profiler.py
##
@@ -44,59 +42,91 @@
 
 
 class Profile(object):
-  """cProfile wrapper context for saving and logging profiler results."""
+  """cProfile and Heapy wrapper context for saving and logging profiler
+  results."""
 
   SORTBY = 'cumulative'
 
   def __init__(
   self,
-  profile_id,
-  profile_location=None,
-  log_results=False,
-  file_copy_fn=None,
-  time_prefix='%Y-%m-%d_%H_%M_%S-'):
+  profile_id, # type: str
+  profile_location=None, # type: Optional[str]
+  log_results=False, # type: bool
+  file_copy_fn=None, # type: Optional[Callable[[str, str], None]]
+  time_prefix='%Y-%m-%d_%H_%M_%S-', # type: str
+  enable_cpu_profiling=False, # type: bool
+  enable_memory_profiling=False, # type: bool
+  ):
+"""Creates a Profile object.
+
+Args:
+  profile_id: Unique id of the profiling session.
+  profile_location: The file location where the profiling results will be
+stored.
+  log_results: Log the result to console if true.
+  file_copy_fn: Lambda function for copying files.
+  time_prefix: Format of the timestamp prefix in profiling result files.
+  enable_cpu_profiling: CPU profiler will be enabled during the profiling
+session.
+  enable_memory_profiling: Memory profiler will be enabled during the
+profiling session, the profiler only records the newly allocated 
objects
+in this session.
+"""
 self.stats = None
 self.profile_id = str(profile_id)
 self.profile_location = profile_location
 self.log_results = log_results
 self.file_copy_fn = file_copy_fn or self.default_file_copy_fn
 self.time_prefix = time_prefix
 self.profile_output = None
+self.enable_cpu_profiling = enable_cpu_profiling
+self.enable_memory_profiling = enable_memory_profiling
 
   def __enter__(self):
 _LOGGER.info('Start profiling: %s', self.profile_id)
-self.profile = cProfile.Profile()
-self.profile.enable()
+if self.enable_cpu_profiling:
+  self.profile = cProfile.Profile()
+  self.profile.enable()
+if self.enable_memory_profiling:
+  try:
+from guppy import hpy
+self.hpy = hpy()
+self.hpy.setrelheap()
+  except ImportError:

Review comment:
   Let's log the import failure





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12895: [BEAM-9372][BEAM-9980] Switches Flink VR suite to Py36 and makes the version configurable.

2020-09-21 Thread GitBox


tvalentyn commented on pull request #12895:
URL: https://github.com/apache/beam/pull/12895#issuecomment-696339415







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on pull request #11856: [BEAM-7505] SideInput Python Load Test job

2020-09-21 Thread GitBox


lukecwik commented on pull request #11856:
URL: https://github.com/apache/beam/pull/11856#issuecomment-696392541


   > Thanks @boyuanzz, it makes sense now why it didn't work. Unfortunately, I 
can't use V2 runner, because I want to run these tests in batch mode as well 
(V2 runner supports streaming only)
   > 
   > @tysonjh Can we move forward with the first six tests (with one, global 
window) now and skip those with 1000 windows? We could wait until batch runner 
supports SDF fully or until we come up with other idea.
   
   V2 runner supports batch.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] caller9 commented on pull request #12692: [BEAM-10819] Upgrade Gradle to 6.1.1

2020-09-21 Thread GitBox


caller9 commented on pull request #12692:
URL: https://github.com/apache/beam/pull/12692#issuecomment-696301948


   Obsolete after #12776 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on a change in pull request #12637: [BEAM-10768] Don't assert the order in which elements are received.

2020-09-21 Thread GitBox


ibzib commented on a change in pull request #12637:
URL: https://github.com/apache/beam/pull/12637#discussion_r492386778



##
File path: sdks/python/apache_beam/runners/worker/data_plane_test.py
##
@@ -108,16 +106,28 @@ def send(instruction_id, transform_id, data):
 ])
 
 # Multiple interleaved writes to multiple instructions.
-send('1', transform_1, b'abc')
-send('2', transform_1, b'def')
+stream11 = from_channel.output_stream('1', transform_1)
+stream11.write(b'abc')
+stream21 = from_channel.output_stream('2', transform_1)
+stream21.write(b'def')
+if not time_based_flush:
+  stream11.close()
 self.assertEqual(
 list(
 itertools.islice(to_channel.input_elements('1', [transform_1]), 
1)),
 [
 beam_fn_api_pb2.Elements.Data(
 instruction_id='1', transform_id=transform_1, data=b'abc')
 ])
-send('2', transform_2, b'ghi')
+if time_based_flush:

Review comment:
   Write does not provide ordering guarantees in this case.
   
   Elements are stored in a 
[queue](https://github.com/apache/beam/blob/7b3d4251d244c10545fb37f1d93ebcad84a98681/sdks/python/apache_beam/runners/worker/data_plane.py#L371)
 before being sent, to enable batching. Elements aren't added to that queue 
until the [flush 
callback](https://github.com/apache/beam/blob/7b3d4251d244c10545fb37f1d93ebcad84a98681/sdks/python/apache_beam/runners/worker/data_plane.py#L493)
 is invoked. Because the flush callback is [invoked 
periodically](https://github.com/apache/beam/blob/7b3d4251d244c10545fb37f1d93ebcad84a98681/sdks/python/apache_beam/runners/worker/data_plane.py#L182)
 starting from when a stream is constructed, there is no guarantee that one 
stream's callback is called before the other.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on a change in pull request #12895: [BEAM-9372][BEAM-9980] Switches Flink VR suite to Py36 and makes the version configurable.

2020-09-21 Thread GitBox


tvalentyn commented on a change in pull request #12895:
URL: https://github.com/apache/beam/pull/12895#discussion_r492360722



##
File path: sdks/python/test-suites/gradle.properties
##
@@ -27,3 +27,6 @@ dataflow_chicago_taxi_example_task_py_versions=3.7
 
 # direct test-suites
 direct_mongodbio_it_task_py_versions=3.5
+
+# portable test-suites
+portable_flink_validates_runner_py_versions=3.6

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on a change in pull request #12895: [BEAM-9372][BEAM-9980] Switches Flink VR suite to Py36 and makes the version configurable.

2020-09-21 Thread GitBox


ibzib commented on a change in pull request #12895:
URL: https://github.com/apache/beam/pull/12895#discussion_r492356821



##
File path: sdks/python/test-suites/gradle.properties
##
@@ -27,3 +27,6 @@ dataflow_chicago_taxi_example_task_py_versions=3.7
 
 # direct test-suites
 direct_mongodbio_it_task_py_versions=3.5
+
+# portable test-suites
+portable_flink_validates_runner_py_versions=3.6

Review comment:
   Can we add all supported Python versions?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #12782: Overriding Dataflow Native BQSource.

2020-09-21 Thread GitBox


pabloem commented on pull request #12782:
URL: https://github.com/apache/beam/pull/12782#issuecomment-696354774







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik merged pull request #12367: [BEAM-10564] Support more Avro field name formats when mapping to Jav…

2020-09-21 Thread GitBox


lukecwik merged pull request #12367:
URL: https://github.com/apache/beam/pull/12367


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik merged pull request #12794: [BEAM-10865] Support for Kafka deserialization API with headers (since Kafka API 2.1.0)

2020-09-21 Thread GitBox


lukecwik merged pull request #12794:
URL: https://github.com/apache/beam/pull/12794


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12891: [BEAM-7372][BEAM-9372] Removes Python 2 and Python 3.5 Postcommit jobs.

2020-09-21 Thread GitBox


tvalentyn commented on pull request #12891:
URL: https://github.com/apache/beam/pull/12891#issuecomment-696284493







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] trucleduc commented on pull request #12721: [BEAM-10871] Add deidentify for FhirIO connector

2020-09-21 Thread GitBox


trucleduc commented on pull request #12721:
URL: https://github.com/apache/beam/pull/12721#issuecomment-696408469







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #12830: [BEAM-10716] TestPubsub/TestPubsubSignal clean up subscriptions

2020-09-21 Thread GitBox


TheNeuralBit commented on pull request #12830:
URL: https://github.com/apache/beam/pull/12830#issuecomment-696369596







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #12902: [BEAM-10871] Fix FhirLROIT tests

2020-09-21 Thread GitBox


TheNeuralBit commented on pull request #12902:
URL: https://github.com/apache/beam/pull/12902#issuecomment-696415795







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on a change in pull request #12727: [BEAM-10844] Add experiment option prebuild_sdk_container to prebuild python sdk container with dependencies.

2020-09-21 Thread GitBox


aaltay commented on a change in pull request #12727:
URL: https://github.com/apache/beam/pull/12727#discussion_r492448750



##
File path: sdks/python/setup.py
##
@@ -215,6 +215,8 @@ def get_version():
 'google-cloud-language>=1.3.0,<2',
 'google-cloud-videointelligence>=1.8.0,<2',
 'google-cloud-vision>=0.38.0,<2',
+# GCP packages required by fast container startup functionality.

Review comment:
   nit: "required by prebuild sdk container functionality". That is what 
you are calling it in options.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #12852: [BEAM-10890] Log error counts to debug BigQuery streaming insert requ…

2020-09-21 Thread GitBox


ihji commented on pull request #12852:
URL: https://github.com/apache/beam/pull/12852#issuecomment-696327213


   @ajamato Friendly ping  



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #12896: Update indexing skips for pandas 1.x

2020-09-21 Thread GitBox


TheNeuralBit commented on pull request #12896:
URL: https://github.com/apache/beam/pull/12896#issuecomment-696399245







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem merged pull request #12871: Fix broken link for ParDo docs

2020-09-21 Thread GitBox


pabloem merged pull request #12871:
URL: https://github.com/apache/beam/pull/12871


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj merged pull request #12880: [BEAM-10933] Adjust GBK type before creating the pipeline proto

2020-09-21 Thread GitBox


chamikaramj merged pull request #12880:
URL: https://github.com/apache/beam/pull/12880


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chadrik commented on pull request #12885: [BEAM-7746] Add type checking to runners.pipeline_context

2020-09-21 Thread GitBox


chadrik commented on pull request #12885:
URL: https://github.com/apache/beam/pull/12885#issuecomment-695814514


   R: @robertwb 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] caller9 closed pull request #12692: [BEAM-10819] Upgrade Gradle to 6.1.1

2020-09-21 Thread GitBox


caller9 closed pull request #12692:
URL: https://github.com/apache/beam/pull/12692


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] commented on pull request #12762: Ensuring that BigQuery jobs are tagged with the Dataflow step that launches them

2020-09-21 Thread GitBox


codecov[bot] commented on pull request #12762:
URL: https://github.com/apache/beam/pull/12762#issuecomment-696459985


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12762?src=pr=h1) Report
   > Merging 
[#12762](https://codecov.io/gh/apache/beam/pull/12762?src=pr=desc) into 
[master](https://codecov.io/gh/apache/beam/commit/958e445ae49da6cf5b67f769e520d90fd8aed60d?el=desc)
 will **increase** coverage by `41.69%`.
   > The diff coverage is `44.18%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12762/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12762?src=pr=tree)
   
   ```diff
   @@ Coverage Diff @@
   ##   master   #12762   +/-   ##
   ===
   + Coverage   40.30%   81.99%   +41.69% 
   ===
 Files 451  459+8 
 Lines   5316854387 +1219 
   ===
   + Hits2142944597+23168 
   + Misses  31739 9790-21949 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12762?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/io/azure/blobstorageio.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vYXp1cmUvYmxvYnN0b3JhZ2Vpby5weQ==)
 | `26.95% <26.95%> (ø)` | |
   | 
[sdks/python/apache\_beam/io/filesystems.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZmlsZXN5c3RlbXMucHk=)
 | `87.50% <50.00%> (+29.74%)` | :arrow_up: |
   | 
[sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5)
 | `79.78% <56.52%> (+53.01%)` | :arrow_up: |
   | 
[...thon/apache\_beam/io/azure/blobstoragefilesystem.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vYXp1cmUvYmxvYnN0b3JhZ2VmaWxlc3lzdGVtLnB5)
 | `77.31% <77.31%> (ø)` | |
   | 
[sdks/python/apache\_beam/io/azure/\_\_init\_\_.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vYXp1cmUvX19pbml0X18ucHk=)
 | `100.00% <100.00%> (ø)` | |
   | 
[...s/python/apache\_beam/io/gcp/bigquery\_file\_loads.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2ZpbGVfbG9hZHMucHk=)
 | `89.97% <100.00%> (+66.61%)` | :arrow_up: |
   | 
[.../python/apache\_beam/io/gcp/bigquery\_io\_metadata.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5X2lvX21ldGFkYXRhLnB5)
 | `90.62% <100.00%> (+47.14%)` | :arrow_up: |
   | 
[setup.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2V0dXAucHk=)
 | `0.00% <0.00%> (ø)` | |
   | 
[sdks/python/apache\_beam/portability/python\_urns.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvcHl0aG9uX3VybnMucHk=)
 | `100.00% <0.00%> (ø)` | |
   | 
[...apache\_beam/portability/api/beam\_runner\_api\_pb2.py](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2JlYW1fcnVubmVyX2FwaV9wYjIucHk=)
 | `100.00% <0.00%> (ø)` | |
   | ... and [290 
more](https://codecov.io/gh/apache/beam/pull/12762/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12762?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12762?src=pr=footer). Last 
update 
[bae1e7b...a5d8020](https://codecov.io/gh/apache/beam/pull/12762?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] commented on pull request #12896: Update indexing skips for pandas 1.x

2020-09-21 Thread GitBox


codecov[bot] commented on pull request #12896:
URL: https://github.com/apache/beam/pull/12896#issuecomment-696398122


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12896?src=pr=h1) Report
   > Merging 
[#12896](https://codecov.io/gh/apache/beam/pull/12896?src=pr=desc) into 
[master](https://codecov.io/gh/apache/beam/commit/067cba8229694e7fb9693313f51ca686746b620a?el=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12896/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12896?src=pr=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master   #12896  +/-   ##
   ==
   - Coverage   82.34%   82.33%   -0.02% 
   ==
 Files 452  452  
 Lines   5401654016  
   ==
   - Hits4448144473   -8 
   - Misses   9535 9543   +8 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12896?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/12896/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5)
 | `90.90% <0.00%> (-0.76%)` | :arrow_down: |
   | 
[...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/12896/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==)
 | `88.80% <0.00%> (-0.54%)` | :arrow_down: |
   | 
[sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/12896/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=)
 | `88.75% <0.00%> (-0.45%)` | :arrow_down: |
   | 
[...eam/runners/portability/fn\_api\_runner/fn\_runner.py](https://codecov.io/gh/apache/beam/pull/12896/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL2ZuX3J1bm5lci5weQ==)
 | `89.53% <0.00%> (-0.21%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12896?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12896?src=pr=footer). Last 
update 
[3df7495...aaecf18](https://codecov.io/gh/apache/beam/pull/12896?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib merged pull request #12897: Clean up CHANGES.md in preparation for 2.25.0 release.

2020-09-21 Thread GitBox


ibzib merged pull request #12897:
URL: https://github.com/apache/beam/pull/12897


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #12850: [BEAM-10481] Ensure registration of the accumulator occurs.

2020-09-21 Thread GitBox


ibzib commented on pull request #12850:
URL: https://github.com/apache/beam/pull/12850#issuecomment-696373462


   R: @aromanenko-dev 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tvalentyn commented on pull request #12896: Update indexing skips for pandas 1.x

2020-09-21 Thread GitBox


tvalentyn commented on pull request #12896:
URL: https://github.com/apache/beam/pull/12896#issuecomment-696457581


   > That's unexpected.. I wonder if it was disabled accidentally (perhaps 
because of Python version removals?)
   
   Yes, sorry about that - I was testing my changes and ran a seed job on  
#12898 
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on a change in pull request #12830: [BEAM-10716] TestPubsub/TestPubsubSignal clean up subscriptions

2020-09-21 Thread GitBox


amaliujia commented on a change in pull request #12830:
URL: https://github.com/apache/beam/pull/12830#discussion_r492300048



##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsubSignal.java
##
@@ -239,6 +245,12 @@ public void waitForSuccess(Duration duration) throws 
IOException {
 
 String result = pollForResultForDuration(resultSubscriptionPath, duration);
 
+try {
+  pubsub.deleteSubscription(resultSubscriptionPath);
+} catch (IOException e) {
+  LOG.warn(String.format("Leaked PubSub subscription '%s'", 
resultSubscriptionPath));

Review comment:
   nit: make error will make it easier to be discovered.

##
File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsubSignal.java
##
@@ -239,6 +245,12 @@ public void waitForSuccess(Duration duration) throws 
IOException {
 
 String result = pollForResultForDuration(resultSubscriptionPath, duration);
 
+try {
+  pubsub.deleteSubscription(resultSubscriptionPath);
+} catch (IOException e) {
+  LOG.warn(String.format("Leaked PubSub subscription '%s'", 
resultSubscriptionPath));

Review comment:
   nit: Will LOG.error make it easier to be discovered?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12806: [BEAM-10869] Make WriteToPubsub output serialized PubsubMessage proto bytes when using runner v2

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12806:
URL: https://github.com/apache/beam/pull/12806#issuecomment-692876818


   # [Codecov](https://codecov.io/gh/apache/beam/pull/12806?src=pr=h1) Report
   > Merging 
[#12806](https://codecov.io/gh/apache/beam/pull/12806?src=pr=desc) into 
[master](https://codecov.io/gh/apache/beam/commit/2bb60c323095340240ec4982c1e9dabc397107e5?el=desc)
 will **increase** coverage by `41.86%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/beam/pull/12806/graphs/tree.svg?width=650=150=pr=qcbbAh8Fj1)](https://codecov.io/gh/apache/beam/pull/12806?src=pr=tree)
   
   ```diff
   @@ Coverage Diff @@
   ##   master   #12806   +/-   ##
   ===
   + Coverage   40.43%   82.30%   +41.86% 
   ===
 Files 449  451+2 
 Lines   5353053855  +325 
   ===
   + Hits2164344323+22680 
   + Misses  31887 9532-22355 
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/beam/pull/12806?src=pr=tree) | Coverage 
Δ | |
   |---|---|---|
   | 
[sdks/python/apache\_beam/io/gcp/pubsub.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL3B1YnN1Yi5weQ==)
 | `92.22% <100.00%> (+50.21%)` | :arrow_up: |
   | 
[sdks/python/apache\_beam/io/snowflake.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vc25vd2ZsYWtlLnB5)
 | `64.15% <0.00%> (ø)` | |
   | 
[...apache\_beam/portability/api/beam\_runner\_api\_pb2.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2JlYW1fcnVubmVyX2FwaV9wYjIucHk=)
 | `100.00% <0.00%> (ø)` | |
   | 
[...he\_beam/testing/benchmarks/nexmark/nexmark\_util.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya191dGlsLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[...e\_beam/portability/api/beam\_runner\_api\_pb2\_urns.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcG9ydGFiaWxpdHkvYXBpL2JlYW1fcnVubmVyX2FwaV9wYjJfdXJucy5weQ==)
 | `100.00% <0.00%> (ø)` | |
   | 
[...eam/testing/benchmarks/nexmark/nexmark\_launcher.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19sYXVuY2hlci5weQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...ache\_beam/runners/interactive/pipeline\_analyzer.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9waXBlbGluZV9hbmFseXplci5weQ==)
 | | |
   | 
[sdks/python/apache\_beam/utils/histogram.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaGlzdG9ncmFtLnB5)
 | `94.28% <0.00%> (ø)` | |
   | 
[...beam/testing/benchmarks/nexmark/queries/query10.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvcXVlcmllcy9xdWVyeTEwLnB5)
 | `0.00% <0.00%> (ø)` | |
   | 
[...he\_beam/testing/benchmarks/nexmark/nexmark\_perf.py](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy9iZW5jaG1hcmtzL25leG1hcmsvbmV4bWFya19wZXJmLnB5)
 | `0.00% <0.00%> (ø)` | |
   | ... and [276 
more](https://codecov.io/gh/apache/beam/pull/12806/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/beam/pull/12806?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/beam/pull/12806?src=pr=footer). Last 
update 
[b6d0abb...b13bb7f](https://codecov.io/gh/apache/beam/pull/12806?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rworley-monster commented on pull request #12367: [BEAM-10564] Support more Avro field name formats when mapping to Jav…

2020-09-21 Thread GitBox


rworley-monster commented on pull request #12367:
URL: https://github.com/apache/beam/pull/12367#issuecomment-695968906


   Thank you for the approval.  Is there anything else that I can do to help 
get this merged or is it just a matter of waiting for additional review and 
approval?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] codecov[bot] edited a comment on pull request #12762: Ensuring that BigQuery jobs are tagged with the Dataflow step that launches them

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12762:
URL: https://github.com/apache/beam/pull/12762#issuecomment-696459985







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chadrik commented on a change in pull request #12881: [BEAM-7746] Get mypy passing on runners.worker

2020-09-21 Thread GitBox


chadrik commented on a change in pull request #12881:
URL: https://github.com/apache/beam/pull/12881#discussion_r491722757



##
File path: sdks/python/apache_beam/runners/worker/statecache.py
##
@@ -44,15 +48,17 @@ class Metrics(object):
   PREFIX = "beam:metric:statecache:"
 
   def __init__(self):
+# type: () -> None
 self._context = threading.local()
 
   def initialize(self):
+# type: () -> None
+
 """Needs to be called once per thread to initialize the local metrics 
cache.
 """
 if hasattr(self._context, 'metrics'):
   return  # Already initialized
-self._context.metrics = collections.defaultdict(
-int)  # type: DefaultDict[Hashable, int]

Review comment:
   Removed this annotation because it's invalid:   you can't add 
annotations to an attribute of another class (non-self attribute).  I've 
noticed there's definitely a need for a typed version of `threading.local`.
   

##
File path: sdks/python/apache_beam/runners/worker/log_handler.py
##
@@ -156,9 +174,11 @@ def _write_log_entries(self):
 done = True
 log_entries.pop()
   if log_entries:
-yield beam_fn_api_pb2.LogEntry.List(log_entries=log_entries)
+yield beam_fn_api_pb2.LogEntry.List(
+log_entries=cast(List[beam_fn_api_pb2.LogEntry], log_entries))

Review comment:
   we have to cast here because `log_entries` started out life as a 
`List[Union[..., Sentinel]]`, and just above this line we checked for the 
sentinel and popped it out, but mypy can't track that. 

##
File path: sdks/python/apache_beam/runners/worker/log_handler.py
##
@@ -156,9 +174,11 @@ def _write_log_entries(self):
 done = True
 log_entries.pop()
   if log_entries:
-yield beam_fn_api_pb2.LogEntry.List(log_entries=log_entries)
+yield beam_fn_api_pb2.LogEntry.List(
+log_entries=cast(List[beam_fn_api_pb2.LogEntry], log_entries))

Review comment:
   we have to cast here because `log_entries` was initialized as a 
`List[Union[..., Sentinel]]`, and just above this line we checked for the 
sentinel and popped it out, but mypy can't track that. 

##
File path: sdks/python/apache_beam/runners/worker/data_plane.py
##
@@ -303,9 +334,10 @@ def inverse(self):
 
   def input_elements(self,
   instruction_id,  # type: str
-  unused_expected_inputs=None,   # type: Collection[str]
+  unused_expected_inputs,   # type: Any

Review comment:
   It's more accurate to say `Any` here, since the method doesn't care.  
The argument can't be `Optional` or the method it would be incompatible with 
its super type. 
   
   I checked all uses of `input_elements` and I did not see any case where it 
was called with only one arg.  
   

##
File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
##
@@ -1619,7 +1633,7 @@ def _create_pardo_operation(
   consumers,
   output_tags)
   if pardo_proto and pardo_proto.restriction_coder_id:
-result.input_info = (
+result.input_info = operations.OpInputInfo(

Review comment:
   Making this a NamedTuple added a bit of sanity, and should be completely 
safe.

##
File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
##
@@ -174,11 +191,14 @@ def __init__(self,
 self._state_handler_factory = GrpcStateHandlerFactory(
 self._state_cache, credentials)
 self._profiler_factory = profiler_factory
-self._fns = KeyedDefaultDict(
-lambda id: self._control_stub.GetProcessBundleDescriptor(
-beam_fn_api_pb2.GetProcessBundleDescriptorRequest(
-process_bundle_descriptor_id=id))
-)  # type: Mapping[str, beam_fn_api_pb2.ProcessBundleDescriptor]
+
+def default_factory(id):
+  # type: (str) -> beam_fn_api_pb2.ProcessBundleDescriptor
+  return self._control_stub.GetProcessBundleDescriptor(
+  beam_fn_api_pb2.GetProcessBundleDescriptorRequest(
+  process_bundle_descriptor_id=id))
+
+self._fns = KeyedDefaultDict(default_factory)

Review comment:
   changing this to a function lets us add annotations which silences a 
mypy error.  mypy can now detect the key/value types of `KeyedDefaultDict`

##
File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
##
@@ -464,16 +496,17 @@ class SdkWorker(object):
 
   def __init__(self,
bundle_processor_cache,  # type: BundleProcessorCache
-   state_cache_metrics_fn=list,
+   state_cache_metrics_fn=list,  # type: Callable[[], 
Iterable[metrics_pb2.MonitoringInfo]]
profiler_factory=None,  # type: Optional[Callable[..., Profile]]
-   log_lull_timeout_ns=None,
+   log_lull_timeout_ns=None,  # type: Optional[int]
   ):
+# type: (...) -> None
 self.bundle_processor_cache = bundle_processor_cache
 self.state_cache_metrics_fn = 

[GitHub] [beam] robertwb commented on a change in pull request #12889: Dataframe wordcount example.

2020-09-21 Thread GitBox


robertwb commented on a change in pull request #12889:
URL: https://github.com/apache/beam/pull/12889#discussion_r492284647



##
File path: sdks/python/apache_beam/examples/wordcount_dataframe.py
##
@@ -0,0 +1,73 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A word-counting workflow using dataframes."""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+
+import apache_beam as beam
+from apache_beam.dataframe import io
+from apache_beam.dataframe.convert import to_dataframe
+from apache_beam.io import ReadFromText
+from apache_beam.options.pipeline_options import PipelineOptions
+
+
+def run(argv=None):
+  """Main entry point; defines and runs the wordcount pipeline."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+  '--input',
+  dest='input',
+  default='gs://dataflow-samples/shakespeare/kinglear.txt',
+  help='Input file to process.')
+  parser.add_argument(
+  '--output',
+  dest='output',
+  required=True,
+  help='Output file to write results to.')
+  known_args, pipeline_args = parser.parse_known_args(argv)
+
+  # Import this here to avoid pickling the main session.
+  import re
+
+  # The pipeline will be run on exiting the with block.
+  with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
+
+# Read the text file[pattern] into a PCollection.
+lines = p | 'Read' >> ReadFromText(known_args.input)
+
+words = (
+lines
+| 'Split' >> beam.FlatMap(
+lambda line: re.findall(r'[\w]+', line)).with_output_types(str)
+# Map to Row objects to generate a schema suitable for conversion to a 
dataframe.
+| 'ToRows' >> beam.Map(lambda word: beam.Row(word=word)))
+
+df = to_dataframe(words)
+df['count'] = 1
+counted = df.groupby('word').sum()
+counted.to_csv(known_args.output)

Review comment:
   Makes sense, let's do that.

##
File path: sdks/python/apache_beam/examples/wordcount_dataframe.py
##
@@ -0,0 +1,73 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A word-counting workflow using dataframes."""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import argparse
+import logging
+
+import apache_beam as beam
+from apache_beam.dataframe import io
+from apache_beam.dataframe.convert import to_dataframe
+from apache_beam.io import ReadFromText
+from apache_beam.options.pipeline_options import PipelineOptions
+
+
+def run(argv=None):
+  """Main entry point; defines and runs the wordcount pipeline."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+  '--input',
+  dest='input',
+  default='gs://dataflow-samples/shakespeare/kinglear.txt',
+  help='Input file to process.')
+  parser.add_argument(
+  '--output',
+  dest='output',
+  required=True,
+  help='Output file to write results to.')
+  known_args, pipeline_args = parser.parse_known_args(argv)
+
+  # Import this here to avoid pickling the main session.
+  import re
+
+  # The pipeline will be run on exiting the with block.
+  with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
+
+# Read the text file[pattern] into a PCollection.
+lines = p | 'Read' >> ReadFromText(known_args.input)
+
+words = (
+lines
+| 'Split' >> beam.FlatMap(
+lambda line: re.findall(r'[\w]+', line)).with_output_types(str)
+# Map to Row objects to generate 

[GitHub] [beam] codecov[bot] edited a comment on pull request #12727: [BEAM-10844] Add experiment option prebuild_sdk_container to prebuild python sdk container with dependencies.

2020-09-21 Thread GitBox


codecov[bot] edited a comment on pull request #12727:
URL: https://github.com/apache/beam/pull/12727#issuecomment-683229966







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on a change in pull request #12880: [BEAM-10933] Adjust GBK and Flatten types before creating the pipeline proto

2020-09-21 Thread GitBox


chamikaramj commented on a change in pull request #12880:
URL: https://github.com/apache/beam/pull/12880#discussion_r492380763



##
File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
##
@@ -488,12 +497,15 @@ def run_pipeline(self, pipeline, options):
   # Cross language transform require using a pipeline object constructed
   # from the full pipeline proto to make sure that expanded version of
   # external transforms are reflected in the Pipeline job graph.
+  # TODO(chamikara): remove following pipeline and pipeline proto 
recreation
+  # after portable job submission path is fully in place.
   from apache_beam import Pipeline
   pipeline = Pipeline.from_runner_api(
   self.proto_pipeline,
   pipeline.runner,
   options,
   allow_proto_holders=True)
+  self._adjust_types_for_dataflow(pipeline)

Review comment:
   Verified that types are preserved in the pipeline->proto->pipeline 
roundtrip and removed this.

##
File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
##
@@ -430,6 +430,11 @@ def _check_for_unsupported_fnapi_features(self, 
pipeline_proto):
 components.coders[windowing_strategy.window_coder_id].spec.urn,
 windowing_strategy.window_fn.urn))
 
+  def _adjust_types_for_dataflow(self, pipeline):

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #12762: Ensuring that BigQuery jobs are tagged with the Dataflow step that launches them

2020-09-21 Thread GitBox


pabloem commented on pull request #12762:
URL: https://github.com/apache/beam/pull/12762#issuecomment-696374290







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] methodmissing commented on pull request #12794: [BEAM-10865] Support for Kafka deserialization API with headers (since Kafka API 2.1.0)

2020-09-21 Thread GitBox


methodmissing commented on pull request #12794:
URL: https://github.com/apache/beam/pull/12794#issuecomment-696391542


   Thanks for the feedback and the merge Luke. I'll address the updates in a PR.
   
   I agree all tests makes more sense as one can never have enough coverage. 
That said, I'm been thinking about the `ConsumerSpEL` and the fact the in the 
test suite it until now only executed against Kafka clients API 1.0.0 
(`library.java.kafka_clients`) for each run. It's possible to also have the 
tasks and configurations dynamically created for a range of Kafka API versions 
too now that a pattern emerged, but that also would add quite a lot of time to 
test runs.
   
   Thoughts?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on pull request #12874: [BEAM-10930] Use dense JSON responses for BigQueryIO interactions

2020-09-21 Thread GitBox


lukecwik commented on pull request #12874:
URL: https://github.com/apache/beam/pull/12874#issuecomment-696355924







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   3   4   >