[GitHub] [beam] robinyqiu commented on pull request #11272: [BEAM-9641] Support ZetaSQL DATE type as a Beam LogicalType

2020-05-05 Thread GitBox


robinyqiu commented on pull request #11272:
URL: https://github.com/apache/beam/pull/11272#issuecomment-624452448


   Could you help trigger the tests again?
   
   For the comment on range: Thanks for pointing it out. I overlooked this 
problem. I would like to create a separate PR to address it, along with range 
testing for other types as well.
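   For reference, a sketch of what such a range test could look like (assumptions, to be confirmed in that follow-up PR: `SqlTypes.DATE` converts between `java.time.LocalDate` and an epoch-day `Long` base value, and ZetaSQL's DATE range is 0001-01-01 through 9999-12-31):
   
   ```java
   import static org.junit.Assert.assertEquals;
   
   import java.time.LocalDate;
   import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes;
   import org.junit.Test;
   
   public class DateRangeSketchTest {
     @Test
     public void dateRoundTripsAtRangeBounds() {
       LocalDate min = LocalDate.of(1, 1, 1);
       LocalDate max = LocalDate.of(9999, 12, 31);
       // Round-trip through the base (epoch-day) representation at both ends of the range.
       assertEquals(min, SqlTypes.DATE.toInputType(SqlTypes.DATE.toBaseType(min)));
       assertEquals(max, SqlTypes.DATE.toInputType(SqlTypes.DATE.toBaseType(max)));
     }
   }
   ```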



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robinyqiu commented on a change in pull request #11272: [BEAM-9641] Support ZetaSQL DATE type as a Beam LogicalType

2020-05-05 Thread GitBox


robinyqiu commented on a change in pull request #11272:
URL: https://github.com/apache/beam/pull/11272#discussion_r418871819



##
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java
##
@@ -427,17 +430,12 @@ private static Expression value(
 
 private static Expression value(Expression value, Schema.FieldType type) {
   if (type.getTypeName().isLogicalType()) {
-    Expression millisField = Expressions.call(value, "getMillis");
     String logicalId = type.getLogicalType().getIdentifier();
     if (logicalId.equals(TimeType.IDENTIFIER)) {
-      return nullOr(value, Expressions.convert_(millisField, int.class));
-    } else if (logicalId.equals(DateType.IDENTIFIER)) {
-      value =
-          nullOr(
-              value,
-              Expressions.convert_(
-                  Expressions.divide(millisField, Expressions.constant(MILLIS_PER_DAY)),
-                  int.class));
+      return nullOr(
+          value, Expressions.convert_(Expressions.call(value, "getMillis"), int.class));
+    } else if (logicalId.equals(SqlTypes.DATE.getIdentifier())) {
Review comment:
   Done. I wish I could use a switch statement here, but unfortunately there is no constant `IDENTIFIER` defined in the `LogicalType` class. (I could add one to each concrete SQL logical type I create, but I don't think that is good style.)
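   For context, a minimal generic sketch of the constraint (illustrative names, not the actual Beam classes): Java `switch` case labels must be compile-time constants, so an identifier that is only reachable through a `getIdentifier()` call cannot be a case label, and the dispatch stays an if/else chain.
   
   ```java
   public class IdentifierDispatchSketch {
     // Analogous to TimeType.IDENTIFIER: a compile-time constant, usable as a case label.
     static final String TIME_IDENTIFIER = "TIME";
   
     // Analogous to SqlTypes.DATE.getIdentifier(): a method call, not a constant,
     // so it can never appear in a `case ...:` label.
     static String dateIdentifier() {
       return "DATE";
     }
   
     static String dispatch(String logicalId) {
       if (logicalId.equals(TIME_IDENTIFIER)) {
         return "handle TIME";
       } else if (logicalId.equals(dateIdentifier())) {
         return "handle DATE";
       }
       return "handle other logical types";
     }
   
     public static void main(String[] args) {
       System.out.println(dispatch("DATE")); // prints "handle DATE"
     }
   }
   ```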

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/Date.java
##
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.logicaltypes;
+
+import java.time.LocalDate;
+import org.apache.beam.sdk.schemas.Schema;
+
+/**
+ * A date without a time-zone.
+ *
+ * It cannot represent an instant on the time-line without additional information such as an

Review comment:
   Done.

##
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamEnumerableConverter.java
##
@@ -303,8 +303,8 @@ private static Object fieldToAvatica(Schema.FieldType type, Object beamValue) {
     String logicalId = type.getLogicalType().getIdentifier();
     if (logicalId.equals(TimeType.IDENTIFIER)) {
       return (int) ((ReadableInstant) beamValue).getMillis();
-    } else if (logicalId.equals(DateType.IDENTIFIER)) {
-      return (int) (((ReadableInstant) beamValue).getMillis() / MILLIS_PER_DAY);
+    } else if (logicalId.equals(SqlTypes.DATE.getIdentifier())) {

Review comment:
   Done.

##
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/Date.java
##
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.logicaltypes;
+
+import java.time.LocalDate;
+import org.apache.beam.sdk.schemas.Schema;
+
+/**
+ * A date without a time-zone.
+ *
+ * It cannot represent an instant on the time-line without additional information such as an
+ * offset or time-zone.
+ */
+public class Date implements Schema.LogicalType {

Review comment:
   I think Andrew is basically suggesting using a 
`PassThroughLogicalType` as a logical type for `DATE`. I think we could 
definitely consider this if performance becomes a problem in the future. (It's 
not easy to change the in-memory type for `Date` after it 
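   For context, the new `Date` logical type keeps a `java.time.LocalDate` in memory. Below is a minimal sketch of just the conversion logic, assuming the epoch-day INT64 base representation and leaving out the `Schema.LogicalType` plumbing (identifier, base field type):
   
   ```java
   import java.time.LocalDate;
   
   // Conversion logic only; the real class also exposes an identifier and declares
   // Schema.FieldType.INT64 as its base type.
   final class DateConversionSketch {
     static Long toBaseType(LocalDate input) {
       return input == null ? null : input.toEpochDay(); // days since 1970-01-01
     }
   
     static LocalDate toInputType(Long base) {
       return base == null ? null : LocalDate.ofEpochDay(base);
     }
   
     public static void main(String[] args) {
       System.out.println(toInputType(toBaseType(LocalDate.of(2020, 5, 5)))); // 2020-05-05
     }
   }
   ```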

[GitHub] [beam] robinyqiu commented on pull request #11272: [BEAM-9641] Support ZetaSQL DATE type as a Beam LogicalType

2020-05-05 Thread GitBox


robinyqiu commented on pull request #11272:
URL: https://github.com/apache/beam/pull/11272#issuecomment-624451644


   Ah, just realized that the previous comments were not sent out.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-05 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-624431077


   Run PythonDocker PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-05 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-624431173


   Run Python2_PVR_Flink PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-05 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-624430895


   Run Python 2 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-05 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-624431013


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] stale[bot] commented on pull request #10791: [BEAM-9250] Update release guide with more instructions.

2020-05-05 Thread GitBox


stale[bot] commented on pull request #10791:
URL: https://github.com/apache/beam/pull/10791#issuecomment-624403721


   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] stale[bot] commented on pull request #11034: [BEAM-9424] Allow grouping by LogicalType

2020-05-05 Thread GitBox


stale[bot] commented on pull request #11034:
URL: https://github.com/apache/beam/pull/11034#issuecomment-624403737


   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rahul8383 commented on a change in pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


rahul8383 commented on a change in pull request #11609:
URL: https://github.com/apache/beam/pull/11609#discussion_r420487951



##
File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java
##
@@ -97,4 +99,19 @@ public void testNanosDuration() {
     assertEquals(duration, row.getLogicalTypeValue(0, NanosDuration.class));
     assertEquals(durationAsRow, row.getBaseValue(0, Row.class));
   }
+
+  @Test(expected = IllegalArgumentException.class)
+  public void testFixedBytesIllegalArgument() {
+    Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build();
+    byte[] byteArrayWithLengthFive = {1, 2, 3, 4, 5};
+    Row row = Row.withSchema(schema).withFieldValue("char", byteArrayWithLengthFive).build();
+  }
+
+  @Test
+  public void testFixedBytes() {
+    Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build();
+    byte[] byteArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
+    Row row = Row.withSchema(schema).withFieldValue("char", byteArray).build();
+    assertTrue(Arrays.equals(byteArray, row.getLogicalTypeValue("char", byte[].class)));
+  }

Review comment:
   Moved the tests to RowTest.java.
   
   Case in point! How can I write a `FixedBytes` test that exercises the behaviour of appending zeros? To test this behaviour, the input value should have length < expectedLength. But if the input value's length is less than the expected length, an `IllegalArgumentException` is thrown while building the Row.
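   One possible shape for such a test (a sketch, assuming it is acceptable to exercise the logical type's conversion methods directly instead of going through `Row.Builder`, and that `FixedBytes.toInputType` pads a shorter base value with trailing zeros):
   
   ```java
   import static org.junit.Assert.assertArrayEquals;
   
   import org.apache.beam.sdk.schemas.logicaltypes.FixedBytes;
   import org.junit.Test;
   
   public class FixedBytesPaddingSketchTest {
     @Test
     public void testToInputTypePadsWithZeros() {
       FixedBytes fixedBytes = FixedBytes.of(10);
       byte[] shortValue = {1, 2, 3, 4, 5};
       // Convert a shorter base value back to the input type; the remaining bytes
       // are expected to be zero-filled.
       byte[] padded = fixedBytes.toInputType(shortValue);
       assertArrayEquals(new byte[] {1, 2, 3, 4, 5, 0, 0, 0, 0, 0}, padded);
     }
   }
   ```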





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rahul8383 commented on a change in pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


rahul8383 commented on a change in pull request #11609:
URL: https://github.com/apache/beam/pull/11609#discussion_r420486636



##
File path: sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java
##
@@ -97,4 +99,19 @@ public void testNanosDuration() {
     assertEquals(duration, row.getLogicalTypeValue(0, NanosDuration.class));
     assertEquals(durationAsRow, row.getBaseValue(0, Row.class));
   }
+
+  @Test(expected = IllegalArgumentException.class)
+  public void testFixedBytesIllegalArgument() {
+    Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build();
+    byte[] byteArrayWithLengthFive = {1, 2, 3, 4, 5};
+    Row row = Row.withSchema(schema).withFieldValue("char", byteArrayWithLengthFive).build();
+  }

Review comment:
   Added `RowTest.testLogicalTypeWithInvalidInputValueByFieldIndex`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-05 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-624370689


   Run Python 3.7 PostCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on pull request #11581: [BEAM-8307] NPE in Calcite dialect when input PCollection has logical…

2020-05-05 Thread GitBox


amaliujia commented on pull request #11581:
URL: https://github.com/apache/beam/pull/11581#issuecomment-624366230


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy

2020-05-05 Thread GitBox


iemejia commented on pull request #11613:
URL: https://github.com/apache/beam/pull/11613#issuecomment-624365589


   I am OK with squashing if it creates the extra commit, but that does not seem to be the case. Or does it create the additional merge commit? I just want to ensure we follow the rules.
   
   Now, if the goal is to change the rules, maybe we should move the discussion to the mailing list.
   Pinging @kennknowles, who created our default policy (extra merge commit), to see what he thinks.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy

2020-05-05 Thread GitBox


pabloem commented on pull request #11613:
URL: https://github.com/apache/beam/pull/11613#issuecomment-624363164


   I think squash is used by many, and it facilitates receiving contributions 
without the extra round trip to the contributors. I think we need to discuss 
more before removing it. Wdyt @iemejia?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on a change in pull request #11590: [BEAM-8944] Improve UnboundedThreadPoolExecutor performance

2020-05-05 Thread GitBox


pabloem commented on a change in pull request #11590:
URL: https://github.com/apache/beam/pull/11590#discussion_r420468171



##
File path: sdks/python/apache_beam/utils/thread_pool_executor.py
##
@@ -137,35 +101,33 @@ def submit(self, fn, *args, **kwargs):
     """
     future = _base.Future()
     work_item = _WorkItem(future, fn, args, kwargs)
-    try:
-      # Keep trying to get an idle worker from the queue until we find one
-      # that accepts the work.
-      while not self._idle_worker_queue.get(
-          block=False).accepted_work(work_item):
-        pass
-      return future
-    except queue.Empty:
-      with self._lock:
-        if self._shutdown:
-          raise RuntimeError(
-              'Cannot schedule new tasks after thread pool '
-              'has been shutdown.')
-
-        worker = _Worker(
-            self._idle_worker_queue,
-            self._permitted_thread_age_in_seconds,
-            work_item)
+    with self._lock:
+      if self._shutdown:
+        raise RuntimeError(
+            'Cannot schedule new tasks after thread pool has been shutdown.')
+      try:
+        self._idle_worker_queue.get(block=False).assign_work(work_item)
+
+        # If we have more idle threads then the max allowed, shutdown a thread.
+        if self._idle_worker_queue.qsize() > self._max_idle_threads:
+          try:
+            self._idle_worker_queue.get(block=False).shutdown()

Review comment:
   should we remove a total of `self._max_idle_threads - self._idle_worker_queue.qsize()` workers rather than just one?
   IIUC, this is the only point (besides shutdown) where workers are removed, so maybe yes?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rahul8383 commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


rahul8383 commented on pull request #11609:
URL: https://github.com/apache/beam/pull/11609#issuecomment-624360661


   > We always convert logical types to their base type when serializing with 
SchemaCoder, and convert back to the input type when deserializing. Other than 
that I think the only time it should get called is when constructing a Row 
instance (unless you use attachValues).
   
   In that case, there is no need to handle this `else` case, right? We are making sure that the input has the expected length while building the Row.
   
https://github.com/apache/beam/blob/5e1571760b61b8ce247d5375b71c8df4d69d6409/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/FixedBytes.java#L77
   Even if `attachValues` is used while building the Row and the provided input value is invalid (invalid length), then during serialization in `SchemaCoder` the input value cannot be converted to the base type, as it doesn't have the expected length, and an `IllegalArgumentException` will be thrown.
   
   > Would this just be so that we're guaranteed to call `toInputType` whenever setting a value on Row? This PR accomplishes the same thing right?
   
   Can we support this feature: depending on the type of the input value provided while building the Row, call either `toInputType(toBaseType(inputValue))` or `toInputType(inputValue)`, i.e. support providing the base value while building the Row? If the InputType and BaseType are one and the same, we can directly call `toInputType(inputValue)`. I am thinking this might be helpful for logical types like `FixedBytes` or `FixedLengthString`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #11616: Use csv reader instead of split to read csv data.

2020-05-05 Thread GitBox


aaltay commented on pull request #11616:
URL: https://github.com/apache/beam/pull/11616#issuecomment-624360261


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] KevinGG commented on pull request #11616: Use csv reader instead of split to read csv data.

2020-05-05 Thread GitBox


KevinGG commented on pull request #11616:
URL: https://github.com/apache/beam/pull/11616#issuecomment-624359913


   R: @aaltay 
   
   PTAL, thx!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-05 Thread GitBox


TheNeuralBit commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-624359510


   OK, I removed `beam_PreCommit_Website_Commit/src/website/www` and `beam_PreCommit_Website_Stage_GCS_Commit/src/website/www` on `apache-beam-jenkins-{1..15}`. Hopefully that will unbreak those jobs. We need to figure out why this change is causing Jenkins to create files owned by root, though. Any ideas?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-05 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-624356609







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on a change in pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)

2020-05-05 Thread GitBox


iemejia commented on a change in pull request #11614:
URL: https://github.com/apache/beam/pull/11614#discussion_r420461169



##
File path: runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java
##
@@ -242,6 +243,7 @@ public void close() {}
    * This test verifies that watermarks are correctly forwarded.
    */
   @Test(timeout = 30_000)
+  @Ignore("https://issues.apache.org/jira/browse/BEAM-9164")

Review comment:
   Oh my bad sorry.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy

2020-05-05 Thread GitBox


iemejia commented on pull request #11613:
URL: https://github.com/apache/beam/pull/11613#issuecomment-624353261


   @robertwb Agree, a consensus that nobody respects :). In the case of this PR 
I set into the merge approach because we have only [three 
options](https://help.github.com/en/github/administering-a-repository/about-merge-methods-on-github)
 and the other two (squash and rebase) do not create the additional merge 
commit required by the [Beam committer 
guide](https://beam.apache.org/contribute/committer-guide/#merging-it).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] KevinGG opened a new pull request #11616: Use csv reader instead of split to read csv data.

2020-05-05 Thread GitBox


KevinGG opened a new pull request #11616:
URL: https://github.com/apache/beam/pull/11616


   There might be commas in the CSV-formatted data itself. A naive split will generate malformed data and cause errors. Using `csv.reader` solves the issue.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   (Jenkins build-status badge table for the Go, Java, and Python SDKs across the listed runners; links to builds.apache.org jobs.)

[GitHub] [beam] pabloem commented on pull request #11296: [BEAM-9640] Sketching watermark tracking on FnApiRunner

2020-05-05 Thread GitBox


pabloem commented on pull request #11296:
URL: https://github.com/apache/beam/pull/11296#issuecomment-624351936


   > Sorry it took so long to get to this. Most of my questions are around 
watermark advancement.
   
   no worries. This is a critical component, and I have other work to do, so 
I'm glad to get a thoughtful review. I'll address your comments soon.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform

2020-05-05 Thread GitBox


pabloem commented on a change in pull request #11582:
URL: https://github.com/apache/beam/pull/11582#discussion_r420456908



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -1641,3 +1644,314 @@ def process(self, unused_element, signal):
 *self._args,
 **self._kwargs))
 | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location)))
+
+
+class _ExtractBQData(DoFn):
+  '''
+  PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into
+  a temporary storage and returns metadata for created files.
+  '''
+  def __init__(
+  self,
+  gcs_location_pattern=None,
+  project=None,
+  coder=None,
+  schema=None,
+  kms_key=None):
+
+self.gcs_location_pattern = gcs_location_pattern
+self.project = project
+self.coder = coder or _JsonToDictCoder
+self.kms_key = kms_key
+self.split_result = None
+self.schema = schema
+self.target_schema = None
+
+  def process(self, element):
+'''
+:param element(ReadAllFromBigQueryRequest):
+:return:
+'''
+element.validate()
+if element.table is not None:
+  table_reference = bigquery_tools.parse_table_reference(element.table)
+  query = None
+  use_legacy_sql = True
+else:
+  query = element.query
+  use_legacy_sql = element.use_legacy_sql
+
+flatten_results = element.flatten_results
+
+bq = bigquery_tools.BigQueryWrapper()
+
+try:
+  if element.query is not None:
+self._setup_temporary_dataset(bq, query, use_legacy_sql)
+table_reference = self._execute_query(
+bq, query, use_legacy_sql, flatten_results)
+
+  gcs_location = self.gcs_location_pattern.format(uuid.uuid4().hex)
+
+  table_schema = bq.get_table(
+  table_reference.projectId,
+  table_reference.datasetId,
+  table_reference.tableId).schema
+
+  if self.target_schema is None:
+self.target_schema = bigquery_tools.parse_table_schema_from_json(
+json.dumps(self.schema))
+
+  if not self.target_schema == table_schema:
+raise ValueError((
+"Schema generated by reading from BQ doesn't match expected"
+"schema.\nExpected: {}\nActual: {}").format(
+self.target_schema, table_schema))
+
+  metadata_list = self._export_files(bq, table_reference, gcs_location)
+
+  yield pvalue.TaggedOutput('location_to_cleanup', gcs_location)
+  for metadata in metadata_list:
+yield metadata.path
+
+finally:
+  if query is not None:
+bq.clean_up_temporary_dataset(self.project)
+
+  def _setup_temporary_dataset(self, bq, query, use_legacy_sql):
+location = bq.get_query_location(self.project, query, use_legacy_sql)
+bq.create_temporary_dataset(self.project, location)
+
+  def _execute_query(self, bq, query, use_legacy_sql, flatten_results):
+job = bq._start_query_job(
+self.project,
+query,
+use_legacy_sql,
+flatten_results,
+job_id=uuid.uuid4().hex,
+kms_key=self.kms_key)
+job_ref = job.jobReference
+bq.wait_for_bq_job(job_ref)
+return bq._get_temp_table(self.project)
+
+  def _export_files(self, bq, table_reference, gcs_location):
+"""Runs a BigQuery export job.
+
+Returns:
+  a list of FileMetadata instances
+"""
+job_id = uuid.uuid4().hex
+job_ref = bq.perform_extract_job([gcs_location],
+ job_id,
+ table_reference,
+ bigquery_tools.FileFormat.JSON,
+ include_header=False)
+bq.wait_for_bq_job(job_ref)
+metadata_list = FileSystems.match([gcs_location])[0].metadata_list
+
+return metadata_list
+
+
+class _PassThroughThenCleanupWithSI(PTransform):
+  """A PTransform that invokes a DoFn after the input PCollection has been
+processed.
+
+DoFn should have arguments (element, side_input, cleanup_signal).
+
+Utilizes readiness of PCollection to trigger DoFn.
+  """
+  def __init__(self, cleanup_dofn, side_input):
+self.cleanup_dofn = cleanup_dofn
+self.side_input = side_input
+
+  def expand(self, input):
+class PassThrough(beam.DoFn):
+  def process(self, element):
+yield element
+
+main_output, cleanup_signal = input | beam.ParDo(
+  PassThrough()).with_outputs(
+  'cleanup_signal', main='main')
+
+_ = (
+input.pipeline
+| beam.Create([None])
+| beam.ParDo(
+self.cleanup_dofn,
+self.side_input,
+beam.pvalue.AsSingleton(cleanup_signal)))
+
+return main_output
+
+
+class ReadAllFromBigQueryRequest:

Review comment:
   I worry that this is a little clunky - but I appreciate that it provides 
validation, and even type checking if necessary. Perhaps give it a shorter name 
so it's 'easy' to create. 
   
   cc: @robertwb 

[GitHub] [beam] TheNeuralBit commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-05 Thread GitBox


TheNeuralBit commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-624346035


   We need to stop re-testing this. It's creating a bunch of files owned by root on the workers, so subsequent runs fail because they don't have permission to clean them up:
   
   ```
   bhulette@apache-beam-jenkins-1:~$ ll /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Website_Stage_GCS_Commit/src/website/www/node_modules | head
   total 528
   drwxr-xr-x 122 root    root     4096 Apr 28 20:02 ./
   drwxrwxr-x   4 jenkins jenkins  4096 Apr 29 09:51 ../
   drwxr-xr-x   2 root    root     4096 Apr 28 20:02 ansi-regex/
   drwxr-xr-x   2 root    root     4096 Apr 28 20:02 ansi-styles/
   drwxr-xr-x   2 root    root     4096 Apr 28 20:02 anymatch/
   drwxr-xr-x   3 root    root     4096 Apr 28 20:02 argparse/
   drwxr-xr-x   2 root    root     4096 Apr 28 20:02 array-union/
   drwxr-xr-x   6 root    root     4096 Apr 28 20:02 autoprefixer/
   drwxr-xr-x   2 root    root     4096 Apr 28 20:02 balanced-match/
   ```
   
   It's causing the precommit to fail for other PRs too



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] damondouglas commented on a change in pull request #11564: [Beam-9679] Add Core Transforms section / Map lesson to the Go SDK katas

2020-05-05 Thread GitBox


damondouglas commented on a change in pull request #11564:
URL: https://github.com/apache/beam/pull/11564#discussion_r420451192



##
File path: learning/katas/go/Core Transforms/Map/ParDo OneToMany/task.md
##
@@ -0,0 +1,32 @@
+
+
+# ParDo - One to Many
+
+In the previous kata we learned that ParDo maps a single element into another 
element.
+In this kata we will map a single element into many by splitting a sentence 
into words.
+
+**Kata:** Please write a ParDo that maps each input sentence into words 
tokenized by whitespace (" ").
+
+
+  Use https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo;>
+  ParDo

Review comment:
   @lostluck Do you mean this? If so, I could check across the various tasks of the existing katas to make sure it's consistent.
   ```
   
 Use https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#ParDo;>
 beam.ParDo
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-05 Thread GitBox


pabloem commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-624343145


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform

2020-05-05 Thread GitBox


pabloem commented on a change in pull request #11582:
URL: https://github.com/apache/beam/pull/11582#discussion_r420386613



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -1641,3 +1644,314 @@ def process(self, unused_element, signal):
 *self._args,
 **self._kwargs))
 | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location)))
+
+
+class _ExtractBQData(DoFn):
+  '''
+  PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into
+  a temporary storage and returns metadata for created files.
+  '''
+  def __init__(
+  self,
+  gcs_location_pattern=None,
+  project=None,
+  coder=None,
+  schema=None,
+  kms_key=None):
+
+self.gcs_location_pattern = gcs_location_pattern
+self.project = project
+self.coder = coder or _JsonToDictCoder
+self.kms_key = kms_key
+self.split_result = None
+self.schema = schema
+self.target_schema = None
+
+  def process(self, element):
+'''
+:param element(ReadAllFromBigQueryRequest):
+:return:
+'''
+element.validate()
+if element.table is not None:
+  table_reference = bigquery_tools.parse_table_reference(element.table)
+  query = None
+  use_legacy_sql = True
+else:
+  query = element.query
+  use_legacy_sql = element.use_legacy_sql
+
+flatten_results = element.flatten_results
+
+bq = bigquery_tools.BigQueryWrapper()
+
+try:
+  if element.query is not None:
+self._setup_temporary_dataset(bq, query, use_legacy_sql)
+table_reference = self._execute_query(
+bq, query, use_legacy_sql, flatten_results)
+
+  gcs_location = self.gcs_location_pattern.format(uuid.uuid4().hex)
+
+  table_schema = bq.get_table(
+  table_reference.projectId,
+  table_reference.datasetId,
+  table_reference.tableId).schema
+
+  if self.target_schema is None:
+self.target_schema = bigquery_tools.parse_table_schema_from_json(
+json.dumps(self.schema))
+
+  if not self.target_schema == table_schema:

Review comment:
   why do you need a target_schema?

##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -1641,3 +1644,314 @@ def process(self, unused_element, signal):
 *self._args,
 **self._kwargs))
 | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location)))
+
+
+class _ExtractBQData(DoFn):
+  '''
+  PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into
+  a temporary storage and returns metadata for created files.
+  '''
+  def __init__(
+  self,
+  gcs_location_pattern=None,
+  project=None,
+  coder=None,
+  schema=None,
+  kms_key=None):
+
+self.gcs_location_pattern = gcs_location_pattern
+self.project = project
+self.coder = coder or _JsonToDictCoder
+self.kms_key = kms_key
+self.split_result = None
+self.schema = schema
+self.target_schema = None
+
+  def process(self, element):
+'''
+:param element(ReadAllFromBigQueryRequest):
+:return:
+'''
+element.validate()
+if element.table is not None:
+  table_reference = bigquery_tools.parse_table_reference(element.table)
+  query = None
+  use_legacy_sql = True
+else:
+  query = element.query
+  use_legacy_sql = element.use_legacy_sql
+
+flatten_results = element.flatten_results
+
+bq = bigquery_tools.BigQueryWrapper()

Review comment:
   It would be great if the BQ wrapper could be passed a client as an 
argument, so that a mocked-out BQ client could be used.
   
   See 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L993
 and 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L1047-L1048

##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -1641,3 +1644,314 @@ def process(self, unused_element, signal):
 *self._args,
 **self._kwargs))
 | _PassThroughThenCleanup(RemoveJsonFiles(gcs_location)))
+
+
+class _ExtractBQData(DoFn):
+  '''
+  PTransform:ReadAllFromBigQueryRequest->FileMetadata that fetches BQ data into
+  a temporary storage and returns metadata for created files.
+  '''
+  def __init__(
+  self,
+  gcs_location_pattern=None,
+  project=None,
+  coder=None,
+  schema=None,
+  kms_key=None):
+
+self.gcs_location_pattern = gcs_location_pattern
+self.project = project
+self.coder = coder or _JsonToDictCoder
+self.kms_key = kms_key
+self.split_result = None
+self.schema = schema
+self.target_schema = None
+
+  def process(self, element):
+'''
+:param element(ReadAllFromBigQueryRequest):
+:return:
+'''
+element.validate()
+if element.table is not None:
+  table_reference = bigquery_tools.parse_table_reference(element.table)
+  query = None
+  

[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-05 Thread GitBox


pabloem commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-624337831


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-05 Thread GitBox


pabloem commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-624337083


   retest this please
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] lukecwik commented on pull request #11590: [BEAM-8944] Improve UnboundedThreadPoolExecutor performance

2020-05-05 Thread GitBox


lukecwik commented on pull request #11590:
URL: https://github.com/apache/beam/pull/11590#issuecomment-624329537


   Run Python Load Tests ParDo Flink Streaming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] reuvenlax commented on a change in pull request #11350: [BEAM-1589] Added @OnWindowExpiration annotation.

2020-05-05 Thread GitBox


reuvenlax commented on a change in pull request #11350:
URL: https://github.com/apache/beam/pull/11350#discussion_r420423316



##
File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java
##
@@ -857,6 +863,223 @@ public BundleFinalizer bundleFinalizer() {
 }
   }
 
+  /**
+   * A concrete implementation of {@link DoFnInvoker.ArgumentProvider} used 
for running a {@link
+   * DoFn} on window expiration.
+   */
+  private class OnWindowExpirationArgumentProvider
+  extends DoFn.OnWindowExpirationContext
+  implements DoFnInvoker.ArgumentProvider {
+private final BoundedWindow window;
+private final Instant timestamp;
+private final KeyT key;
+/** Lazily initialized; should only be accessed via {@link 
#getNamespace()}. */
+private @Nullable StateNamespace namespace;
+
+/**
+ * The state namespace for this context.
+ *
+ * Any call to this method when more than one window is present will 
crash; this represents a
+ * bug in the runner or the {@link DoFnSignature}, since values must be in 
exactly one window
+ * when state or timers are relevant.
+ */
+private StateNamespace getNamespace() {
+  if (namespace == null) {
+namespace = StateNamespaces.window(windowCoder, window);
+  }
+  return namespace;
+}
+
+private OnWindowExpirationArgumentProvider(BoundedWindow window, Instant 
timestamp, KeyT key) {
+  fn.super();
+  this.window = window;
+  this.timestamp = timestamp;
+  this.key = key;
+}
+
+@Override
+public BoundedWindow window() {
+  return window;
+}
+
+@Override
+public PaneInfo paneInfo(DoFn doFn) {
+  throw new UnsupportedOperationException(
+  "Cannot access paneInfo outside of @ProcessElement methods.");
+}
+
+@Override
+public PipelineOptions pipelineOptions() {
+  return getPipelineOptions();
+}
+
+@Override
+public DoFn.StartBundleContext 
startBundleContext(DoFn doFn) {
+  throw new UnsupportedOperationException("StartBundleContext parameters 
are not supported.");
+}
+
+@Override
+public DoFn.FinishBundleContext finishBundleContext(
+DoFn doFn) {
+  throw new UnsupportedOperationException("FinishBundleContext parameters 
are not supported.");
+}
+
+@Override
+public DoFn.ProcessContext processContext(DoFn doFn) {
+  throw new UnsupportedOperationException("ProcessContext parameters are 
not supported.");
+}
+
+@Override
+public InputT element(DoFn doFn) {
+  throw new UnsupportedOperationException("Element parameters are not 
supported.");
+}
+
+@Override
+public Object sideInput(String tagId) {
+  throw new UnsupportedOperationException("SideInput parameters are not 
supported.");
+}
+
+@Override
+public Object schemaElement(int index) {
+  throw new UnsupportedOperationException("Element parameters are not 
supported.");
+}
+
+@Override
+public Instant timestamp(DoFn doFn) {
+  return timestamp;
+}
+
+@Override
+public String timerId(DoFn doFn) {
+  throw new UnsupportedOperationException("Timer parameters are not 
supported.");
+}
+
+@Override
+public TimeDomain timeDomain(DoFn doFn) {
+  throw new UnsupportedOperationException(
+  "Cannot access time domain outside of @ProcessTimer method.");
+}
+
+@Override
+public KeyT key() {
+  return key;
+}
+
+@Override
+public OutputReceiver outputReceiver(DoFn doFn) {
+  return DoFnOutputReceivers.windowedReceiver(this, mainOutputTag);
+}
+
+@Override
+public OutputReceiver outputRowReceiver(DoFn doFn) {
+  return DoFnOutputReceivers.rowReceiver(this, mainOutputTag, 
mainOutputSchemaCoder);
+}
+
+@Override
+public MultiOutputReceiver taggedOutputReceiver(DoFn 
doFn) {
+  return DoFnOutputReceivers.windowedMultiReceiver(this, outputCoders);
+}
+
+@Override
+public Object restriction() {
+  throw new UnsupportedOperationException("@Restriction parameters are not 
supported.");
+}
+
+@Override
+public DoFn.OnTimerContext onTimerContext(DoFn doFn) {
+  throw new UnsupportedOperationException("OnTimerContext parameters are 
not supported.");
+}
+
+@Override
+public RestrictionTracker restrictionTracker() {
+  throw new UnsupportedOperationException("RestrictionTracker parameters 
are not supported.");
+}
+
+@Override
+public Object watermarkEstimatorState() {
+  throw new UnsupportedOperationException(
+  "@WatermarkEstimatorState parameters are not supported.");
+}
+
+@Override
+public WatermarkEstimator watermarkEstimator() {
+  throw new UnsupportedOperationException("WatermarkEstimator parameters 
are not supported.");
+}
+
+@Override
+public State state(String stateId, boolean alwaysFetched) {
+  try {
+ 

[GitHub] [beam] thetorpedodog opened a new pull request #11615: passert.Equals: sort output strings for easier reading

2020-05-05 Thread GitBox


thetorpedodog opened a new pull request #11615:
URL: https://github.com/apache/beam/pull/11615


   R: @lostluck
   
   ---
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   (Jenkins build-status badge table for the Go, Java, and Python SDKs across the listed runners; links to builds.apache.org jobs; truncated in the archive.)

[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-05 Thread GitBox


pabloem commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-624321585


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-05 Thread GitBox


pabloem commented on pull request #11554:
URL: https://github.com/apache/beam/pull/11554#issuecomment-624319547


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms

2020-05-05 Thread GitBox


robertwb commented on a change in pull request #11610:
URL: https://github.com/apache/beam/pull/11610#discussion_r420415887



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TupleTag;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+
+public class SetFns {
+
+  /**
+   * Returns a new {@code SetFns.SetImpl} transform that compute the 
intersection with provided
+   * {@code PCollection}.
+   *
+   * The argument should not be modified after this is called.
+   *
+   * The elements of the output {@link PCollection} will all distinct 
elements that present in
+   * both pipeline is constructed and provided {@link PCollection}.
+   *
+   * {@code
+   * Pipeline p = ...;
+   *
+   * PCollection<String> left = p.apply(Create.of("1", "2", "3", "4", "5"));
+   * PCollection<String> right = p.apply(Create.of("1", "3", "4", "6"));
+   *
+   * PCollection<String> results =
+   * left.apply(SetFns.intersect(right));
+   * }
+   */
+  public static <T> SetImpl<T> intersect(PCollection<T> rightCollection) {
+checkNotNull(rightCollection, "rightCollection argument is null");
+SerializableBiFunction<Long, Long, Long> intersectFn =
+(numberOfElementsinLeft, numberOfElementsinRight) -> 
(numberOfElementsinLeft > 0 && numberOfElementsinRight > 0) ? 1L : 0L;
+return new SetImpl<>(rightCollection, intersectFn);
+  }
+
+  /**
+   * Returns a new {@code SetFns.SetImpl} transform that compute the 
intersection all with
+   * provided {@code PCollection}.
+   *
+   * The argument should not be modified after this is called.
+   *
+   * The elements of the output {@link PCollection} which will follow 
EXCEPT_ALL Semantics as
+   * follows: Given there are m elements on pipeline which is constructed 
{@link PCollection}
+   * (left) and n elements on in provided {@link PCollection} (right): - it 
will output MIN(m -
+   * n, 0) elements of left for all elements which are present in both left 
and right.
+   *
+   * {@code
+   * Pipeline p = ...;
+   *
+   * PCollection<String> left = p.apply(Create.of("1", "2", "3", "4", "5"));
+   * PCollection<String> right = p.apply(Create.of("1", "3", "4", "6"));
+   *
+   * PCollection<String> results =
+   * left.apply(SetFns.intersectAll(right));
+   * }
+   */
+  public static <T> SetImpl<T> intersectAll(PCollection<T> rightCollection) {
+checkNotNull(rightCollection, "rightCollection argument is null");
+SerializableBiFunction<Long, Long, Long> intersectFn =
+(numberOfElementsinLeft, numberOfElementsinRight) -> 
(numberOfElementsinLeft > 0 && numberOfElementsinRight > 0) ? 
Math.min(numberOfElementsinLeft, numberOfElementsinRight) : 0L;
+return new SetImpl<>(rightCollection, intersectFn);
+  }
+
+  /**
+   * Returns a new {@code SetFns.SetImpl} transform that compute the 
difference (except) with
+   * provided {@code PCollection}.
+   *
+   * The argument should not be modified after this is called.
+   *
+   * The elements of the output {@link PCollection} will all distinct 
elements that present in
+   * pipeline is constructed {@link PCollection} but not present in 
provided {@link
+   * PCollection}.
+   *
+   * {@code
+   * Pipeline p = ...;
+   *
+   * PCollection<String> left = p.apply(Create.of("1", "2", "3", "4", "5"));
+   * PCollection<String> right = p.apply(Create.of("1", "3", "4", "6"));
+   *
+   * PCollection<String> results =
+   * left.apply(SetFns.except(right));
+   * }
+   */
+  public static <T> SetImpl<T> except(PCollection<T> rightCollection) {
+checkNotNull(rightCollection, "rightCollection argument is null");
+SerializableBiFunction<Long, Long, Long> exceptFn =
+(numberOfElementsinLeft, numberOfElementsinRight) -> 
numberOfElementsinLeft > 0 && numberOfElementsinRight == 0 ? 1L 
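
   A minimal usage sketch of the transforms in the quoted diff, assuming the 
`SetFns.intersect`/`except` API shown in the Javadoc above (the wrapper class 
and pipeline setup here are illustrative only):
   ```
   import org.apache.beam.sdk.Pipeline;
   import org.apache.beam.sdk.options.PipelineOptionsFactory;
   import org.apache.beam.sdk.transforms.Create;
   import org.apache.beam.sdk.transforms.SetFns;
   import org.apache.beam.sdk.values.PCollection;

   public class SetFnsUsageSketch {
     public static void main(String[] args) {
       Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

       PCollection<String> left = p.apply("Left", Create.of("1", "2", "3", "4", "5"));
       PCollection<String> right = p.apply("Right", Create.of("1", "3", "4", "6"));

       // Distinct elements present in both collections: {"1", "3", "4"}.
       PCollection<String> intersection = left.apply("Intersect", SetFns.intersect(right));

       // Distinct elements present in left but not in right: {"2", "5"}.
       PCollection<String> difference = left.apply("Except", SetFns.except(right));

       p.run().waitUntilFinish();
     }
   }
   ```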

[GitHub] [beam] je-ik commented on pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)

2020-05-05 Thread GitBox


je-ik commented on pull request #11614:
URL: https://github.com/apache/beam/pull/11614#issuecomment-624317271


   @robertwb Looks like we should introduce some measures to solve this (not 
sure which measures these should be), because we are accumulating these 
ignored tests:
   ```
   ~/git/apache/beam$ git grep "@Ignore" | wc -l
   102
   ```
   Most of the associated issues remain open, which is what I would expect 
without better visibility. I'll start a discussion thread on the ML.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] tweise commented on a change in pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-05 Thread GitBox


tweise commented on a change in pull request #11558:
URL: https://github.com/apache/beam/pull/11558#discussion_r420389628



##
File path: .test-infra/jenkins/job_LoadTests_ParDo_Flink_Python.groovy
##
@@ -142,9 +249,22 @@ PhraseTriggeringPostCommitBuilder.postCommitJob(
   'Load Tests Python ParDo Flink Batch suite',
   this
 ) {
-  loadTest(delegate, CommonTestProperties.TriggeringContext.PR)
+  loadBatchTests(delegate, CommonTestProperties.TriggeringContext.PR)
+}
+
+PhraseTriggeringPostCommitBuilder.postCommitJob(
+'beam_LoadTests_Python_ParDo_Flink_Streaming',

Review comment:
   Why multiple trigger phrases?

##
File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkDebugPipelineOptions.java
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.flink;
+
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/** Debug options which shouldn't normally be used. */
+public interface FlinkDebugPipelineOptions extends PipelineOptions {
+
+  @Description(
+  "If not null, reports the checkpoint duration of each ParDo stage in the 
provided metric namespace.")
+  String getReportCheckpointDuration();
+
+  void setReportCheckpointDuration(String metricNamespace);
+
+  @Description(
+  "Shuts down sources which have been idle for the configured time of 
milliseconds. Once a source has been "
+  + "shut down, checkpointing is not possible anymore. Shutting down 
the sources eventually leads to pipeline "
+  + "shutdown once all input has been processed.")
+  @Default.Long(0)
+  Long getShutdownSourcesAfterIdleMs();

Review comment:
   Unless I misread, this parameter is directly tied to 
`!shutdownSourcesOnFinalWatermark`? How about consolidating the two? Just a 
single parameter `shutdownSourcesAfterIdleMs` should suffice:
   
   0 - immediate shutdown, which should be the default, unless checkpointing is 
enabled
   value > 0 - wait, potentially forever
   
   There was a question on the ML recently about 
`shutdownSourcesOnFinalWatermark` and whether that should not be the default. I 
think it should be (unless checkpointing was enabled), in which case we can never 
shut down. So there should be almost no situation where this parameter needs to 
be set, except in a special case like this. 
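
   A hedged sketch of that consolidation, assuming a single 
`shutdownSourcesAfterIdleMs` option replaces the boolean flag (the interface 
name here is illustrative, not necessarily what this PR adopts):
   ```
   import org.apache.beam.sdk.options.Default;
   import org.apache.beam.sdk.options.Description;
   import org.apache.beam.sdk.options.PipelineOptions;

   /** Sketch only: one option instead of a boolean flag plus a timeout. */
   public interface FlinkSourceShutdownOptionsSketch extends PipelineOptions {

     @Description(
         "Milliseconds a source may stay idle before it is shut down. "
             + "0 means shut down immediately (a sensible default unless checkpointing is enabled); "
             + "a larger value keeps sources alive, potentially forever.")
     @Default.Long(0)
     Long getShutdownSourcesAfterIdleMs();

     void setShutdownSourcesAfterIdleMs(Long value);
   }
   ```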


##
File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkDebugPipelineOptions.java
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.flink;
+
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+
+/** Debug options which shouldn't normally be used. */
+public interface FlinkDebugPipelineOptions extends PipelineOptions {
+
+  @Description(
+  "If not null, reports the checkpoint duration of each ParDo stage in the 
provided metric namespace.")
+  String getReportCheckpointDuration();
+
+  void setReportCheckpointDuration(String metricNamespace);
+
+  @Description(
+  "Shuts down sources which have been idle for the configured time of 
milliseconds. Once a source has been "
+  + "shut down, checkpointing is not possible anymore. Shutting down 
the sources eventually leads to pipeline "

Review 

[GitHub] [beam] robertwb commented on pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy

2020-05-05 Thread GitBox


robertwb commented on pull request #11613:
URL: https://github.com/apache/beam/pull/11613#issuecomment-624311669


   I thought consensus was that we did want to encourage squash for those PRs 
with a huge pile of fixup commits (and otherwise no semantically meaningful 
commits). 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)

2020-05-05 Thread GitBox


robertwb commented on pull request #11614:
URL: https://github.com/apache/beam/pull/11614#issuecomment-624310548


   I'm not sure we have more visibility into disabled tests than the jira entry 
(which shouldn't get closed until the tests are fixed and/or deemed unneeded 
and deleted). 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on a change in pull request #11296: [BEAM-9640] Sketching watermark tracking on FnApiRunner

2020-05-05 Thread GitBox


robertwb commented on a change in pull request #11296:
URL: https://github.com/apache/beam/pull/11296#discussion_r420388089



##
File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
##
@@ -363,16 +387,17 @@ def _run_bundle_multiple_times_for_testing(
   finally:
 runner_execution_context.state_servicer.restore()
 
+  @staticmethod
   def _collect_written_timers_and_add_to_fired_timers(
-  self,
   bundle_context_manager,  # type: execution.BundleContextManager
   fired_timers  # type: Dict[Tuple[str, str], ListBuffer]

Review comment:
   Are these fired_timers, or timers_to_fire? 

##
File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py
##
@@ -296,6 +297,12 @@ def __init__(self,
 self.safe_coders = safe_coders
 self.data_channel_coders = data_channel_coders
 
+self.transform_id_to_buffer_id = {

Review comment:
   Different transforms may have different input/output buffers associated 
with them. Perhaps name this `input_transform_to_buffer_id` or 
`buffer_id_by_consumer` or similar. 

##
File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
##
@@ -384,13 +409,25 @@ def _collect_written_timers_and_add_to_fired_timers(
 out = create_OutputStream()
 for decoded_timer in timers_by_key_and_window.values():
   timer_coder_impl.encode_to_stream(decoded_timer, out, True)
+  if (transform_id, timer_family_id) not in timer_watermark_data:
+timer_watermark_data[(transform_id,
+  timer_family_id)] = timestamp.MAX_TIMESTAMP
+  timer_watermark_data[(transform_id, timer_family_id)] = min(
+  timer_watermark_data[(transform_id, timer_family_id)],
+  decoded_timer.fire_timestamp)
 fired_timers[(transform_id, timer_family_id)] = ListBuffer(
 coder_impl=timer_coder_impl)
 fired_timers[(transform_id, timer_family_id)].append(out.get())
 written_timers.clear()
 
+return timer_watermark_data
+
   def _add_sdk_delayed_applications_to_deferred_inputs(
   self, bundle_context_manager, bundle_result, deferred_inputs):
+# type: (...) -> Set[str]
+
+"""Returns a set of PCollections with delayed applications."""

Review comment:
   Set of PCollection ids? Buffer ids? 

##
File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/watermark_manager.py
##
@@ -0,0 +1,206 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Utilities for managing watermarks for a pipeline execution by 
FnApiRunner."""
+
+from __future__ import absolute_import
+
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.runners.portability.fn_api_runner import translations
+from apache_beam.runners.portability.fn_api_runner.translations import 
split_buffer_id
+from apache_beam.runners.worker import bundle_processor
+from apache_beam.utils import proto_utils
+from apache_beam.utils import timestamp
+
+
+class WatermarkManager(object):
+  """Manages the watermarks of a pipeline's stages.
+It works by constructing an internal graph representation of the pipeline,
+and keeping track of dependencies."""
+  class WatermarkNode(object):
+def __init__(self, name):
+  self.name = name
+
+  class PCollectionNode(WatermarkNode):
+def __init__(self, name):
+  super(WatermarkManager.PCollectionNode, self).__init__(name)
+  self._watermark = timestamp.MIN_TIMESTAMP
+  self.producers = set()
+
+def __str__(self):
+  return 'PCollectionNode

[GitHub] [beam] udim commented on pull request #11038: [BEAM-7746] More typing fixes

2020-05-05 Thread GitBox


udim commented on pull request #11038:
URL: https://github.com/apache/beam/pull/11038#issuecomment-624296105


   > Hi everyone, I have some availability to finish this PR off now. I'm going 
to rebase it soon. @udim do you have the time to help me get this through 
review?
   
   Yeah, I'll make another pass later today



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] je-ik commented on a change in pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)

2020-05-05 Thread GitBox


je-ik commented on a change in pull request #11614:
URL: https://github.com/apache/beam/pull/11614#discussion_r420390006



##
File path: 
runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java
##
@@ -242,6 +243,7 @@ public void close() {}
  * This test verifies that watermarks are correctly forwarded.
  */
 @Test(timeout = 30_000)
+@Ignore("https://issues.apache.org/jira/browse/BEAM-9164")

Review comment:
   This is not a `@Category(ValidatesRunner)` test; it looks like it is 
Flink-specific already?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia commented on a change in pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)

2020-05-05 Thread GitBox


iemejia commented on a change in pull request #11614:
URL: https://github.com/apache/beam/pull/11614#discussion_r420387278



##
File path: 
runners/flink/src/test/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapperTest.java
##
@@ -242,6 +243,7 @@ public void close() {}
  * This test verifies that watermarks are correctly forwarded.
  */
 @Test(timeout = 30_000)
+@Ignore("https://issues.apache.org/jira/browse/BEAM-9164")

Review comment:
   This one is apparently only flaky on Flink; can we instead exclude it 
manually, only for Flink?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ibzib commented on pull request #11403: [DO NOT MERGE] Run all PostCommit and PreCommit Tests against Release Branch

2020-05-05 Thread GitBox


ibzib commented on pull request #11403:
URL: https://github.com/apache/beam/pull/11403#issuecomment-624286892







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] je-ik commented on pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)

2020-05-05 Thread GitBox


je-ik commented on pull request #11614:
URL: https://github.com/apache/beam/pull/11614#issuecomment-624285935


   @robertwb Thanks for the approval. Do we have a way of visualizing ignored 
tests? I'm a little afraid these tests might get ignored forever, which would 
be unfortunate.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] je-ik opened a new pull request #11614: Disable two flaky tests (BEAM-8035, BEAM-9164)

2020-05-05 Thread GitBox


je-ik opened a new pull request #11614:
URL: https://github.com/apache/beam/pull/11614


   Disables two flaky tests until a resolution is found.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
 

[GitHub] [beam] Ardagan commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests

2020-05-05 Thread GitBox


Ardagan commented on pull request #11555:
URL: https://github.com/apache/beam/pull/11555#issuecomment-624265259


   Hey Kamil,
can we also add a proper landing page for the metrics site? People regularly 
can't navigate to the dashboards they need. Adding a landing page with intuitive 
navigation would help a lot. That should be a separate PR though. 
[BEAM-6710](https://issues.apache.org/jira/browse/BEAM-6710)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-05 Thread GitBox


mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-624264625


   Run Python Load Tests ParDo Flink Streaming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm removed a comment on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-05 Thread GitBox


mxm removed a comment on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-624264392


   Run Python Load Tests ParDo Flink Streaming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-05 Thread GitBox


mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-624264392


   Run Python Load Tests ParDo Flink Streaming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm removed a comment on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-05 Thread GitBox


mxm removed a comment on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-624229343


   Run Python Load Tests ParDo Flink Streaming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] ihji commented on pull request #11574: [BEAM-9449] Pass PipelineOptions through expansion service

2020-05-05 Thread GitBox


ihji commented on pull request #11574:
URL: https://github.com/apache/beam/pull/11574#issuecomment-624260369


   > > I think we should also consider adding optional pipeline_options 
argument to ExternalTransform given that each different expansion service needs 
different pipeline options.
   > 
   > I'm not sure I understand that. Shouldn't pipeline options be scoped 
per-pipeline?
   
   Makes sense. I find that I have a mental model in which external 
transforms are basically imported from different pipelines. If we assume that 
they are heterogeneous but still parts of the same pipeline, then it's okay to 
share a single set of pipeline options.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests

2020-05-05 Thread GitBox


aaltay commented on pull request #11555:
URL: https://github.com/apache/beam/pull/11555#issuecomment-624248353


   /cc @chamikaramj @tysonjh @kennknowles -- optional review request, if you 
would like to take a quick look at new benchmarks at 
http://metrics.beam.apache.org.
   
   (Instructions from @Ardagan: To find dashboards, click at the top-left on 
"Home" or the current dashboard name; this will open a drop-down list with the 
full set of dashboards.)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests

2020-05-05 Thread GitBox


aaltay commented on pull request #11555:
URL: https://github.com/apache/beam/pull/11555#issuecomment-624247136


   Some comments:
   - I do see missing data. (Example: 
http://metrics.beam.apache.org/d/fK0U4JqWz/cogbk-load-tests?orgId=1 -- all 
graphs missing recent data, java | coGBK | 100B records with a single key  
missing spark data for longer.)
   - go benchmarks are completely empty.
   - Some different colors (Example: 
http://metrics.beam.apache.org/d/bnlHKP3Wz/java-io-it-tests-dataflow?orgId=1 -- 
TextIOIT | 1 GB | GCS | "Many files" | GCS Copies is in blue color)
   - Since all dashboards have python/java selectors, why are Python IO IT Tests 
and Java IO IT Tests separate dashboards?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay edited a comment on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests

2020-05-05 Thread GitBox


aaltay edited a comment on pull request #11555:
URL: https://github.com/apache/beam/pull/11555#issuecomment-624247136


   Some comments:
   - I do see missing data. (Example: 
http://metrics.beam.apache.org/d/fK0U4JqWz/cogbk-load-tests?orgId=1 -- all 
graphs missing recent data, java | coGBK | 100B records with a single key  
missing spark data for longer.)
   - go benchmarks are completely empty.
   - Some different colors (Example: 
http://metrics.beam.apache.org/d/bnlHKP3Wz/java-io-it-tests-dataflow?orgId=1 -- 
TextIOIT | 1 GB | GCS | "Many files" | GCS Copies is in blue color)
   - Since all dashboards have python/java selectors, why are Python IO IT Tests 
and Java IO IT Tests separate dashboards?
   
   I might be missing other issues as well. If they are easy to fix later, we 
can fix what is identified, merge and ask for feedback on dev@ list.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on a change in pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


TheNeuralBit commented on a change in pull request #11609:
URL: https://github.com/apache/beam/pull/11609#discussion_r420336807



##
File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java
##
@@ -97,4 +99,19 @@ public void testNanosDuration() {
 assertEquals(duration, row.getLogicalTypeValue(0, NanosDuration.class));
 assertEquals(durationAsRow, row.getBaseValue(0, Row.class));
   }
+
+  @Test(expected = IllegalArgumentException.class)
+  public void testFixedBytesIllegalArgument() {
+Schema schema = Schema.builder().addLogicalTypeField("char", 
FixedBytes.of(10)).build();
+byte[] byteArrayWithLengthFive = {1, 2, 3, 4, 5};
+Row row = Row.withSchema(schema).withFieldValue("char", 
byteArrayWithLengthFive).build();
+  }

Review comment:
   Could you add a test like this but with `addValues`?
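
   A hedged sketch of such a test, assuming it sits next to the quoted 
`testFixedBytesIllegalArgument` and reuses that file's imports (the method name 
is illustrative):
   ```
   @Test(expected = IllegalArgumentException.class)
   public void testFixedBytesIllegalArgumentAddValues() {
     Schema schema = Schema.builder().addLogicalTypeField("char", FixedBytes.of(10)).build();
     byte[] byteArrayWithLengthFive = {1, 2, 3, 4, 5};
     // Same invalid-length input as above, but set positionally via addValues.
     Row.withSchema(schema).addValues(byteArrayWithLengthFive).build();
   }
   ```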

##
File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/schemas/logicaltypes/LogicalTypesTest.java
##
@@ -97,4 +99,19 @@ public void testNanosDuration() {
 assertEquals(duration, row.getLogicalTypeValue(0, NanosDuration.class));
 assertEquals(durationAsRow, row.getBaseValue(0, Row.class));
   }
+
+  @Test(expected = IllegalArgumentException.class)
+  public void testFixedBytesIllegalArgument() {
+Schema schema = Schema.builder().addLogicalTypeField("char", 
FixedBytes.of(10)).build();
+byte[] byteArrayWithLengthFive = {1, 2, 3, 4, 5};
+Row row = Row.withSchema(schema).withFieldValue("char", 
byteArrayWithLengthFive).build();
+  }
+
+  @Test
+  public void testFixedBytes() {
+Schema schema = Schema.builder().addLogicalTypeField("char", 
FixedBytes.of(10)).build();
+byte[] byteArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
+Row row = Row.withSchema(schema).withFieldValue("char", byteArray).build();
+assertTrue(Arrays.equals(byteArray, row.getLogicalTypeValue("char", 
byte[].class)));
+  }

Review comment:
   Since these tests are really checking `Row`'s verification, I think they 
would be better in `RowTest`. Could you move them there? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


TheNeuralBit commented on pull request #11609:
URL: https://github.com/apache/beam/pull/11609#issuecomment-624243528


   > Will this line ever get hit?
   
   We always convert logical types to their base type when serializing with 
SchemaCoder, and convert back to the input type when deserializing. Other than 
that I think the only time it should get called is when constructing a Row 
instance (unless you use attachValues).
   
   > Can we consider that the input value provided is of BaseType, which we can 
convert to InputType and store in memory?
   
   Would this just be so that we're guaranteed to call `toInputType` whenever 
setting a value on a Row? This PR accomplishes the same thing, right?
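
   A minimal sketch of that round trip with `FixedBytes`, just to make the 
base-type/input-type direction concrete (the wrapper class is illustrative):
   ```
   import java.util.Arrays;
   import org.apache.beam.sdk.schemas.logicaltypes.FixedBytes;

   public class LogicalTypeRoundTripSketch {
     public static void main(String[] args) {
       FixedBytes fixedBytes = FixedBytes.of(10);
       byte[] input = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

       // SchemaCoder effectively applies toBaseType when encoding...
       byte[] base = fixedBytes.toBaseType(input);
       // ...and toInputType when decoding; Row construction also goes through the logical type.
       byte[] roundTripped = fixedBytes.toInputType(base);

       System.out.println(Arrays.equals(input, roundTripped)); // prints: true
     }
   }
   ```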



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests

2020-05-05 Thread GitBox


aaltay commented on pull request #11555:
URL: https://github.com/apache/beam/pull/11555#issuecomment-624242630


   > Done. I pushed modified version to the website 
(http://metrics.beam.apache.org) 
   
   I do not see the new dashboard here. How can I find it?
   
   I see these three:
   Code Velocity
   Post-commit Test Reliability
   Stability critical jobs status



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-05 Thread GitBox


mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-624229343


   Run Python Load Tests ParDo Flink Streaming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-05 Thread GitBox


mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-624227730


   Run Seed Job



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] je-ik commented on pull request #11612: [BEAM-9888] Drop data based on input watermark in @RequiresTimeSortedInput

2020-05-05 Thread GitBox


je-ik commented on pull request #11612:
URL: https://github.com/apache/beam/pull/11612#issuecomment-624225596


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] iemejia opened a new pull request #11613: [BEAM-9833] Fix yaml issues, sort labels and merge button as only strategy

2020-05-05 Thread GitBox


iemejia opened a new pull request #11613:
URL: https://github.com/apache/beam/pull/11613


   Issues reported by yamllint and some minor fixes. Also set the merge button as 
the only strategy because we don't want to encourage (or even make possible) the 
other two.
   
   R: @pabloem
   CC: @ibzib 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Ardagan commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests

2020-05-05 Thread GitBox


Ardagan commented on pull request #11555:
URL: https://github.com/apache/beam/pull/11555#issuecomment-624216529


   Some dashboards seem to be missing data, but that's because not all of the 
data has been migrated, IIUC.
   LGTM otherwise.
   @aaltay can you take a look as well please?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Ardagan edited a comment on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests

2020-05-05 Thread GitBox


Ardagan edited a comment on pull request #11555:
URL: https://github.com/apache/beam/pull/11555#issuecomment-624213555


   > @aaltay
   > 
   > > It would be great if we can make data-points clickable with links to 
relevant job
   > 
   > Grafana has a feature called Data links [1] that could be use here. But 
the biggest challenge is to get Jenkins job id for specific data point. When 
Python or Java test sends their metrics to InfluxDB/BigQuery, they have no 
knowledge of Jenkins job that executes them.
   > 
   > Without a rework of sending metrics, this functionality will be difficult 
to implement.
   > 
   > @Ardagan Any thoughs?
   > 
   > [1] https://grafana.com/docs/grafana/latest/reference/datalinks/
   
   I believe we can get the Jenkins job ID via 
[env.JOB_NAME](https://stackoverflow.com/questions/8309383/how-to-get-the-jobname-from-jenkins),
 but this will require updating the test metric reporting logic and the DB, IIUC. We 
can add a JIRA to do this improvement in a separate PR.
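
   A hedged sketch of that idea: read the job identity from the Jenkins 
environment when a test publishes its metrics, so a data point can later be 
linked back to its build (label names and fallbacks here are illustrative):
   ```
   import java.util.HashMap;
   import java.util.Map;

   public class MetricLabelsSketch {
     /** Labels a test could attach to each metric row it publishes. */
     public static Map<String, String> jenkinsLabels() {
       Map<String, String> labels = new HashMap<>();
       // Jenkins exposes these environment variables to every build; fall back for local runs.
       labels.put("jenkins_job", System.getenv().getOrDefault("JOB_NAME", "unknown"));
       labels.put("jenkins_build", System.getenv().getOrDefault("BUILD_NUMBER", "unknown"));
       return labels;
     }
   }
   ```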
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Ardagan commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests

2020-05-05 Thread GitBox


Ardagan commented on pull request #11555:
URL: https://github.com/apache/beam/pull/11555#issuecomment-624213555


   > @aaltay
   > 
   > > It would be great if we can make data-points clickable with links to 
relevant job
   > 
   > Grafana has a feature called Data links [1] that could be use here. But 
the biggest challenge is to get Jenkins job id for specific data point. When 
Python or Java test sends their metrics to InfluxDB/BigQuery, they have no 
knowledge of Jenkins job that executes them.
   > 
   > Without a rework of sending metrics, this functionality will be difficult 
to implement.
   > 
   > @Ardagan Any thoughs?
   > 
   > [1] https://grafana.com/docs/grafana/latest/reference/datalinks/
   
   I believe we can get jenkins job ID via 
[env.JOB_NAME](https://stackoverflow.com/questions/8309383/how-to-get-the-jobname-from-jenkins).
 We can add jira to do this improvement in separate PR.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay edited a comment on pull request #11210: [BEAM-8949] SpannerIO integration tests

2020-05-05 Thread GitBox


aaltay edited a comment on pull request #11210:
URL: https://github.com/apache/beam/pull/11210#issuecomment-624213199


   > @aaltay - Wordcount stream tests are failing. They are running 
successfully on my local machine. Jira ticket is already created for that - 
https://issues.apache.org/jira/browse/BEAM-9767
   > Maybe retriggering the pre-commit test works but can't say that would pass 
100% bcoz of the flaky wordcount test.
   > 
   > Apart from that, there were 3 pre-commit jobs are triggered almost the 
same time on Jenkins (with ~30seconds difference) and one of them is 
successfully completed but sadly the failed one is linked with the github.
   > 
   > successful pre-commit: 
https://builds.apache.org/job/beam_PreCommit_Python_Commit/12541/
   > 
   > failed jobs:
   > https://builds.apache.org/job/beam_PreCommit_Python_Commit/12542/
   > https://builds.apache.org/job/beam_PreCommit_Python_Commit/12543/
   
   OK, let's re-run to get a clear signal.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #11210: [BEAM-8949] SpannerIO integration tests

2020-05-05 Thread GitBox


aaltay commented on pull request #11210:
URL: https://github.com/apache/beam/pull/11210#issuecomment-624213199


   > @aaltay - Wordcount stream tests are failing. They are running 
successfully on my local machine. Jira ticket is already created for that - 
https://issues.apache.org/jira/browse/BEAM-9767
   > Maybe retriggering the pre-commit test works but can't say that would pass 
100% bcoz of the flaky wordcount test.
   > 
   > Apart from that, there were 3 pre-commit jobs are triggered almost the 
same time on Jenkins (with ~30seconds difference) and one of them is 
successfully completed but sadly the failed one is linked with the github.
   > 
   > successful pre-commit: 
https://builds.apache.org/job/beam_PreCommit_Python_Commit/12541/
   > 
   > failed jobs:
   > https://builds.apache.org/job/beam_PreCommit_Python_Commit/12542/
   > https://builds.apache.org/job/beam_PreCommit_Python_Commit/12543/
   
   OK, let's re-run to de-flake.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] aaltay commented on pull request #11210: [BEAM-8949] SpannerIO integration tests

2020-05-05 Thread GitBox


aaltay commented on pull request #11210:
URL: https://github.com/apache/beam/pull/11210#issuecomment-624213031


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Ardagan commented on pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform

2020-05-05 Thread GitBox


Ardagan commented on pull request #11582:
URL: https://github.com/apache/beam/pull/11582#issuecomment-624210475


   Run Python PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem commented on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-05 Thread GitBox


pabloem commented on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-624209882







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Ardagan commented on a change in pull request #11582: [BEAM-9650] Add ReadAllFromBigQuery PTransform

2020-05-05 Thread GitBox


Ardagan commented on a change in pull request #11582:
URL: https://github.com/apache/beam/pull/11582#discussion_r420286996



##
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##
@@ -283,6 +284,8 @@ def compute_table_name(row):
 'BigQuerySink',
 'WriteToBigQuery',
 'ReadFromBigQuery',
+'ReadAllFromBigQueryRequest',

Review comment:
   I'm working on adding a similar ReadAll to the Java API.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms

2020-05-05 Thread GitBox


amaliujia commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-624199945


   @darshanj 
   
   You can run `./gradlew spotlessApply` to fix checkstyle issues.
   You need to run the `./gradlew ${module}:check` command to not only run tests 
but also run the style checks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms

2020-05-05 Thread GitBox


amaliujia commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-624199252


   cc @Mark-Zeng to make sure I tagged the right person.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] Ardagan commented on pull request #11477: [BEAM-9650] Add PeriodicSequence generator.

2020-05-05 Thread GitBox


Ardagan commented on pull request #11477:
URL: https://github.com/apache/beam/pull/11477#issuecomment-624198780


   Run Website_Stage_GCS PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia edited a comment on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms

2020-05-05 Thread GitBox


amaliujia edited a comment on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-624196073


   R: @amaliujia 
   cc: @jhnmora000 @Mark-Zeng (expose more PRs to GSoC students)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] je-ik commented on pull request #11612: [BEAM-9888] Drop data based on input watermark in @RequiresTimeSortedInput

2020-05-05 Thread GitBox


je-ik commented on pull request #11612:
URL: https://github.com/apache/beam/pull/11612#issuecomment-624197391


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] amaliujia commented on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms

2020-05-05 Thread GitBox


amaliujia commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-624196073


   R: @amaliujia 
   cc: @jhnmora000 @MarkZeng1998 (expose more PRs to GSoC students)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on pull request #11557: [BEAM-9845] Stage artifacts over expansion service.

2020-05-05 Thread GitBox


robertwb commented on pull request #11557:
URL: https://github.com/apache/beam/pull/11557#issuecomment-624188539


   Thanks. Those suites passed locally, I'll look into what's going on here. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-05 Thread GitBox


mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-624184021


   Run Python Load Tests ParDo Flink Streaming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mxm commented on pull request #11558: [BEAM-8742] Add stateful and timely processing benchmarks

2020-05-05 Thread GitBox


mxm commented on pull request #11558:
URL: https://github.com/apache/beam/pull/11558#issuecomment-624180353


   Run Seed Job



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kamilwu removed a comment on pull request #11274: [BEAM-9633] Add PubsubIO performance test

2020-05-05 Thread GitBox


kamilwu removed a comment on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-624175575


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kamilwu commented on pull request #11274: [BEAM-9633] Add PubsubIO performance test

2020-05-05 Thread GitBox


kamilwu commented on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-624176044


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kamilwu commented on pull request #11274: [BEAM-9633] Add PubsubIO performance test

2020-05-05 Thread GitBox


kamilwu commented on pull request #11274:
URL: https://github.com/apache/beam/pull/11274#issuecomment-624175575


   Retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] kamilwu commented on pull request #11555: [BEAM-8134] Grafana dashboards for Load Tests and IO IT Performance Tests

2020-05-05 Thread GitBox


kamilwu commented on pull request #11555:
URL: https://github.com/apache/beam/pull/11555#issuecomment-624173530


   @aaltay 
   > It would be great if we can make data-points clickable with links to 
relevant job
   
   Grafana has a feature called Data links [1] that could be used here. But the 
biggest challenge is getting the Jenkins job id for a specific data point. When a 
Python or Java test sends its metrics to InfluxDB/BigQuery, it has no knowledge 
of the Jenkins job that executes it.
   
   Without reworking how metrics are sent, this functionality will be difficult to 
implement.
   
   @Ardagan Any thoughts? 
   
   [1] https://grafana.com/docs/grafana/latest/reference/datalinks/



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] robertwb commented on pull request #11038: [BEAM-7746] More typing fixes

2020-05-05 Thread GitBox


robertwb commented on pull request #11038:
URL: https://github.com/apache/beam/pull/11038#issuecomment-624170238


   Yes, let's make it happen! I'll help out as well.
   
   On Mon, May 4, 2020 at 11:47 AM Chad Dombrova  
wrote:
   >
   > I've rebased onto master. We jumped from 32 errors to 260+. We're going to 
need to make a concerted effort to get these typing MRs through, and beat the 
merge conflict fatigue. Can we make it happen?
   >



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11574: [BEAM-9449] Pass PipelineOptions through expansion service

2020-05-05 Thread GitBox


TheNeuralBit commented on pull request #11574:
URL: https://github.com/apache/beam/pull/11574#issuecomment-624169642


   > I think we should also consider adding optional pipeline_options argument 
to ExternalTransform given that each different expansion service needs 
different pipeline options.
   
   I'm not sure I understand that. Shouldn't pipeline options be scoped 
per-pipeline?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] je-ik commented on pull request #11612: [BEAM-9888] Drop data based on input watermark in @RequiresTimeSortedInput

2020-05-05 Thread GitBox


je-ik commented on pull request #11612:
URL: https://github.com/apache/beam/pull/11612#issuecomment-624162961


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] reuvenlax edited a comment on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


reuvenlax edited a comment on pull request #11609:
URL: https://github.com/apache/beam/pull/11609#issuecomment-624156907


   @TheNeuralBit withFieldValue should replace addValues for most users. 
addValues is difficult and error-prone, while withFieldValue allows building a 
row based on named fields instead of positional fields.
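   
   A minimal sketch of the two styles, assuming a simple two-field schema; the 
field names and values are made up for illustration:
   
   ```java
   import org.apache.beam.sdk.schemas.Schema;
   import org.apache.beam.sdk.values.Row;
   
   public class RowBuildingSketch {
     public static void main(String[] args) {
       Schema schema =
           Schema.builder().addStringField("name").addInt64Field("count").build();
   
       // Positional: values must line up with the field order declared in the schema.
       Row positional = Row.withSchema(schema).addValues("beam", 42L).build();
   
       // Named: each value is bound to a field by name, independent of declaration order.
       Row named =
           Row.withSchema(schema)
               .withFieldValue("count", 42L)
               .withFieldValue("name", "beam")
               .build();
   
       // Both rows carry the same schema and values, so they compare equal.
       System.out.println(positional.equals(named));
     }
   }
   ```
   
   The named style also fails fast on a misspelled field name, which is part of 
what makes it less error-prone than lining values up by position.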



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11607: [BEAM-9430] Makes sure the watermarks returned by estimators are within bounds

2020-05-05 Thread GitBox


chamikaramj commented on pull request #11607:
URL: https://github.com/apache/beam/pull/11607#issuecomment-624161195


   Closing this temporarily while I look into this further.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rahul8383 commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


rahul8383 commented on pull request #11609:
URL: https://github.com/apache/beam/pull/11609#issuecomment-624160514


   @reuvenlax @TheNeuralBit 
   Thanks for the review.
   Could you please clarify the questions I asked above?
   I would like to understand more about logical types for my PR #11581 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] reuvenlax commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


reuvenlax commented on pull request #11609:
URL: https://github.com/apache/beam/pull/11609#issuecomment-624156907







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rahul8383 edited a comment on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


rahul8383 edited a comment on pull request #11609:
URL: https://github.com/apache/beam/pull/11609#issuecomment-624153984


   @TheNeuralBit
   The `schemas.logicaltypes.LogicalTypesTest.testFixedBytesIllegalArgument` test 
will fail even if the `addValue()` method is used instead of the 
`withFieldValue()` method.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] rahul8383 commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


rahul8383 commented on pull request #11609:
URL: https://github.com/apache/beam/pull/11609#issuecomment-624153984


   @TheNeuralBit
   The schemas.logicaltypes.LogicalTypesTest.testFixedBytesIllegalArgument test 
will fail even if the addValue() method is used instead of the withFieldValue() 
method.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] TheNeuralBit commented on pull request #11609: [BEAM-9887] Throw IllegalArgumentException when building Row with logical types with Invalid input

2020-05-05 Thread GitBox


TheNeuralBit commented on pull request #11609:
URL: https://github.com/apache/beam/pull/11609#issuecomment-624149550


   Hm so there are several ways of manually building a Row instance that 
provide different levels of runtime type-checking. `Row#addValues` explicitly 
validates everything, and `Row#attachValues` explicitly does not, for 
performance reasons. In SQL we have an option to switch between the two: 
   
   
https://github.com/apache/beam/blob/34c58c42f14d2534a2f72f9194ddf9cc69138eea/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java#L354-L357
   
   So we can have runtime type-checking for debugging, but then turn it off for 
performance.
   
   I'm not sure how `withFieldValue` is intended to work. I'm not sure if the 
missing `toInputType(toBaseType(value))` for that code path is intentional or 
an oversight. Can you clarify @reuvenlax?
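   
   For concreteness, a sketch of the kind of case in question, assuming a schema 
with a single FixedBytes logical-type field (the field name and sizes are made 
up); whether the oversized value is rejected at build() time is exactly the 
validation behavior under discussion:
   
   ```java
   import org.apache.beam.sdk.schemas.Schema;
   import org.apache.beam.sdk.schemas.logicaltypes.FixedBytes;
   import org.apache.beam.sdk.values.Row;
   
   public class FixedBytesValidationSketch {
     public static void main(String[] args) {
       // One field backed by the FixedBytes logical type, declared three bytes wide.
       Schema schema =
           Schema.builder().addLogicalTypeField("id", FixedBytes.of(3)).build();
   
       byte[] tooLong = new byte[] {1, 2, 3, 4}; // longer than the declared width
   
       try {
         // addValue() takes the validating code path discussed above.
         Row row = Row.withSchema(schema).addValue(tooLong).build();
         System.out.println("built: " + row);
       } catch (IllegalArgumentException e) {
         System.out.println("rejected at build time: " + e.getMessage());
       }
     }
   }
   ```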
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] chamikaramj commented on pull request #11557: [BEAM-9845] Stage artifacts over expansion service.

2020-05-05 Thread GitBox


chamikaramj commented on pull request #11557:
URL: https://github.com/apache/beam/pull/11557#issuecomment-624130608







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] bntnam commented on a change in pull request #11554: [BEAM-9876] Migrate the Beam website from Jekyll to Hugo to enable localization of the site content

2020-05-05 Thread GitBox


bntnam commented on a change in pull request #11554:
URL: https://github.com/apache/beam/pull/11554#discussion_r420199948



##
File path: website/www/site/content/en/community/contact-us.md
##
@@ -0,0 +1,47 @@
+---
+title: "Contact Us"
+aliases:
+  - /community/
+  - /use/issue-tracking/
+  - /use/mailing-lists/
+  - /get-started/support/
+---
+
+
+# Contact Us
+
+There are many ways to reach the Beam user and developer communities - use
+whichever one seems best.
+
+

Review comment:
   If you look at the table shortcode mentioned in the comment (it's located at 
website/www/site/layouts/shortcodes/table), you will see that we can pass 
markdown into a `div` because of `{{ .Inner | markdownify }}`. We use this table 
shortcode to replace the `{:.table}` syntax from Jekyll, and we've done so for 
several tables. However, this table alone uses `[^1]` to reference a link, and 
unfortunately `{{ .Inner | markdownify }}` doesn't handle that, so we need to 
use the `div` directly here without passing it through `markdownify`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] je-ik commented on pull request #11612: [BEAM-9888] Drop data based on input watermark in @RequiresTimeSortedInput

2020-05-05 Thread GitBox


je-ik commented on pull request #11612:
URL: https://github.com/apache/beam/pull/11612#issuecomment-624121709


   Run Direct ValidatesRunner



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



